
Bug 288912

Summary: gentoo-sources 2.6.27-r8 and app-backup/amanda or gnu-tar : processes accessing st-device hang
Product: Gentoo Linux Reporter: Stefan G. Weichinger <lists>
Component: [OLD] Core system Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel>
Status: RESOLVED NEEDINFO    
Severity: critical CC: amg, dustin
Priority: High    
Version: unspecified   
Hardware: AMD64   
OS: Linux   
URL: http://archives.zmanda.com/amanda-archives/viewtopic.php?t=4843&sid=63099960699f1fcfd068adfaf526b54c
Whiteboard:
Package list:
Runtime testing required: ---

Description Stefan G. Weichinger 2009-10-13 17:52:34 UTC
Kernel 2.6.27-gentoo-r8, amd64.
Amanda-2.6.1, 2.6.1p1 or 2.5.2p1, doesn't matter.

SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)

/dev/st0:
Vendor: HP       Model: Ultrium 2-SCSI   Rev: T61D
Type:   Sequential-Access                ANSI  SCSI revision: 05

When amanda OR tar tries to access the tape-device we get hanging processes.

Exactly the same system, without any changes, has worked for months now (since 2.6.27-r8 was the latest stable kernel ...)

We checked with mt commands, which do work, but tar and amcheck/amdump do not. This seems to be more of a kernel or hardware issue ...

I am going to build a more recent kernel now and check with this.


Reproducible: Always

Steps to Reproduce:
1. tar -cf /dev/st0 foo.bar

Actual Results:  
INFO: task tar:8334 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
tar           D 00000001000223ac     0  8334   8327
 ffff88011a4abd08 0000000000000046 0000000000000000 ffffffff803ec550
 ffff88011d882c30 ffffffff806ef340 ffff88011d882e60 0000000000000000
 00000000000000ff ffffffff804a1c7f 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff803ec550>] elv_next_request+0x183/0x193
 [<ffffffff804a1c7f>] scsi_request_fn+0x339/0x38f
 [<ffffffff805a8099>] schedule_timeout+0x1e/0xad
 [<ffffffff804a1104>] scsi_execute_async+0x328/0x375
 [<ffffffff80257bea>] __pagevec_free+0x21/0x2e
 [<ffffffff805a785c>] wait_for_common+0xc8/0x132
 [<ffffffff80227f05>] default_wake_function+0x0/0xe
 [<ffffffff804bc3f9>] st_do_scsi+0x235/0x265
 [<ffffffff804bbbfa>] st_sleep_done+0x0/0x5f
 [<ffffffff804bef63>] st_flush+0xf7/0x264
 [<ffffffff80277116>] filp_close+0x38/0x66
 [<ffffffff802312fd>] put_files_struct+0x66/0xc4
 [<ffffffff80232596>] do_exit+0x1e2/0x73b
 [<ffffffff80278f4f>] vfs_write+0x121/0x136
 [<ffffffff80232b55>] do_group_exit+0x66/0x96
 [<ffffffff80232b97>] sys_exit_group+0x12/0x16
 [<ffffffff8020b1eb>] system_call_fastpath+0x16/0x1b

mptscsih: ioc0: attempting task abort! (sc=ffff88011a41bac0)
st 6:0:0:0: CDB: Write Filemarks: 10 00 00 00 01 00
mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
mptscsih: ioc0: task abort: SUCCESS (sc=ffff88011a41bac0)
st0: Error 80000 (sugg. bt 0x0, driver bt 0x0, host bt 0x8).
st0: Error on write filemark.
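
For anyone reading the last few log lines, here is a minimal Python sketch that decodes the CDB and the result word printed by st. It assumes the standard WRITE FILEMARKS(6) layout (opcode 0x10, IMMED in bit 0 of byte 1, filemark count in bytes 2-4) and the usual Linux SCSI result packing (status, message, host and driver bytes from low to high). The function names are made up for illustration and the host byte table is only a partial list.

```python
# Sketch only: helper names are hypothetical, not part of any kernel or Amanda tool.

# Partial table of Linux SCSI host byte codes (assumed from include/scsi/scsi.h).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x03: "DID_TIME_OUT",
    0x05: "DID_ABORT",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def decode_write_filemarks(cdb_hex):
    """Decode a 6-byte WRITE FILEMARKS CDB given as a hex string."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex() skips the spaces between bytes
    if len(cdb) != 6 or cdb[0] != 0x10:
        raise ValueError("not a WRITE FILEMARKS(6) CDB")
    return {
        "immed": bool(cdb[1] & 0x01),             # return before the write completes?
        "count": int.from_bytes(cdb[2:5], "big"),  # number of filemarks to write
    }


def decode_result(result):
    """Split a SCSI result word into its four bytes."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


if __name__ == "__main__":
    print(decode_write_filemarks("10 00 00 00 01 00"))
    fields = decode_result(0x80000)
    print(fields, HOST_BYTE_NAMES.get(fields["host"], "unknown"))
```

Under these assumptions the command is a single, non-immediate filemark write, and the host byte 0x8 maps to DID_RESET, which would be consistent with the task abort that mptscsih logged just before.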



Expected Results:  
tar/amanda should simply work.

# emerge --info
Portage 2.1.6.13 (default/linux/amd64/2008.0/server, gcc-4.1.2, glibc-2.8_p20080602-r1, 2.6.27-gentoo-r8 x86_64)
=================================================================
System uname: Linux-2.6.27-gentoo-r8-x86_64-Intel-R-_Core-TM-2_Duo_CPU_E7400_@_2.80GHz-with-glibc2.2.5
Timestamp of tree: Tue, 13 Oct 2009 17:30:01 +0000
app-shells/bash:     3.2_p39
dev-lang/python:     2.4.4-r13, 2.5.2-r7
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.63
sys-devel/automake:  1.9.6-r2, 1.10.2
sys-devel/binutils:  2.18-r3
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.27-r2
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-march=nocona -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks fixpackages parallel-fetch protect-owned sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LDFLAGS="-Wl,-O1"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="acl amd64 apache2 berkdb bzip2 cli cracklib crypt cups dri gdbm gpm iconv isdnlog mmx modules mudflap multilib mysql ncurses nls nptl nptlonly openmp pam pcre perl pppd python readline reflection session snmp spl sse sse2 ssl sysfs tcpd truetype unicode xml xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="fbdev glint intel mach64 mga neomagic nv r128 radeon savage sis tdfx trident vesa vga via vmware voodoo"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, LC_ALL, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Stefan G. Weichinger 2009-10-13 18:36:20 UTC
Compiled and booted kernel 2.6.30-gentoo-r5, same result.
Hardware failure?

--> 

dmesg gives:

st0: Block limits 1 - 16777215 bytes.
INFO: task amcheck:8292 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
amcheck       D ffff88002802f000     0  8292   8291
 ffff88011e39a8b0 0000000000000086 ffff88011eb27000 ffffffff8046cd03
 ffffffff806fa360 ffff88011e39ab20 000000001eab9000 00000000ffffb360
 ffff88011e0ed2e0 ffff88011eab9000 0000000000000000 ffffffff804d8a29
Call Trace:
 [<ffffffff8046cd03>] ? scsi_host_alloc_command+0x12/0x56
 [<ffffffff804d8a29>] ? mptscsih_qcmd+0x62c/0x6a4
 [<ffffffff805a319d>] ? schedule+0x9/0x1d
 [<ffffffff805a32a0>] ? schedule_timeout+0x23/0x158
 [<ffffffff803d5856>] ? elv_next_request+0x154/0x164
 [<ffffffff805a2997>] ? wait_for_common+0xb3/0x11f
 [<ffffffff8022899b>] ? default_wake_function+0x0/0x9
 [<ffffffff8048f3d9>] ? st_do_scsi+0x2bc/0x2ec
 [<ffffffff8048ff70>] ? st_int_ioctl+0x639/0x9b5
 [<ffffffff8049360f>] ? st_ioctl+0xacd/0xe61
 [<ffffffff80276eff>] ? alloc_page_vma+0xfb/0x14d
 [<ffffffff8026fd75>] ? page_add_new_anon_rmap+0x28/0x48
 [<ffffffff8028a32e>] ? vfs_ioctl+0x21/0x6b
 [<ffffffff8028a795>] ? do_vfs_ioctl+0x41d/0x477
 [<ffffffff8021ff60>] ? do_page_fault+0x1aa/0x1fe
 [<ffffffff8028a82b>] ? sys_ioctl+0x3c/0x5c
 [<ffffffff8020ad2b>] ? system_call_fastpath+0x16/0x1b
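
To pull the hung-task reports out of a longer dmesg capture, something like the following Python sketch could be used. The regular expression is based only on the two "INFO: task ..." lines quoted in this bug; other kernel versions may word the message differently, and the function name is hypothetical.

```python
import re

# Matches lines such as:
#   INFO: task tar:8334 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(log_text):
    """Return (command, pid, seconds) for every hung-task report in log_text."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "st0: Block limits 1 - 16777215 bytes.\n"
        "INFO: task amcheck:8292 blocked for more than 120 seconds.\n"
    )
    print(hung_tasks(sample))
```

Saved dmesg output can be passed in as a single string, which makes it easier to see whether tar, amcheck or both are the processes that block.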
Comment 2 Sebastian Luther (few) 2009-10-13 18:46:29 UTC
(In reply to comment #1)
> Hardware failure?

Try again with a kernel that is known to work? 

Which kernel versions do work, which don't? Have you searched for upstream bug reports?
Comment 3 Stefan G. Weichinger 2009-10-13 18:55:12 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > Hardware failure?
> 
> Try again with a kernel that is known to work? 

2.6.27 used to work reliably for months, as mentioned.
 
> Which kernel versions do work, which don't? Have you searched for upstream bug
> reports?

Only tested the mentioned 2 kernels, upstream bug reports didn't give a clear route to go yet. Hanging tasks ... yes ... but what to search for exactly ...
Comment 4 Stefan G. Weichinger 2009-10-16 06:26:34 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > Hardware failure?

The admin there noticed that the time/date was off by one hour ... he set it via ntpdate, and from then on everything worked without a problem. I can't explain it ... anyone?
Comment 5 Syed Amer Gilani 2009-10-21 10:32:01 UTC
I have something similar on the same SAS controller.
For some time now our backup server has hung completely when starting an amanda backup to tape. Backing up to hard disk is no problem.
This first started after upgrading to a 2.6.30 gentoo-sources kernel and to amanda 2.6.0_p2-r4. After that, amanda killed the server completely every time: no response over network or keyboard. On screen it shows a call trace with something about scsi. I can't post the exact output because even after rebooting with SysRq + REISUB there is no log of it, maybe because the hard disks are on the same controller as the tape device.

Downgrading back to 2.6.27-gentoo-r7 helped. It now only happens sometimes, not every time. 

Hardware:
LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS
Quantum Ultrium LTO-3
Comment 6 Mike Pagano gentoo-dev 2009-11-22 13:59:26 UTC
Has anyone tried the ntpdate workaround and NOT had success?

Comment 7 Syed Amer Gilani 2009-11-23 15:21:30 UTC
ntpdate was already being run every hour here. But I reread everything and I think I might have a different problem: here the whole kernel crashes, rather than just one process hanging. The only thing in common is that we have the same SAS controller.

And since I downgraded to 2.6.27-gentoo-r7 and recompiled the whole system with -O1, it has not crashed in weeks.
I will try a newer kernel again, but I have no time for that at the moment.