Summary: | gentoo-sources 2.6.27-r8 and app-backup/amanda or gnu-tar : processes accessing st-device hang | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Stefan G. Weichinger <lists> |
Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | RESOLVED NEEDINFO | ||
Severity: | critical | CC: | amg, dustin |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
URL: | http://archives.zmanda.com/amanda-archives/viewtopic.php?t=4843&sid=63099960699f1fcfd068adfaf526b54c | ||
Description
Stefan G. Weichinger
2009-10-13 17:52:34 UTC
Compiled and booted kernel 2.6.30-gentoo-r5, same result.

Hardware failure?

--> dmesg gives:

st0: Block limits 1 - 16777215 bytes.
INFO: task amcheck:8292 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
amcheck D ffff88002802f000 0 8292 8291
 ffff88011e39a8b0 0000000000000086 ffff88011eb27000 ffffffff8046cd03
 ffffffff806fa360 ffff88011e39ab20 000000001eab9000 00000000ffffb360
 ffff88011e0ed2e0 ffff88011eab9000 0000000000000000 ffffffff804d8a29
Call Trace:
 [<ffffffff8046cd03>] ? scsi_host_alloc_command+0x12/0x56
 [<ffffffff804d8a29>] ? mptscsih_qcmd+0x62c/0x6a4
 [<ffffffff805a319d>] ? schedule+0x9/0x1d
 [<ffffffff805a32a0>] ? schedule_timeout+0x23/0x158
 [<ffffffff803d5856>] ? elv_next_request+0x154/0x164
 [<ffffffff805a2997>] ? wait_for_common+0xb3/0x11f
 [<ffffffff8022899b>] ? default_wake_function+0x0/0x9
 [<ffffffff8048f3d9>] ? st_do_scsi+0x2bc/0x2ec
 [<ffffffff8048ff70>] ? st_int_ioctl+0x639/0x9b5
 [<ffffffff8049360f>] ? st_ioctl+0xacd/0xe61
 [<ffffffff80276eff>] ? alloc_page_vma+0xfb/0x14d
 [<ffffffff8026fd75>] ? page_add_new_anon_rmap+0x28/0x48
 [<ffffffff8028a32e>] ? vfs_ioctl+0x21/0x6b
 [<ffffffff8028a795>] ? do_vfs_ioctl+0x41d/0x477
 [<ffffffff8021ff60>] ? do_page_fault+0x1aa/0x1fe
 [<ffffffff8028a82b>] ? sys_ioctl+0x3c/0x5c
 [<ffffffff8020ad2b>] ? system_call_fastpath+0x16/0x1b

(In reply to comment #1)
> Hardware failure?

Try again with a kernel that is known to work?

Which kernel versions do work, which don't? Have you searched for upstream bug reports?

(In reply to comment #2)
> (In reply to comment #1)
> > Hardware failure?
>
> Try again with a kernel that is known to work?

2.6.27 used to work reliably for months, as mentioned.

> Which kernel versions do work, which don't? Have you searched for upstream bug
> reports?

Only tested the mentioned 2 kernels, upstream bug reports didn't give a clear route to go yet. Hanging tasks ... yes ... but what to search for exactly ...
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > Hardware failure?

The admin there noticed that the time/date was off by one hour ... set it via ntpdate and from then on everything worked without a problem.

I can't explain ... anyone?

I have something similar on the same SAS controller. For some time now our backup server has hung completely when starting an amanda backup to tape. Backing up to hard disk is no problem. This first started after upgrading to a 2.6.30 gentoo-sources kernel and to amanda 2.6.0_p2-r4. After that, amanda killed the server completely every time. No response over network or keyboard. On screen it shows a call trace with something about scsi. I can't post the exact output because even after rebooting with SysRq + REISUB there is no log of it. Maybe because the hard disks are on the same controller as the tape device. Downgrading back to 2.6.27-gentoo-r7 helped. It now only happens sometimes, not every time.

Hardware:
LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS
Quantum Ultrium LTO-3

Has anyone tried the ntpdate workaround and NOT had success?

ntpdate was already executed every hour here. But I reread everything and I think I might have a different problem. Here the whole kernel crashes, and not only a process hangs. The only common thing is that we have the same SAS controller. And since I downgraded to 2.6.27-gentoo-r7 and recompiled the whole system with -O1 it has not crashed in weeks. I will try a newer kernel again, but I have no time for that at the moment.
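
For anyone comparing their own logs against the dmesg output quoted in the description, the following is a minimal sketch (not something posted by the reporters) that pulls hung-task reports and their call-trace symbols out of saved kernel log text. The function name `parse_hung_tasks` and both regular expressions are assumptions made for illustration; they are modelled only on the message format shown above and may not match other kernel versions or task names that contain spaces.

```python
import re

# Hung-task warning line, modelled on the message quoted in this report:
#   INFO: task amcheck:8292 blocked for more than 120 seconds.
# Assumption: the task name contains no whitespace.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# One call-trace frame, e.g.:
#   [<ffffffff8048f3d9>] ? st_do_scsi+0x2bc/0x2ec
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(log_text):
    """Return one dict per hung-task report found in log_text.

    Each dict holds the task name, PID, timeout in seconds, and the
    list of symbols from the call trace that follows the warning.
    """
    reports = []
    matches = list(HUNG_TASK_RE.finditer(log_text))
    for i, m in enumerate(matches):
        # A report's trace runs until the next warning or the end of the log.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        segment = log_text[m.end():end]
        reports.append({
            "task": m.group("name"),
            "pid": int(m.group("pid")),
            "seconds": int(m.group("secs")),
            "symbols": [f.group("sym") for f in FRAME_RE.finditer(segment)],
        })
    return reports
```

Filtering each report's `symbols` for names starting with `st_` or `mptscsih_` gives a quick indication of whether a hang sits in the tape driver or the Fusion-MPT path, which may help when deciding what to search for upstream.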