Bug 257739 - kernel 2.6.27-gentoo-r1 - kjournald fails "unknown scsi failure" under VMware
Summary: kernel 2.6.27-gentoo-r1 - kjournald fails "unknown scsi failure" under VMware
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High major
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-02-05 07:32 UTC by Stefan de Konink
Modified: 2009-03-15 00:41 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Kernel config (kernel2627,34.59 KB, text/plain)
2009-02-06 09:23 UTC, Stefan de Konink

Description Stefan de Konink 2009-02-05 07:32:43 UTC
This machine does not use any VMware kernel modules, so this is Gentoo only. Frankly, I see no reason to post anything related to emerge --info, or to even try to reproduce this bug. I state clearly that this is *for reference only*, for someone who wants to fix the Linux kernel. I cannot post this bug on bugs.kernel.org because of the -gentoo-r1.

Reproducible: Didn't try

Steps to Reproduce:
1. unknown scsi failure
2. device goes into ro mode

Actual Results:  

mptscsih: ioc0: attempting task abort! (sc=ffff8800599df280)
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 b4 f3 30 00 00 10 00
mptscsih: ioc0: task abort: SUCCESS (sc=ffff8800599df280)
mptscsih: ioc0: attempting task abort! (sc=ffff880060537140)
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 01 ad ae 00 00 00 08 00
mptbase: ioc0: Initiating recovery
mptscsih: ioc0: Issue of TaskMgmt failed!
mptscsih: ioc0: task abort: FAILED (sc=ffff880060537140)
mptscsih: ioc0: attempting target reset! (sc=ffff880060537140)
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 01 ad ae 00 00 00 08 00
mptscsih: ioc0: target reset: SUCCESS (sc=ffff880060537140)
scsi target0:0:0: Beginning Domain Validation
scsi target0:0:0: Domain Validation skipping write tests
scsi target0:0:0: Ending Domain Validation
scsi target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
mptscsih: ioc0: attempting task abort! (sc=ffff88006203b000)
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 3c 8c 80 00 00 10 00
mptscsih: ioc0: Issue of TaskMgmt failed!
mptscsih: ioc0: task abort: FAILED (sc=ffff88006203b000)
mptscsih: ioc0: attempting task abort! (sc=ffff88006203b3c0)
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 88 8c 80 00 00 18 00
mptscsih: ioc0: task abort: FAILED (sc=ffff88006203b3c0)
mptscsih: ioc0: attempting target reset! (sc=ffff88006203b000)
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 3c 8c 80 00 00 10 00
mptscsih: ioc0: target reset: FAILED (sc=ffff88006203b000)
mptscsih: ioc0: attempting bus reset! (sc=ffff88006203b000)
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 3c 8c 80 00 00 10 00
mptscsih: ioc0: bus reset: FAILED (sc=ffff88006203b000)
mptscsih: ioc0: attempting host reset! (sc=ffff88006203b000)
mptbase: ioc0: Initiating recovery
mptscsih: ioc0: host reset: SUCCESS (sc=ffff88006203b000)
sd 0:0:0:0: Device offlined - not ready after error recovery
sd 0:0:0:0: Device offlined - not ready after error recovery
sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x06
end_request: I/O error, dev sda, sector 3968128
Buffer I/O error on device sda3, logical block 0
lost page write due to I/O error on sda3
Buffer I/O error on device sda3, logical block 1
lost page write due to I/O error on sda3
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
Buffer I/O error on device sda3, logical block 643072
lost page write due to I/O error on sda3
sd 0:0:0:0: rejecting I/O to offline device
Buffer I/O error on device sda4, logical block 2031618
lost page write due to I/O error on sda4
sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x06
end_request: I/O error, dev sda, sector 8948864
Buffer I/O error on device sda3, logical block 622592
lost page write due to I/O error on sda3
Buffer I/O error on device sda3, logical block 622593
lost page write due to I/O error on sda3
Buffer I/O error on device sda3, logical block 622594
lost page write due to I/O error on sda3
sd 0:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
end_request: I/O error, dev sda, sector 8952432
Buffer I/O error on device sda3, logical block 623038
lost page write due to I/O error on sda3
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
Buffer I/O error on device sda4, logical block 32547
lost page write due to I/O error on sda4
Aborting journal on device sda4.
sd 0:0:0:0: rejecting I/O to offline device
Buffer I/O error on device sda4, logical block 1441
lost page write due to I/O error on sda4
------------[ cut here ]------------
WARNING: at fs/buffer.c:1186 mark_buffer_dirty+0x78/0x90()
Modules linked in: usbcore
Pid: 855, comm: kjournald Not tainted 2.6.27-gentoo-r1 #2

Call Trace:
 [<ffffffff8102d954>] warn_on_slowpath+0x64/0xb0
 [<ffffffff810bc345>] submit_bh+0xf5/0x120
 [<ffffffff810bdb16>] sync_dirty_buffer+0x46/0xf0
 [<ffffffff810bce58>] mark_buffer_dirty+0x78/0x90
 [<ffffffff810fd7b9>] __journal_unfile_buffer+0x9/0x20
 [<ffffffff810ff871>] journal_commit_transaction+0x631/0xd20
 [<ffffffff810428a0>] autoremove_wake_function+0x0/0x30
 [<ffffffff81337aec>] thread_return+0x30/0x174
 [<ffffffff81102a1f>] kjournald+0xbf/0x1e0
 [<ffffffff810428a0>] autoremove_wake_function+0x0/0x30
 [<ffffffff81102960>] kjournald+0x0/0x1e0
 [<ffffffff81042247>] kthread+0x47/0x90
 [<ffffffff8102b2a8>] schedule_tail+0x18/0x60
 [<ffffffff8100d159>] child_rip+0xa/0x11
 [<ffffffff81042200>] kthread+0x0/0x90
 [<ffffffff8100d14f>] child_rip+0x0/0x11

---[ end trace 36584985e76b769a ]---
scsi target0:0:0: Beginning Domain Validation
scsi target0:0:0: Domain Validation skipp
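
For anyone reading the log above, here is a minimal sketch of how the CDB and Result lines can be decoded. It assumes opcode 0x2a is WRITE(10) with the standard layout (bytes 2-5 hold the big-endian LBA, bytes 7-8 the transfer length) and that the hostbyte/driverbyte values are the ones defined in include/scsi/scsi.h of kernels from that era. The function and table names are made up for illustration, and only a few codes are listed.

```python
# Sketch: decode the "CDB:" and "Result:" lines printed by the sd driver.
# Assumption: 0x2a is WRITE(10); bytes 2-5 = LBA, bytes 7-8 = block count.

def decode_write10(cdb_hex: str) -> dict:
    """Return the LBA and block count of a WRITE(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }

# Partial tables (assumed from include/scsi/scsi.h); unknown codes fall
# back to their hex value.
HOST_BYTE = {0x00: "DID_OK", 0x01: "DID_NO_CONNECT"}
DRIVER_BYTE = {0x00: "DRIVER_OK", 0x06: "DRIVER_TIMEOUT"}

def decode_result(hostbyte: int, driverbyte: int) -> tuple:
    """Map the numeric hostbyte/driverbyte pair to symbolic names."""
    return (
        HOST_BYTE.get(hostbyte, hex(hostbyte)),
        DRIVER_BYTE.get(driverbyte, hex(driverbyte)),
    )

if __name__ == "__main__":
    print(decode_write10("2a 00 00 3c 8c 80 00 00 10 00"))
    print(decode_result(0x00, 0x06))
```

Decoded this way, the CDB that escalated all the way to a host reset (2a 00 00 3c 8c 80 ...) is a 16-block write at LBA 3968128, which is the same sector named in the first end_request error, and the 0x00888c80 write corresponds to sector 8948864 in the second one. Under the assumed tables, driverbyte=0x06 would read as a command timeout rather than a media error.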
Comment 1 Jeroen Roovers (RETIRED) gentoo-dev 2009-02-05 16:23:43 UTC
Please post your `emerge --info' output too. Why do you seem to forget that almost every time?
Comment 2 Stefan de Konink 2009-02-05 16:36:13 UTC
Because it *does not matter*. There is no such thing as magically broken compilers or magically changing APIs; if there were, everyone would be complaining. Clearly this is a kernel bug, and emerge --info has nothing to do with kernel bugs.

The main reason the Gentoo emerge --info fetish is stupid is that anyone can change its outcome, and it does not represent the values over a period of time. For example, after you update your glibc, how would you know whether the rest of the system is built against a mix of the old and the new glibc? The output does not mention that 90% of the packages were compiled against the old version; the old version is not mentioned at all.

Read the error, and see what it reports. It is a *kernel* bug; it even borks on a specific controller. But to satisfy your need for environment variables, which at least allow others to find this bug again and do something useful with it:

Portage 2.1.6 (default/linux/amd64/2008.0, gcc-4.3.2, glibc-2.9_p20081201-r0, 2.6.27-gentoo-r1 x86_64)
=================================================================
System uname: Linux-2.6.27-gentoo-r1-x86_64-Intel-R-_Xeon-R-_CPU_L5320_@_1.86GHz-with-glibc2.2.5
Timestamp of tree: Mon, 05 Jan 2009 22:45:02 +0000
app-shells/bash:     3.2_p48
dev-lang/python:     2.4.4-r13, 2.5.2-r8
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 2.0.0
sys-apps/openrc:     0.3.0-r1
sys-apps/sandbox:    1.2.18.1-r3
sys-devel/autoconf:  2.63
sys-devel/automake:  1.10.2
sys-devel/binutils:  2.19
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   2.2.6a
virtual/os-headers:  2.6.27-r2
ACCEPT_KEYWORDS="amd64 ~amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-march=nocona -O2 -pipe -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks fixpackages parallel-fetch protect-owned sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://ftp.snt.utwente.nl/pub/os/linux/gentoo/distfiles ftp://ftp.snt.utwente.nl/pub/os/linux/gentoo  http://www.ibiblio.org/gentoo  http://distfiles.gentoo.org/distfiles http://distro.ibiblio.org/pub/linux/distributions/gentoo/distfiles"
LDFLAGS="-Wl,-O1"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_EXTRA_OPTS="--exclude-from=/etc/portage/rsync_excludes"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://192.168.0.1/gentoo-portage"
USE="acl amd64 berkdb big-tables bzip2 caps cgi cli cluster cracklib crypt cups curl dri extraengine fortran gd gdbm geoip gif iconv ipv6 isdnlog jpeg midi mmx multilib mysql ncurses nls nptl nptlonly openmp pam pcre pdo perl png pppd python readline reflection session simplexml spl sse sse2 ssl sysfs tcpd threads truetype unicode xml xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="fbdev glint i810 intel mach64 mga neomagic nv r128 radeon savage sis tdfx trident vesa vga via vmware voodoo"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, LC_ALL, LINGUAS, MAKEOPTS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTDIR_OVERLAY
Comment 3 Jeroen Roovers (RETIRED) gentoo-dev 2009-02-06 02:41:20 UTC
Stefan, if you were actually better at reporting bugs, then I wouldn't be asking for general information.

What kind of filesystem is kjournald faulting over? /proc/mounts would be useful in this case, and a more complete dmesg output, and maybe the kernel config.
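
A small sketch of how the requested /proc/mounts information could be narrowed down to the relevant entries. It assumes the usual whitespace-separated layout (device, mount point, filesystem type, options, ...); the function name is hypothetical. It lists filesystems of a given type and flags the ones currently mounted read-only, which is the state the report says the device ends up in.

```python
# Sketch: pick out filesystems of one type from /proc/mounts-style text
# and report whether each is currently mounted read-only.
# Assumption: fields are device, mount point, fstype, options, dump, pass.

def journaled_mounts(mounts_text: str, fstype: str = "ext3") -> list:
    """Return (device, mountpoint, is_read_only) for each matching mount."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != fstype:
            continue  # skip malformed lines and other filesystem types
        device, mountpoint, _, options = fields[:4]
        result.append((device, mountpoint, "ro" in options.split(",")))
    return result
```

On a live system this would be fed the contents of /proc/mounts, for example `journaled_mounts(open("/proc/mounts").read())`.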
Comment 4 Stefan de Konink 2009-02-06 09:23:53 UTC
Created attachment 181125
Kernel config

(In reply to comment #3)
> Stefan, if you were actually better at reporting bugs, then I wouldn't be
> asking for general information.

"Don't look a gift horse in the mouth."

> What kind of filesystem is kjournald faulting over? /proc/mounts would be
> useful in this case, and a more complete dmesg output, and maybe the kernel
> config.

ext3. Do you think I would leave a production server in a 'debugging' state? To me this is an incident; it is enough that others can find it by Googling.
Comment 5 Mike Pagano gentoo-dev 2009-02-17 14:49:17 UTC
Is this reproducible?

Nick Piggin reports working on a patch to clean up and improve the page and buffer error handling in the vm/fs in response to a posting on lkml concerning this error.

http://lkml.org/lkml/2008/10/28/414

I haven't yet been able to locate the specific patch(es), but if it's reproducible, could you test on later kernels?


Comment 6 Stefan de Konink 2009-02-17 18:17:53 UTC
(In reply to comment #5)
> Is this reproducible?

Not tried.
Comment 7 Daniel Drake (RETIRED) gentoo-dev 2009-03-15 00:15:25 UTC
OK, thanks for the report. Always good to know the problems that are encountered. Please reopen if you'd be willing to attempt to reproduce and test other kernels etc.
Comment 8 Stefan de Konink 2009-03-15 00:41:30 UTC
(In reply to comment #7)
> OK, thanks for the report. Always good to know the problems that are
> encountered. Please reopen if you'd be willing to attempt to reproduce and test
> other kernels etc.

The task aborts are now limited to:
mptscsih: ioc0: attempting task abort! (sc=ffff88007e184500)
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 b5 b7 b0 00 00 10 00
mptscsih: ioc0: task abort: SUCCESS (sc=ffff88007e184500)
mptscsih: ioc0: attempting task abort! (sc=ffff88007e184500)
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 b5 b7 c8 00 00 10 00
mptscsih: ioc0: task abort: SUCCESS (sc=ffff88007e184500)
mptscsih: ioc0: attempting task abort! (sc=ffff88007e197280)
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 01 9b ee 10 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=ffff88007e197280)
mptscsih: ioc0: attempting task abort! (sc=ffff88004c94a640)
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 01 9b ee 10 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=ffff88004c94a640)
mptscsih: ioc0: attempting task abort! (sc=ffff88007e9fdb40)
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 b5 6d 98 00 00 10 00
mptscsih: ioc0: task abort: SUCCESS (sc=ffff88007e9fdb40)
mptscsih: ioc0: attempting task abort! (sc=ffff880026842a00)
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 01 9b ee 10 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=ffff880026842a00)
mptscsih: ioc0: attempting task abort! (sc=ffff88007edb9000)
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 01 9b ee 10 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=ffff88007edb9000)
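
To compare excerpts like the one above with the original report, here is a rough sketch that tallies how each mptscsih recovery step ended. It only assumes the message format visible in this bug ("mptscsih: iocN: <step>: SUCCESS|FAILED"); the pattern and function names are illustrative.

```python
# Sketch: count the outcomes of mptscsih error-recovery steps in a log.
# Assumption: result lines look like
#   "mptscsih: ioc0: task abort: SUCCESS (sc=...)"
import re
from collections import Counter

EH_RESULT = re.compile(
    r"mptscsih: \w+: (task abort|target reset|bus reset|host reset): "
    r"(SUCCESS|FAILED)"
)

def tally_recovery(log_text: str) -> Counter:
    """Return a Counter keyed by (step, outcome) for every result line."""
    return Counter(m.groups() for m in EH_RESULT.finditer(log_text))
```

Every result line in the excerpt in this comment is a successful task abort, whereas the log in the description escalates through failed task aborts, target and bus resets up to a host reset before the device is offlined.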