Bug 224857 - 2.6.24-gentoo-r8: HFS+ issue: kernel BUG at fs/hfsplus/bnode.c:623!
Summary: 2.6.24-gentoo-r8: HFS+ issue: kernel BUG at fs/hfsplus/bnode.c:623!
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High critical
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
Reported: 2008-06-04 09:08 UTC by Walther
Modified: 2008-06-10 07:24 UTC
Description Walther 2008-06-04 09:08:02 UTC
This is a MacPro5 machine (x86_64) in which the /home partition is formatted as HFS+ so it is shared between the Mac and Linux OSes.

Part of our work creates and deletes a large number of files. Sometimes, when deleting a large directory (~7000 files), the filesystem driver "crashes" and all further access to the partition hangs the interface (the rest of the kernel keeps working, but any thread that tries to access the filesystem hangs forever).
Since / is ext3, the message log is still written and shows a stack dump for this:

Jun  3 20:12:36 [kernel] kernel BUG at fs/hfsplus/bnode.c:623!
Jun  3 20:12:36 [kernel] CPU 5
Jun  3 20:12:36 [kernel] Modules linked in: af_packet rng_core
Jun  3 20:12:36 [kernel] Pid: 6978, comm: rm Not tainted 2.6.24-gentoo-r8 #1
Jun  3 20:12:36 [kernel] RIP: 0010:[<ffffffff802c26a4>]  [<ffffffff802c26a4>] hfsplus_bnode_put+0x15/0x82
Jun  3 20:12:36 [kernel] RSP: 0018:ffff81028363fc88  EFLAGS: 00010246
Jun  3 20:12:36 [kernel] RAX: 0000000000000000 RBX: ffff8102d44f8360 RCX: 0000000000000001
Jun  3 20:12:36 [kernel] RDX: 01e4ffff802c2ab7 RSI: 0000000000000001 RDI: ffff8102d44f8360
Jun  3 20:12:36 [kernel] RBP: ffff8102fe835000 R08: 0000000000000004 R09: 78028102fe940290
Jun  3 20:12:36 [kernel] R10: ffff81028363fbce R11: ffff8102fe940298 R12: 0000000000001fe4
Jun  3 20:12:36 [kernel] R13: ffff81028363fdb8 R14: 00000000000001d0 R15: 0000000000001fe4
Jun  3 20:12:36 [kernel] FS:  00002ad26b4eb6f0(0000) GS:ffff8102ffc0da00(0000) knlGS:0000000000000000
Jun  3 20:12:36 [kernel] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun  3 20:12:36 [kernel] CR2: 00002ad26b222080 CR3: 00000002d45e0000 CR4: 00000000000006a0
Jun  3 20:12:36 [kernel] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun  3 20:12:36 [kernel] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun  3 20:12:36 [kernel] Process rm (pid: 6978, threadinfo ffff81028363e000, task ffff8102ff42ce40)
Jun  3 20:12:36 [kernel] Stack:  000000000000006c ffff8102fe940240 0000000000001fe4 ffffffff802c41bd
Jun  3 20:12:36 [kernel]  0000000000000000 0010000000000700 ffff8102fe835000 ffff810283ce2000
Jun  3 20:12:36 [kernel]  ffff8102d44f8360 0000003a802c2d74 0000000900000010 ffff0000060001ff
Jun  3 20:12:36 [kernel] Call Trace:
Jun  3 20:12:36 [kernel]  [<ffffffff802c41bd>] hfs_brec_update_parent+0x278/0x2c4
Jun  3 20:12:36 [kernel]  [<ffffffff802c432f>] hfsplus_brec_remove+0x126/0x134
Jun  3 20:12:36 [kernel]  [<ffffffff802c0e4e>] hfsplus_delete_cat+0x181/0x20b
Jun  3 20:12:36 [kernel]  [<ffffffff802c12f0>] hfsplus_unlink+0xb1/0x15e
Jun  3 20:12:36 [kernel]  [<ffffffff8026e9c4>] permission+0xd6/0xec
Jun  3 20:12:36 [kernel]  [<ffffffff8026f618>] vfs_unlink+0x55/0x9c
Jun  3 20:12:36 [kernel]  [<ffffffff802714ef>] do_unlinkat+0xaa/0x14b
Jun  3 20:12:36 [kernel]  [<ffffffff80273598>] sys_getdents+0xaf/0xbf
Jun  3 20:12:36 [kernel]  [<ffffffff80469509>] error_exit+0x0/0x51
Jun  3 20:12:36 [kernel]  [<ffffffff8020b54e>] system_call+0x7e/0x83
Jun  3 20:12:36 [kernel] Code: 0f 0b eb fe 48 8d 75 6c 48 8d 7f 48 45 31 e4 e8 d4 b1 01 00
Jun  3 20:12:36 [kernel] RIP  [<ffffffff802c26a4>] hfsplus_bnode_put+0x15/0x82
Jun  3 20:12:36 [kernel]  RSP <ffff81028363fc88>
Jun  3 20:12:36 [kernel] ---[ end trace 0fd72d8fe3b12ca5 ]---

I am aware this belongs upstream, but I couldn't find any guidelines on how to properly report a bug on the LKML, or on whether the LKML is even the right mailing list for this particular bug.

Note that the bug does not happen every time, but we run into it about once a day during normal work.

Reproducible: Always

Steps to Reproduce:
1. Generate a large number of files and directories.
2. Delete some of these directories.

Actual Results:  
Indefinite hang + kernel bug backtrace.

Expected Results:  
Normal filesystem operation.

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3              46G  6,1G   38G  14% /
udev                   10M   68K   10M   1% /dev
/dev/sda4             373G  1,4G  372G   1% /home
shm                   4,9G     0  4,9G   0% /dev/shm

# emerge --info
Portage 2.1.4.4 (default-linux/amd64/2007.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.24-gentoo-r8 x86_64)
=================================================================
System uname: 2.6.24-gentoo-r8 x86_64 Intel(R) Xeon(R) CPU X5472 @ 3.00GHz
Timestamp of tree: Tue, 20 May 2008 04:45:03 +0000
app-shells/bash:     3.2_p33
dev-java/java-config: 1.3.7, 2.1.6
dev-lang/python:     2.4.4-r9
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-march=nocona -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks metadata-transfer parallel-fetch sandbox sfperms strict unmerge-orphans userfetch userpriv"
GENTOO_MIRRORS="ftp://mirror.ovh.net/gentoo-distfiles/ ftp://gentoo.imj.fr/pub/gentoo/ ftp://ftp.free.fr/mirrors/ftp.gentoo.org/ "
LANG="fr_FR.UTF-8"
LC_ALL="fr_FR@euro"
LINGUAS="fr en_GB en"
MAKEOPTS="-j8"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/etc/portage/overlay"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X aac acl alsa amd64 avahi berkdb bzip2 cairo cli cracklib crypt cups dbus dri dts dvd dvi emacs encode fortran ftp gdbm gif gpm gs gtk hal hddtemp iconv ipv6 isdnlog java jpeg jpeg2k libnotify lm_sensors lua midi mmx mp3 mudflap ncurses nls nptl nptlonly ocaml opengl openmp pam pcre pdf perl php png pppd python rar readline reflection rtc ruby session socks5 spl sse sse2 ssl startup-notification svg tcpd tetex theora threads tiff truetype unicode vim vim-syntax vorbis xml xorg xscreensaver xvid zip zlib"
ALSA_CARDS="hda-intel"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="evdev keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="fr en_GB en" USERLAND="GNU" VIDEO_CARDS="vesa radeon r128"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Duane Griffin 2008-06-04 13:57:41 UTC
Looking through the log there is one post-2.6.24 commit that sounds like it could be relevant: 76b0c26af2736b7e5b87e6ed7ab63901483d5736 (HFS+: fix unlink of links). Would it be possible for you to test the latest vanilla kernel (2.6.25.4 as of writing) to see if that fixes it?

If the problem persists then it would be best to open a ticket at bugzilla.kernel.org and/or report it at lkml. We encourage people to open kernel bugzilla tickets to make it easier for us to track, but reporting it to lkml as well would probably increase the chances of it being dealt with promptly.

On the kernel bugzilla please put the information you provided here, minus the emerge --info, and adding a link back to this bug. Once you have created the ticket please add a comment to this bug with a link to it. It would be best if you could reproduce on the latest vanilla kernel first and give a stackdump from that.

To report it on lkml just send an email to the list with the same information as above, CC'ing Roman Zippel, the HFS+ maintainer (you can find his email in the MAINTAINERS file). Include a link to the kernel bugzilla ticket in the email. Email to lkml should be plain text with no attachments. Just include the stackdump and any other information inline.
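
As a rough illustration of the email described above, here is a minimal Python sketch that assembles a plain-text, attachment-free message with the stack dump inline. The sender and maintainer addresses and the ticket link are placeholders, and `build_report` is a made-up helper name; the real maintainer address has to be looked up in the MAINTAINERS file.

```python
from email.message import EmailMessage

LKML = "linux-kernel@vger.kernel.org"


def build_report(sender, maintainer, subject, summary, ticket_url, stackdump):
    """Build a single-part text/plain bug report with everything inline."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = LKML
    msg["Cc"] = maintainer
    msg["Subject"] = subject
    body = "\n\n".join([
        summary,
        "Kernel bugzilla ticket: " + ticket_url,
        stackdump,
    ])
    # Passing a str yields one text/plain part: no HTML, no attachments.
    msg.set_content(body)
    return msg


report = build_report(
    sender="reporter@example.org",            # placeholder
    maintainer="maintainer@example.org",      # placeholder, see MAINTAINERS
    subject="kernel BUG at fs/hfsplus/bnode.c:623!",
    summary="Deleting a large directory (~7000 files) on HFS+ hangs all "
            "further access to the partition.",
    ticket_url="(link to your bugzilla.kernel.org ticket)",  # placeholder
    stackdump="kernel BUG at fs/hfsplus/bnode.c:623!\n"
              "(rest of the stack dump pasted here)",
)
print(report.get_content())
```

Actually sending the message (for example through your normal mail client) is left out on purpose; the point is only that the whole report lives in one plain-text body.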
Comment 2 Walther 2008-06-09 07:44:07 UTC
I've been running this kernel for almost a week now, and there haven't been any hangs yet.

> uname -a
Linux bossa 2.6.25.4 #2 SMP Thu Jun 5 11:48:19 CEST 2008 x86_64 Intel(R) Xeon(R) CPU X5472 @ 3.00GHz GenuineIntel GNU/Linux

I think it is safe to assume then, that the problem is caused by the Gentoo patches, perhaps the related patch mentioned in the previous comment.

What next? It is apparent this can't be reported to the kernel bugtracker as this is not a bug in the vanilla kernel.

PS: Thanks for the information on how to submit kernel bugs, I will save this reference for the future.
Comment 3 Duane Griffin 2008-06-09 12:12:10 UTC
Fantastic! Since it seems 2.6.25 works I'll mark this as fixed. If it recurs then please re-open the bug.

I think you misunderstood slightly where the bug lies. It sounds like the bug is in vanilla 2.6.24 and not in vanilla 2.6.25; nothing to do with Gentoo one way or the other. However, I was wrong about the specific patch. Now that I look closer I see that it has already been included in Gentoo's 2.6.24 patches. It must be something else in 2.6.25 that fixes your problem.

If we could identify the specific patch(es) that fixed it we could probably include them in Gentoo's 2.6.24 patch set. Alternatively you could just carry on using vanilla or ~amd64 gentoo-sources-2.6.25-* until we stabilise 2.6.25.

If you wanted to locate the fix then I think you'll have to start bisecting with git. This can be a long process, especially if the bug takes some time to reproduce. Unfortunately I can't see any other specific patches that sound promising in the ChangeLog for the HFS+ code, although it might be an idea to start off by bisecting within those commits, just in case.

You can read more about using git to bisect here:
http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/

And in section 4.5 here:
http://www.stardust.webpages.pl/files/handbook/handbook-en-0.3-rc1.pdf

Note that in this case you'd be doing things in reverse -- instead of finding the patch that introduced the bug you'd be looking for the one that fixed it. Let me know if you'd like more info or assistance with this.
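
To make the "reverse" part concrete: when searching for a fix, the old end of the range is the broken kernel and the new end is the working one. With plain good/bad labels that means marking a kernel that still crashes as "good" and one that no longer crashes as "bad" (later Git releases added `--term-old` and `--term-new` options to `git bisect start` so the labels can be renamed). The search itself is an ordinary binary search. Below is a small Python sketch of that logic only, not of git itself: it assumes a linear, oldest-to-newest list of commits, whereas real kernel history is a graph that git bisect handles for you. The commit names and the `still_crashes` callback are invented for the illustration.

```python
def find_first_fixed(commits, still_crashes):
    """Return (first commit where the bug is gone, number of kernels tested).

    Assumes commits[0] is known broken, commits[-1] is known fixed, and
    that the bug stays fixed once it has been fixed.
    """
    lo, hi = 0, len(commits) - 1  # lo: known broken, hi: known fixed
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1
        if still_crashes(commits[mid]):
            lo = mid   # bug still present: the fix is later
        else:
            hi = mid   # bug gone: the fix is here or earlier
    return commits[hi], steps


# Made-up history of 1000 commits in which commit 617 contains the fix.
commits = ["c%03d" % i for i in range(1000)]
first_fixed, steps = find_first_fixed(commits, lambda c: int(c[1:]) < 617)
print(first_fixed, steps)
```

In practice each call to `still_crashes` stands for building and booting a kernel and then running the workload long enough to be reasonably sure the hang is gone, which is why even a handful of steps can take a long time.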
Comment 4 Walther 2008-06-10 07:24:16 UTC
Oh, very well, thanks for the clarification. I'll stick to vanilla 2.6.25.4 until the Gentoo 2.6.25 kernel stabilizes. I already have enough work as it is, reproducing this bug costs us downtime, and it sometimes takes over a day to trigger. The machine is not exposed to the net, so it is not in a high-risk environment.