Bug 179563 - reiserfs kernel panic, kernel 2.6.20-gentoo-r7
Summary: reiserfs kernel panic, kernel 2.6.20-gentoo-r7
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: Other Linux
Importance: High major
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-05-23 18:31 UTC by Lindsay Haisley
Modified: 2007-05-25 02:32 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Log extract for reiserfs kernel panic. (reiserfs_panic.txt, 7.64 KB, text/plain)
2007-05-23 18:33 UTC, Lindsay Haisley

Description Lindsay Haisley 2007-05-23 18:31:45 UTC
About a week ago one of two servers that are the foundation of my business crashed on a kernel panic with substantial reiserfs filesystem corruption.  This system had run flawlessly without a reboot for about 9 months using kernel 2.6.17-gentoo-r7.  This crash occurred a couple of weeks after upgrading to kernel 2.6.20-gentoo-r7.

The box is a dual-Xeon system with a 3ware RAID controller 8506-4LP running in RAID-5 mode.  The controller's log shows no errors.  The kernel was compiled with gcc x86_64-pc-linux-gnu-4.1.1 (Gentoo 4.1.1-r3).

I find a couple of other references to a similar error at http://lkml.org/lkml/2006/10/17/360, http://lkml.org/lkml/2006/11/3/82 and http://crypto.riken.go.jp/pub/linux/kernel.org/kernel/v2.6/snapshots/patch-2.6.19-git13.log
including several references to a "reiserfs bad path release panic".

My understanding is that development of reiserfs kernel support and tools will be frozen at current levels going forward, and that the ext3 journaling filesystem will be recommended.  Is this reiserfs bug one that's known to Gentoo kernel devs which may be addressed in the future?  Would I be well advised to transition the affected systems to ext3 filesystems?  I've backed off the kernel upgrade of 3 weeks ago.  Stability is of paramount importance on these machines.

I'm posting the log of the crash separately.
Comment 1 Lindsay Haisley 2007-05-23 18:33:42 UTC
Created attachment 120114
Log extract for reiserfs kernel panic.
Comment 2 Nick Loeve 2007-05-23 20:30:21 UTC
Are you able to try to reproduce this error with gentoo-sources-2.6.21-r1, which is currently on the testing branch?
Comment 3 Lindsay Haisley 2007-05-24 02:21:38 UTC
This is a production machine hosting web and email resources for paying customers.  The crash happened after a couple of weeks of flawless operation, apparently during a pop3 fetch from a customer with the system under moderate load.  The resulting filesystem corruption was severe enough to require me to use --rebuild-tree with fsck.reiserfs and a small number of customer emails were lost - fortunately, and apparently, only some spam.  On top of that the box is co-located about 30 miles away and recovery involves rousting my (luckily very good-natured) hosting provider out of bed or away from his family if it happens at night to come and give me access to the box.

In short, I'm really not comfortable trying to reproduce the crash under any circumstances, although if you tell me that the crash log appears to be the result of a known bug that's been explicitly addressed between 2.6.20-gentoo-r7 and 2.6.21-gentoo-r1 then I'll be happy to build and install this kernel and see what happens.
Comment 4 Nick Loeve 2007-05-24 20:22:42 UTC
Hi,

There was a patch in 2.6.21 with the following commit message:

    [PATCH] reiserfs: fix key decrementing
    
    This patch fixes a bug in function decrementing a key of stat data item.
    
    Offset of reiserfs keys are compared as signed values.  To set key offset
    to maximal possible value maximal signed value has to be used.
    
    This bug is responsible for severe reiserfs filesystem corruption which
    shows itself as warning vs-13060.  reiserfsck fixes this corruption by
    filesystem tree rebuilding.


This could well be the fix for the panic you experienced, based on the top most errors in the log you have attached.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6d205f120547043de663315698dcf5f0eaa31b5c
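
The signed-comparison point in the commit message quoted above can be illustrated with a small sketch. This is not the kernel code: the 64-bit width and the names as_signed, compare_offsets and ordinary_offset are assumptions chosen only for illustration. It shows why an all-ones bit pattern, which is the maximal unsigned value, sorts below ordinary offsets once offsets are compared as signed values, while the maximal signed value sorts above them.

```python
# Simplified model of key offsets that are compared as signed values.
# Illustration only; this is NOT the reiserfs implementation.

BITS = 64  # assumed offset width for this sketch


def as_signed(value: int, bits: int = BITS) -> int:
    """Reinterpret an unsigned bit pattern as a two's-complement signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def compare_offsets(a: int, b: int) -> int:
    """Return -1, 0 or 1 after comparing both bit patterns as signed values."""
    sa, sb = as_signed(a), as_signed(b)
    return (sa > sb) - (sa < sb)


ALL_ONES = (1 << BITS) - 1          # maximal unsigned value; reads as -1 when signed
MAX_SIGNED = (1 << (BITS - 1)) - 1  # maximal signed value

ordinary_offset = 4096  # hypothetical offset of some other key

# The all-ones pattern compares as smaller than an ordinary offset.
print(compare_offsets(ALL_ONES, ordinary_offset))    # -1
# The maximal signed value compares as larger, which is what "maximal" needs.
print(compare_offsets(MAX_SIGNED, ordinary_offset))  # 1
```

Under these assumptions, a key whose offset was meant to be "as large as possible" but was stored as all ones would be ordered before other keys instead of after them, which matches the kind of ordering mistake the commit message describes.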
Comment 5 Lindsay Haisley 2007-05-24 21:53:33 UTC
OK, this looks like it probably fixes, or at least addresses, the problem.  I'll build this kernel and install it, and let you know how things go.
Comment 6 Daniel Drake (RETIRED) gentoo-dev 2007-05-24 23:28:11 UTC
(In reply to comment #0)
> I find a couple of other references to a similar error at
> http://lkml.org/lkml/2006/10/17/360, http://lkml.org/lkml/2006/11/3/82 and
> http://crypto.riken.go.jp/pub/linux/kernel.org/kernel/v2.6/snapshots/patch-2.6.19-git13.log
> including several references to a "reiserfs bad path release panic".

They appear to be unrelated to your crash.

> My understanding is that development of reiserfs kernel support and tools will
> be frozen at current levels going forward, and that the ext3 journaling
> filesystem will be recommended.  Is this reiserfs bug one that's known to
> Gentoo kernel devs which may be addressed in the future?  Would I be well
> advised to transition the affected systems to ext3 filesystems?  I've backed
> off the kernel upgrade of 3 weeks ago.  Stability is of paramount importance on
> these machines.

reiserfs was dead from pretty much the point of kernel inclusion - the developers argued heavily for inclusion in the kernel before it had been heavily tested and fully reviewed, and then after inclusion when people did point out bugs and other issues, nobody did anything. So yes, things are relatively frozen and always have been.

I've never used reiserfs for any extended period of time and I don't know anything about its internals, so I can't accurately comment on its reliability. However, I have seen a number of issues like this reported to the Gentoo bugzilla over the last 2 years, and as you've seen, even bugs that get reported upstream seem to be mostly ignored. In comparison, we rarely see issues like this reported for ext2/ext3. I can only think of one, in fact, which was a bug when the partition was larger than 3TB or something like that. So at least from a watching-the-Gentoo-bugzilla standpoint, I'd certainly recommend ext3.

Nick: nice find, that seems very likely to be a fix for this issue. I've added this patch to the 2.6.20 tree, but I'm not sure if there will be further 2.6.20 releases. 2.6.21 is already fixed as you have noted.

Given that this bug would be hard to reproduce even on a buggy kernel, and that the fix is already available in 2.6.21, I'm going to close this bug. Feel free to reopen it if any further information becomes available.
Comment 7 Lindsay Haisley 2007-05-25 00:38:19 UTC
(In reply to comment #6)
> (In reply to comment #0)
> I've never used reiserfs for any extended period of time and I don't know
> anything about its internals, so I can't accurately comment on its
> reliability. However I have seen a number of issues like this reported to the
> Gentoo bugzilla over the last 2 years, and as you've seen, even bugs that get
> reported upstream seem to be mostly ignored. In comparison, we rarely see
> issues like this reported for ext2/ext3, I can only think of one in fact which
> was a bug when the partition was larger than 3TB or something like that. So at
> least from a watching-the-Gentoo-bugzilla standpoint, I'd certainly recommend
> ext3.

It should be noted that the Gentoo installation handbook recommends reiserfs rather highly.  It calls reiserfs "solid" and says it has "very good overall performance and greatly outperforms both ext2 and ext3 when dealing with small files."  See http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=4#doc_chap4.  The installation doc folks might want to take note of your observations, which paint a rather different picture.

Comment 8 Daniel Drake (RETIRED) gentoo-dev 2007-05-25 02:32:51 UTC
As I said above, that's only my personal view and I don't have any facts or true technical knowledge about filesystems to back it up with. I may consider mentioning it to them anyway though.