Bug#: 179563
Status: RESOLVED
Resolution: FIXED
Assigned To: Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel@gentoo.org>
Reporter: Lindsay Haisley <fmouse-gentoo@fmp.com>

Filename            Description                              Type        Creator          Created                Size
reiserfs_panic.txt  Log extract for reiserfs kernel panic.   text/plain  Lindsay Haisley  2007-05-23 18:33 0000  7.64 KB

Description:   Opened: 2007-05-23 18:31 0000
About a week ago one of two servers that are the foundation of my business
crashed with a kernel panic and substantial reiserfs filesystem corruption. 
This system had run flawlessly without a reboot for about 9 months using kernel
2.6.17-gentoo-r7.  This crash occurred a couple of weeks after upgrading to
kernel 2.6.20-gentoo-r7.

The box is a dual-Xeon system with a 3ware RAID controller 8506-4LP running in
RAID-5 mode.  The controller's log shows no errors.  The kernel was compiled
with gcc x86_64-pc-linux-gnu-4.1.1 (Gentoo 4.1.1-r3).

I found a couple of other references to a similar error at
http://lkml.org/lkml/2006/10/17/360, http://lkml.org/lkml/2006/11/3/82 and
http://crypto.riken.go.jp/pub/linux/kernel.org/kernel/v2.6/snapshots/patch-2.6.19-git13.log
including several references to a "reiserfs bad path release panic".

My understanding is that development of reiserfs kernel support and tools will
be frozen at current levels going forward, and that the ext3 journaling
filesystem will be recommended.  Is this reiserfs bug one that's known to
Gentoo kernel devs which may be addressed in the future?  Would I be well
advised to transition the affected systems to ext3 filesystems?  I've backed
off the kernel upgrade of 3 weeks ago.  Stability is of paramount importance on
these machines.

I'm posting the log of the crash separately.

------- Comment #1 From Lindsay Haisley 2007-05-23 18:33:42 0000 -------
Created an attachment (id=120114) [details]
Log extract for reiserfs kernel panic.

------- Comment #2 From Nick Loeve 2007-05-23 20:30:21 0000 -------
Are you able to try and reproduce this error with gentoo-sources-2.6.21-r1
which is on the testing branch currently? 

------- Comment #3 From Lindsay Haisley 2007-05-24 02:21:38 0000 -------
This is a production machine hosting web and email resources for paying
customers.  The crash happened after a couple of weeks of flawless operation,
apparently during a pop3 fetch from a customer with the system under moderate
load.  The resulting filesystem corruption was severe enough to require me to
use --rebuild-tree with fsck.reiserfs and a small number of customer emails
were lost - fortunately, and apparently, only some spam.  On top of that the
box is co-located about 30 miles away and recovery involves rousting my
(luckily very good-natured) hosting provider out of bed or away from his family
if it happens at night to come and give me access to the box.

In short, I'm really not comfortable trying to reproduce the crash under any
circumstances, although if you tell me that the crash log appears to be the
result of a known bug that's been explicitly addressed between 2.6.20-gentoo-r7
and 2.6.21-gentoo-r1 then I'll be happy to build and install this kernel and
see what happens.

------- Comment #4 From Nick Loeve 2007-05-24 20:22:42 0000 -------
Hi,

There was a patch in 2.6.21 with the following commit message:

    [PATCH] reiserfs: fix key decrementing

    This patch fixes a bug in function decrementing a key of stat data item.

    Offset of reiserfs keys are compared as signed values.  To set key offset
    to maximal possible value maximal signed value has to be used.

    This bug is responsible for severe reiserfs filesystem corruption which
    shows itself as warning vs-13060.  reiserfsck fixes this corruption by
    filesystem tree rebuilding.


This could well be the fix for the panic you experienced, based on the topmost
errors in the log you have attached.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6d205f120547043de663315698dcf5f0eaa31b5c
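To illustrate the signed-comparison point in the commit message above: if a key offset is held in 64 bits but compared as a signed number, a "maximum" built by setting every bit reads as -1 and sorts below ordinary offsets, whereas the maximal signed value sorts above them. The Python sketch below models only that arithmetic. It assumes a plain 64-bit offset, and the helper names (as_signed64, compare_offsets) are hypothetical; this is not the reiserfs key layout or the kernel code touched by the patch.

```python
import struct

U64_MAX = 2**64 - 1   # all 64 bits set
S64_MAX = 2**63 - 1   # largest signed 64-bit value


def as_signed64(offset: int) -> int:
    """Reinterpret an unsigned 64-bit offset as a signed 64-bit value (hypothetical helper)."""
    return struct.unpack("<q", struct.pack("<Q", offset))[0]


def compare_offsets(a: int, b: int) -> int:
    """Compare two offsets as signed values, returning -1, 0 or 1 (hypothetical helper)."""
    sa, sb = as_signed64(a), as_signed64(b)
    return (sa > sb) - (sa < sb)


# An "all ones" offset reads as -1, so it sorts below an ordinary offset.
print(compare_offsets(U64_MAX, 4096))
# The maximal signed value sorts above it, which is what a "maximal key" needs.
print(compare_offsets(S64_MAX, 4096))
```

The first comparison yields -1 and the second 1: once comparisons are signed, only the maximal signed value behaves as "largest possible offset".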

------- Comment #5 From Lindsay Haisley 2007-05-24 21:53:33 0000 -------
OK, this looks like it probably fixes, or at least addresses, the problem.  I'll
build this kernel and install it, and let you know how things go.

------- Comment #6 From Daniel Drake 2007-05-24 23:28:11 0000 -------
(In reply to comment #0)
> I find a couple of other references to a similar error at
> http://lkml.org/lkml/2006/10/17/360, http://lkml.org/lkml/2006/11/3/82 and
> http://crypto.riken.go.jp/pub/linux/kernel.org/kernel/v2.6/snapshots/patch-2.6.19-git13.log
> including several references to a "reiserfs bad path release panic".

They appear to be unrelated to your crash.

> My understanding is that development of reiserfs kernel support and tools will
> be frozen at current levels going forward, and that the ext3 journaling
> filesystem will be recommended.  Is this reiserfs bug one that's known to
> Gentoo kernel devs which may be addressed in the future?  Would I be well
> advised to transition the affected systems to ext3 filesystems?  I've backed
> off the kernel upgrade of 3 weeks ago.  Stability is of paramount importance on
> these machines.

reiserfs was dead from pretty much the point of kernel inclusion - the
developers argued heavily for inclusion in the kernel before it had been
heavily tested and fully reviewed, and then after inclusion when people did
point out bugs and other issues, nobody did anything. So yes, things are
relatively frozen and always have been.

I've never used reiserfs for any extended period of time and I don't know
anything about its internals, so I can't accurately comment on its
reliability. However, I have seen a number of issues like this reported to the
Gentoo bugzilla over the last 2 years, and as you've seen, even bugs that get
reported upstream seem to be mostly ignored. In comparison, we rarely see
issues like this reported for ext2/ext3; I can only think of one, in fact, which
was a bug when the partition was larger than 3TB or something like that. So at
least from a watching-the-Gentoo-bugzilla standpoint, I'd certainly recommend
ext3.

Nick: nice find, that seems very likely to be a fix for this issue. I've added
this patch to the 2.6.20 tree, but I'm not sure if there will be further 2.6.20
releases. 2.6.21 is already fixed as you have noted.

Given that this bug would be hard to reproduce even on a buggy kernel, and that
the fix is already available in 2.6.21, I'm going to close this bug. Feel free
to reopen it if any further information becomes available.

------- Comment #7 From Lindsay Haisley 2007-05-25 00:38:19 0000 -------
(In reply to comment #6)
> (In reply to comment #0)
> I've never used reiserfs for any extended period of time and I don't know
> anything about its internals, so I can't accurately comment on its
> reliability. However, I have seen a number of issues like this reported to the
> Gentoo bugzilla over the last 2 years, and as you've seen, even bugs that get
> reported upstream seem to be mostly ignored. In comparison, we rarely see
> issues like this reported for ext2/ext3; I can only think of one, in fact, which
> was a bug when the partition was larger than 3TB or something like that. So at
> least from a watching-the-Gentoo-bugzilla standpoint, I'd certainly recommend
> ext3.

It should be noted that the Gentoo Installation handbook recommends reiserfs
rather highly.  It's "solid", and has "very good overall performance and
greatly outperforms both ext2 and ext3 when dealing with small files."  See
http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=4#doc_chap4.
The installation doc folks might want to take note of your observations,
which paint a rather different picture.

------- Comment #8 From Daniel Drake 2007-05-25 02:32:51 0000 -------
As I said above, that's only my personal view and I don't have any facts or
true technical knowledge about filesystems to back it up with. I may consider
mentioning it to them anyway though.

Bug List: (This bug is not in your last search results)   Show last search results      Search page      Enter new bug