|Summary:||Race condition between writeback vs fallocate in ext4 extents code|
|Product:||Gentoo Linux||Reporter:||David Flogeras <dflogeras2>|
|Component:||[OLD] Core system||Assignee:||Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel>|
|Package list:||Runtime testing required:||---|
Description David Flogeras 2012-10-24 20:26:39 UTC
As mentioned in https://bugs.gentoo.org/show_bug.cgi?id=439502, recent kernels have ext4 corruption and have been masked. I have also discovered that 3.4.9 (the latest stable on amd64) contains a different bug, as discussed at https://bugs.gentoo.org/show_bug.cgi?id=439502. I have seen it happen multiple times on 3.4.9 while deleting large files (300+ GB) on ext4 filesystems. It has not happened since I upgraded to 3.4.11, and diffing fs/ext4/extents.c shows that a portion of the fix has been backported (uninitialized value).

Furthermore, on x86 (where 3.3.8 is the latest stable) there is a bug causing the symlinks in /proc/PID/fd/ to have the wrong permissions. I cannot find the exact commit in Git or a reference on LKML, but I can verify that my procfs tests stopped failing after upgrading to the 3.4 series.

Reproducible: Always
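For anyone who wants to look at the procfs symptom on their own kernel, here is a minimal Python sketch that prints the mode of each symlink under /proc/PID/fd/. It is not the reporter's test suite, and the function name `fd_symlink_modes` is made up for illustration. The report does not say which permissions are expected, so the sketch only displays what the kernel returns and leaves the comparison to the reader.

```python
import os
import stat


def fd_symlink_modes(pid="self"):
    """Return {fd_number: mode_string} for the symlinks in /proc/<pid>/fd.

    Hypothetical helper for illustration only. An empty dict is returned
    when the directory cannot be read (no procfs, no such PID, no access).
    """
    fd_dir = f"/proc/{pid}/fd"
    modes = {}
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return modes
    for name in names:
        try:
            # lstat() inspects the symlink itself, not the file it points to.
            st = os.lstat(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listing and lstat().
            continue
        modes[int(name)] = stat.filemode(st.st_mode)
    return modes


if __name__ == "__main__":
    for fd, mode in sorted(fd_symlink_modes().items()):
        print(fd, mode)
```

Running it on both an older and a newer kernel and comparing the output is one way to see whether the modes differ between the two.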
Comment 1 James Bowlin 2012-10-24 20:30:39 UTC
You give two links to the same bug.
Comment 2 David Flogeras 2012-10-24 20:36:12 UTC
So I did (duh!). I meant "... as discussed here": https://lkml.org/lkml/2012/8/15/372

Sorry for the finger problems.
Comment 3 Kerin Millar 2012-10-24 23:48:54 UTC
Assigning. This is fixed by commit dee1f973ca341c266229faa5a1a5bb268bed3531 upstream. Backported patches are currently available in the 3.4 and 3.6 stable queues. I haven't gone as far as to determine whether 3.2-longterm is affected.

David, it looks as though gentoo-sources skipped 3.4 entirely in favour of stabilizing 3.5.7, as per bug 438798. Not that it counts for much, as it is already EOL! Perhaps the best short-term solution for gentoo-sources users would be a revision bump.

In any case, if you wish to file a stabilization request, please do it independently of this bug. That way, this bug can be used to specifically track the resolution of this issue. If it depends on a stable request to be satisfactorily resolved, then the bug deps can be amended as appropriate.
Comment 4 David Flogeras 2012-10-24 23:58:14 UTC
Thanks for digging into this! So should I wait until a fixed ebuild is submitted? It sounds like it might end up being gentoo-sources-3.4.9-r1? In which case it is silly of me to pre-submit a stable request.
Comment 5 Kerin Millar 2012-10-25 00:09:18 UTC
(In reply to comment #4)
> Thanks for digging into this! So should I wait until a fixed ebuild is
> submitted? It sounds like it might end up being gentoo-sources-3.4.9-r1? In
> which case it is silly of me to pre-submit a stable request.

Just as I wrote the previous comment, Mike cancelled 3.5.7 stabilization due to the severity of bug 439502 (which is urgent). There's no doubt that something needs to be done with the existing crop of stable keyworded kernels, so I would suggest waiting a little for the dust to settle.

In the meantime, you can protect yourself from the race condition bug by applying the relevant patch from here directly to your source tree:
https://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=tree;f=queue-3.4
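As a rough illustration of "applying the patch directly to your source tree", here is a minimal Python sketch that wraps GNU `patch`. The thread does not name the patch file, so `/tmp/ext4-race-fix.patch` below is a placeholder, and `/usr/src/linux` is assumed to be where the kernel sources live. The helper names are hypothetical. The sketch defaults to `--dry-run` so that nothing is modified until the patch is known to apply cleanly.

```python
import os
import subprocess


def build_patch_command(patch_file, source_dir="/usr/src/linux", dry_run=True):
    """Build the GNU patch command line (hypothetical helper).

    The patch path is made absolute because `patch -d` changes directory
    before it opens the file given with -i.
    """
    cmd = ["patch", "-p1", "-d", source_dir, "-i", os.path.abspath(patch_file)]
    if dry_run:
        cmd.append("--dry-run")  # report what would happen, change nothing
    return cmd


def apply_patch(patch_file, source_dir="/usr/src/linux", dry_run=True):
    """Run patch and return (succeeded, combined_output)."""
    result = subprocess.run(
        build_patch_command(patch_file, source_dir, dry_run),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr


if __name__ == "__main__":
    # Placeholder path: substitute the file actually downloaded from the queue.
    ok, output = apply_patch("/tmp/ext4-race-fix.patch", dry_run=True)
    print(output)
    print("applies cleanly" if ok else "does not apply cleanly")
```

If the dry run succeeds, calling `apply_patch(..., dry_run=False)` performs the real change, after which the kernel has to be rebuilt and booted for the fix to take effect.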
Comment 6 Richard Yao (RETIRED) 2012-10-25 02:24:02 UTC
In the case of bug #439502, the upstream ext4 maintainer, Ted Ts'o, provided information that showed it to be a critical regression in the current stable trees. That is why I masked affected packages. Unfortunately, there are many subtle bugs in Linux filesystems, and if we were to mask every kernel source package affected by an issue, all kernel source packages would be masked.

This looks like something that should be included in the next revisions of various gentoo-sources packages in the event that the upstream stable maintainers fail to backport it, but it does not appear to be severe enough to merit immediate action.
Comment 7 Kerin Millar 2012-10-25 03:59:20 UTC
(In reply to comment #6)
> In the case of bug #439502, the upstream ext4 maintainer, Ted Ts'o, provided
> information that showed it to be a critical regression in the current stable
> trees. That is why I masked affected packages. Unfortunately, there are many
> subtle bugs in Linux filesystems and if we were to mask every kernel source
> package affected by an issue, all kernel source packages would be masked.
>
> This looks like something that should be included in the next revisions of
> various gentoo-sources packages in the event that the upstream stable
> maintainers fail to back port it, but it does not appear to be severe enough
> to merit immediate action.

I quite agree. Thanks for taking the time to comment.
Comment 8 Tom Wijsman (TomWij) (RETIRED) 2013-03-06 11:00:28 UTC
> This is fixed by commit dee1f973ca341c266229faa5a1a5bb268bed3531 upstream.

3.0 appears affected.
3.2.39 OK.
3.4.34 OK.
3.4.35 OK.
3.5.7 appears affected.
3.6.11 OK.
3.7.10 OK.
3.8.1 OK.
3.8.2 OK.

I'll backport this to 3.0.68 once it gets released, then drop any previous versions in that branch and file stabilization bugs for the latest 3.0 and 3.2. I'll drop 3.5.7 from the tree at that point as well, since it is not worth backporting the fix to it; we don't want to support versions that upstream doesn't.

Anyone who disagrees with this approach has about a week to raise their hand.
Comment 9 Tom Wijsman (TomWij) (RETIRED) 2013-03-09 11:02:49 UTC
> 3.0 appears affected.

It doesn't seem easy to backport the patch. It seems, though, that similar fixes are already in place, so I assume the upstream maintainers have already backported this themselves. It should be sufficient to just get rid of the older versions over time. I will file a stabilization bug for the latest 3.0 / 3.2 kernels one of these days, then get rid of any previous versions once they are stabilized...

> 3.5.7 appears affected.

And as it is no longer listed upstream (therefore no longer supported), I'm masking it along with 3.0.17-r2.

# Tom Wijsman <TomWij@gentoo.org> (09 Mar 2013)
# 3.0.17-r2 will be removed as it is over a year old, also removing
# 3.5.7-r1 which branch is no longer listed upstream. Removal in 14 days.
# If you still use one of these, I advice you to upgrade. Bug #439546.
=sys-kernel/gentoo-sources-3.0.17-r2
=sys-kernel/gentoo-sources-3.5.7-r1