Bug 878023 - sys-kernel/gentoo-sources-6.0.0 -6.0.2 btrfs: divide-by-zero on boot
Summary: sys-kernel/gentoo-sources-6.0.0 -6.0.2 btrfs: divide-by-zero on boot
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal blocker
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: https://bugzilla.kernel.org/show_bug....
Whiteboard: genpatches 6.0.7
Keywords: InVCS
Depends on:
Blocks:
 
Reported: 2022-10-22 20:30 UTC by John Bowler
Modified: 2022-11-03 16:20 UTC

See Also:
Package list:
Runtime testing required: ---

Description John Bowler 2022-10-22 20:30:36 UTC
This is kernel bug https://bugzilla.kernel.org/show_bug.cgi?id=216559

I have verified that the documented problem exists on the three older (original) btrfs partitions.  One is the RAID partition I use for / so I have not been able to boot any of these kernels.

The bug manifests as a kernel crash immediately after the root partition is found, when an attempt is made to mount it.  The crash is below a function called 'exclude_super_stripes', called from 'btrfs_read_block_groups'; see the screenshots in the kernel report (they match my boot log).

There are commands to fix the file systems in the kernel log; however, note that the "btrfs balance" command apparently requires the file system to be mounted, and the bug stops it being mounted, so the fix seems to be impossible on a running 6.0.x kernel.

A newer btrfs file system on the same machine does not have the sub_stripes==0 issue, so presumably this will only happen with existing file systems.

Reproducible: Always

Steps to Reproduce:
1. Obtain a system with a btrfs root file system with the problem (requires a pre-existing file system).
2. Attempt to boot 6.0.x
Actual Results:  
Kernel panic

Expected Results:  
No kernel panic; the kernel should be handling the condition.  See the comments in the kernel bug.

The seriousness of the bug seems to be underestimated; it stops upgrade to kernel 6.0.x (and most likely 6.1.x).  The fix is obscure, which is why I'm entering a Gentoo bug: so people can find the kernel bug and know what the fix is!  The fix seems to require an older kernel, so if the problem happens after boot, when a non-critical file system (e.g. a removable drive) is mounted, it seems it will be impossible to fix without reverting the kernel or moving the problem file system to an older system to be fixed.
Comment 1 John Bowler 2022-10-22 20:38:17 UTC
This bug might also happen in 6.0.3, but I haven't confirmed that and, since I'm running the workaround, I won't be able to.  emerge --info isn't possible for the two earlier versions because they have been removed.  kernel.org confirms the bug in 6.0.0.
Comment 2 John Bowler 2022-10-23 01:53:34 UTC
The workaround on kernel.org does seem to work; at least I am now able to boot 6.0.3.
Comment 4 Mike Pagano gentoo-dev 2022-10-25 16:46:45 UTC
Upstream patch:

https://github.com/kdave/btrfs-devel/commit/64a32bdaa968ca6f4b20ae3a74b4a677aea2b210

John, if you test this patch and report a success, I can release a -r1 with it.

Or we can wait until it gets to the stable-queue for 6.0.X.
Comment 5 John Bowler 2022-10-25 19:44:53 UTC
(In reply to Mike Pagano from comment #4)
> Upstream patch:
> 
> https://github.com/kdave/btrfs-devel/commit/64a32bdaa968ca6f4b20ae3a74b4a677aea2b210
> 
> John, if you test this patch and report a success, I can release a -r1 with
> it.

I don't have the means to test it; I ran the btrfs balance on all three of the file systems I have that had sub_stripes==0 items, so everything boots (and mounts) fine without the patch.

I also don't know which mkfs.btrfs was producing the problem.  It may be entirely local to Gentoo; @victor's kernel was built with genkernel (from the screenshots in the kernel.org bug).  I didn't get any response to my question about how to determine the mkfs version, so I assume it is not recorded in the file system that results.

If someone can find a btrfs (either RAID or single volume) with sub_stripes==0, that could be used as a test.  Preferably one on a non-boot partition, or at least a partition that can be imaged.  The btrfs check command is in the kernel bug, but it's pretty much:

btrfs ins dump-tree -t chunk <device> | egrep -s 'sub_stripes 0' && echo "we have a problem captain"
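
A slightly more script-friendly variant of the check above, as a minimal sketch: it counts the lines reporting sub_stripes 0 rather than printing them.  The function name count_zero_sub_stripes is made up for illustration, and the pattern assumes dump-tree prints the field as "sub_stripes <number>", which is an assumption about the output format.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): reads the text output of
#   btrfs inspect-internal dump-tree -t chunk <device>
# on stdin and prints how many lines report "sub_stripes 0".
# Assumption: the field is printed as "sub_stripes <number>".
count_zero_sub_stripes() {
    # grep -c prints the match count; it exits non-zero when the count is 0,
    # so "|| true" keeps the function usable under "set -e".
    grep -c -E '(^|[[:space:]])sub_stripes 0([[:space:]]|$)' || true
}
```

Typical use would be: btrfs inspect-internal dump-tree -t chunk <device> | count_zero_sub_stripes.  A count above zero suggests the file system has the problem; "ins" in the original command is the abbreviated form of "inspect-internal".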
Comment 7 John Bowler 2022-11-01 17:14:57 UTC
From the kernel commit:

>It turns out that the mkfs.btrfs fix is only introduced in 6718ab4d33aa

It would help to have a date on that fix.  The short commit ID can't be used in a URI (the URI needs the full commit id and the search box doesn't allow a search on a commit).

Knowing the date tells us the youngest age of potentially affected file systems.
Comment 8 Mike Pagano gentoo-dev 2022-11-02 10:41:52 UTC
(In reply to John Bowler from comment #7)
> From the kernel commit:
> 
> >It turns out that the mkfs.btrfs fix is only introduced in 6718ab4d33aa
> 
> It would help to have a date on that fix.  The short commit ID can't be used
> in a URI (the URI needs the full commit id and the search box doesn't allow
> a search on a commit).
> 
> Knowing the date tells us the youngest age of potentially affected file
> systems.

Not exactly sure what you are asking.

The full message indicates the fix is in v5.4 of btrfs-progs.

"It turns out that the mkfs.btrfs fix is only introduced in 6718ab4d33aa
("btrfs-progs: Initialize sub_stripes to 1 in btrfs_alloc_data_chunk")"
which is included in v5.4 btrfs-progs release.

We have 6.0 as the latest in the tree.
Comment 9 John Bowler 2022-11-02 18:03:34 UTC
(In reply to Mike Pagano from comment #8)
> (In reply to John Bowler from comment #7)
> > From the kernel commit:
> > 
> > >It turns out that the mkfs.btrfs fix is only introduced in 6718ab4d33aa
> > 
> > It would help to have a date on that fix.  The short commit ID can't be used
> > in a URI (the URI needs the full commit id and the search box doesn't allow
> > a search on a commit).
> > 
> > Knowing the date tells us the youngest age of potentially affected file
> > systems.
> 
> Not exactly sure what you are asking.

The date.

It must be somewhere.  The date of the commit is an approximation; 5.4 was released after that commit (I assume), then dropped into ~ after that, then into the non-dev version after that.  Knowing all those dates will help people with potentially affected file systems avoid doing spurious checks, and it is easier to communicate a simple date ("your btrfs file system may be damaged beyond repair in later kernels if it was built before <insert-date-here>").  I asked for the commit date because anyone with access to the kernel tree can get it just given the (truncated) commit id.

Relying on no one reintroducing the kernel code when we can fix the file systems doesn't seem advisable.
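
For anyone wanting the date, a minimal sketch of reading it from an abbreviated commit id, assuming a local clone of the repository that contains the commit (btrfs-progs for 6718ab4d33aa, going by comment 8).  The function names and the clone path are placeholders; git show and git describe are standard git commands.

```shell
#!/bin/sh
# commit_date CLONE_DIR COMMIT
# Prints the committer date of COMMIT (full or abbreviated id)
# in git's ISO-like format, e.g. "YYYY-MM-DD HH:MM:SS +ZZZZ".
commit_date() {
    git -C "$1" show -s --format='%ci' "$2"
}

# first_tag_containing CLONE_DIR COMMIT
# Names COMMIT relative to a tag that contains it (a tag name, possibly
# followed by a ~N offset); fails if no tag contains the commit.
first_tag_containing() {
    git -C "$1" describe --contains "$2"
}
```

For example: commit_date /path/to/btrfs-progs 6718ab4d33aa (the path is hypothetical).  The commit date is only a lower bound; the date a release carrying the fix reached a given distribution is later, as noted above.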
Comment 10 John Bowler 2022-11-02 18:18:23 UTC
Should have been fixed here:

https://btrfs.readthedocs.io/en/latest/CHANGES.html#btrfs-progs-5-4-2019-12-03

I.e. 2019-12-03, although the release notes do not make any mention of the fix, and neither do the notes for 5.3.1.

So any file system built before 2020 will have the problem (well, all my older file systems did) and will fail in unpatched 6.0 kernels.

The kernel patch (not the btrfs-progs one) was committed on 2022-10-25 and will apparently be in 6.1 (but not 6.0).  The patch is apparently part of 6.1-rc3 but not 6.1-rc2.

So: a file system built by btrfs-progs prior to 2020, or by btrfs-progs <5.4, will show a problem on kernel 6.x <6.1-rc3.
Comment 11 Mike Pagano gentoo-dev 2022-11-02 18:20:01 UTC
Gentoo-sources and other kernels that use genpatches will have the fix in the next 6.0.X release.
Comment 12 Larry the Git Cow gentoo-dev 2022-11-03 16:20:18 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=f609d6942287bc1c4794fc878be89e984d709d36

commit f609d6942287bc1c4794fc878be89e984d709d36
Author:     Mike Pagano <mpagano@gentoo.org>
AuthorDate: 2022-11-03 16:19:28 +0000
Commit:     Mike Pagano <mpagano@gentoo.org>
CommitDate: 2022-11-03 16:19:28 +0000

    sys-kernel/gentoo-sources: add 6.0.7 and btrfs fix
    
    btrfs: don't use btrfs_chunk::sub_stripes from disk (Mike Pagano)
    
    Closes: https://bugs.gentoo.org/878023
    
    Signed-off-by: Mike Pagano <mpagano@gentoo.org>

 sys-kernel/gentoo-sources/Manifest                 |  3 +++
 .../gentoo-sources/gentoo-sources-6.0.7.ebuild     | 28 ++++++++++++++++++++++
 2 files changed, 31 insertions(+)