Bug 462470 - sys-fs/e2fsprogs-1.42 fsck corrupts ext4 filesystem, especially with block size smaller than 4096 bytes
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Depends on: 467008
Blocks:
 
Reported: 2013-03-20 11:38 UTC by Marcin Mirosław
Modified: 2013-12-03 08:26 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Description Marcin Mirosław 2013-03-20 11:38:48 UTC
I often have problems with ext4 corruption. It seems to happen when the ext4 FS has blocksize=1024 (I use such a filesystem to hold the portage tree).

# LC_ALL=EN_US ls -lah /usr/portage/|grep lost
ls: cannot access /usr/portage/lost+found: No such file or directory
ls: cannot access /usr/portage/lost+found: No such file or directory
d?????????    ? ?       ?          ?            ? lost+found
d?????????    ? ?       ?          ?            ? lost+found
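The listing above shows the same name more than once in a single directory. A quick way to spot this kind of damage is to print every name that occurs more than once in a listing. This is a minimal sketch that assumes GNU coreutils. `dup_entries` is a hypothetical helper name, not part of any package.

```shell
#!/bin/sh
# Sketch: print each name that appears more than once in a directory
# listing read from stdin (one name per line).
# dup_entries is a hypothetical helper name.
dup_entries() {
    sort | uniq -d    # uniq -d prints one copy of each repeated line
}

# Example (not run here):
#   ls -a /usr/portage | dup_entries
```

On a healthy filesystem this prints nothing. On the damaged one it should print `lost+found`.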

From dmesg:
[305658.001883] EXT4-fs error (device dm-1): ext4_lookup:1383: inode #243: comm rsync: deleted inode referenced: 7622
[305658.010207] EXT4-fs error (device dm-1): ext4_lookup:1383: inode #243: comm rsync: deleted inode referenced: 7622
[305658.018504] EXT4-fs error (device dm-1): ext4_lookup:1383: inode #243: comm rsync: deleted inode referenced: 7622
[305658.030054] EXT4-fs error (device dm-1): ext4_lookup:1383: inode #243: comm rsync: deleted inode referenced: 7622

# tune2fs -l /dev/mapper/vg--mohikanin-portage
tune2fs 1.42 (29-Nov-2011)
Filesystem volume name:   portage
Last mounted on:          /usr/portage
Filesystem UUID:          8769d20b-aa52-4ad6-aed6-bc097fcd881a
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    journal_data_writeback user_xattr acl
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              600064
Block count:              2097152
Reserved block count:     0
Free blocks:              1089246
Free inodes:              435308
First block:              1
Block size:               1024
Fragment size:            1024
Reserved GDT blocks:      256
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         2344
Inode blocks per group:   586
Flex block group size:    16
Filesystem created:       Wed Jan 23 10:36:52 2013
Last mount time:          Wed Mar 20 12:27:15 2013
Last write time:          Wed Mar 20 12:28:22 2013
Mount count:              2
Maximum mount count:      -1
Last checked:             Sat Mar 16 23:34:08 2013
Check interval:           0 (<none>)
Lifetime writes:          1942 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      f3e60d43-a0dc-4d7e-8127-7ffc9a8bc6fe
Journal backup:           inode blocks
FS Error count:           4
First error time:         Wed Mar 20 12:28:22 2013
First error function:     ext4_lookup
First error line #:       1383
First error inode #:      243
First error block #:      0
Last error time:          Wed Mar 20 12:28:22 2013
Last error function:      ext4_lookup
Last error line #:        1383
Last error inode #:       243
Last error block #:       0
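When comparing several filesystems, it helps to pull individual fields such as `Block size`, `Filesystem state` or `FS Error count` out of a `tune2fs -l` listing. This is a rough sketch that assumes the `Key: value` layout shown above. `tune2fs_field` is a hypothetical helper. It splits on the first colon only, because values such as mount times contain colons of their own.

```shell
#!/bin/sh
# Sketch: print the value of one "Key: value" field from `tune2fs -l`
# output read on stdin. tune2fs_field is a hypothetical helper name.
tune2fs_field() {
    awk -v key="$1" '
        {
            i = index($0, ":")                     # first colon only
            if (i == 0) next                       # e.g. the version banner
            if (substr($0, 1, i - 1) != key) next
            val = substr($0, i + 1)
            sub(/^[ \t]+/, "", val)                # drop alignment padding
            sub(/[ \t]+$/, "", val)                # drop trailing blanks
            print val
        }'
}
```

For example, `tune2fs -l /dev/mapper/vg--mohikanin-portage | tune2fs_field 'FS Error count'` would print 4 for the listing above.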


Now it's time to fix it:
# LC_ALL=en_US fsck.ext4  /dev/mapper/vg--mohikanin-portage
e2fsck 1.42 (29-Nov-2011)
portage contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'pgpool2-3.2.2.ebuild' in /dev-db/pgpool2 (243) has deleted/unused inode 7622.  Clear<y>? yes

Entry 'lost+found' in / (2) has deleted/unused inode 7622.  Clear<y>? yes

Entry 'lost+found' in / (2) has deleted/unused inode 7622.  Clear<y>? yes

Pass 3: Checking directory connectivity
/lost+found not found.  Create<y>? yes

Pass 3A: Optimizing directories
Duplicate entry 'lost+found' in / (2) found.  Clear<y>? yes

Entry 'lost+found' in / (2) has a non-unique filename.
Rename to lost+found~0<y>? yes

Entry 'lost+found' in / (2) has a non-unique filename.
Rename to lost+found~0<y>? yes

Duplicate entry 'lost+found~0' in / (2) found.  Clear<y>? yes

Pass 4: Checking reference counts
Unattached inode 7622
Connect to /lost+found<y>? yes

Pass 5: Checking group summary information

portage: ***** FILE SYSTEM WAS MODIFIED *****
portage: 164757/600064 files (2.8% non-contiguous), 1007907/2097152 blocks

Once is not enough:
# LC_ALL=en_US fsck.ext4  /dev/mapper/vg--mohikanin-portage
e2fsck 1.42 (29-Nov-2011)
portage: clean, 164757/600064 files, 1007907/2097152 blocks
# LC_ALL=en_US fsck.ext4 -f /dev/mapper/vg--mohikanin-portage
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry '#7622' in /lost+found (7622) is a link to '.'  Clear<y>? yes

Entry 'lost+found' in / (2) is a link to directory /lost+found (7622).
Clear<y>? yes

Pass 3: Checking directory connectivity
/lost+found not found.  Create<y>? yes

Pass 4: Checking reference counts
Inode 7622 ref count is 1, should be 2.  Fix<y>? yes

Pass 5: Checking group summary information

portage: ***** FILE SYSTEM WAS MODIFIED *****
portage: 164758/600064 files (2.8% non-contiguous), 1007908/2097152 blocks

# LC_ALL=en_US fsck.ext4 -f /dev/mapper/vg--mohikanin-portage
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'lost+found' in / (2) is a link to directory /lost+found (8135).
Clear<y>? yes

Entry 'lost+found' in / (2) is a link to directory /lost+found (8135).
Clear<y>? yes

Entry 'lost+found' in / (2) is a link to directory /lost+found (8135).
Clear<y>? yes

Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information

portage: ***** FILE SYSTEM WAS MODIFIED *****
portage: 164758/600064 files (2.8% non-contiguous), 1007908/2097152 blocks

# LC_ALL=en_US fsck.ext4 -f /dev/mapper/vg--mohikanin-portage
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
portage: 164758/600064 files (2.8% non-contiguous), 1007908/2097152 blocks

So fsck has to be run three times before all errors are fixed (a run without -f just reports "clean" and skips the check).
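A run without -f only looks at the clean flag, as the second run above shows. Repeating the check by hand therefore means forcing it every time. The sketch below keeps running `e2fsck -fy` until it exits with status 0. `fsck_until_clean` is a hypothetical helper, and the device is assumed to be unmounted. The exit codes are the ones documented in e2fsck(8).

```shell
#!/bin/sh
# Sketch: repeat a forced e2fsck run until it exits 0 (nothing changed).
# e2fsck exit status is a bit mask: 1 = errors corrected, 2 = corrected,
# reboot needed, 4 = errors left uncorrected, 8 = operational error,
# 16 = usage error, 32 = cancelled by user, 128 = shared library error.
# fsck_until_clean is a hypothetical helper; run it on an unmounted device.
fsck_until_clean() {
    dev=$1
    max=${2:-5}
    pass=1
    while [ "$pass" -le "$max" ]; do
        e2fsck -fy "$dev"
        rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "clean after $pass pass(es)"
            return 0
        fi
        # Any bit other than 1 or 2 means e2fsck could not finish the job.
        if [ $((rc & ~3)) -ne 0 ]; then
            echo "e2fsck failed with status $rc" >&2
            return "$rc"
        fi
        pass=$((pass + 1))
    done
    echo "still modifying $dev after $max passes" >&2
    return 1
}
```

This only automates the workaround. It does not address whatever makes e2fsck 1.42 leave errors behind after a pass.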

Reproducible: Always




This host is a qemu-kvm virtual machine.

# emerge --info
Portage 2.1.11.52 (hardened/linux/amd64, gcc-4.6.3, glibc-2.15-r3, 3.7.5-hardened-r1 x86_64)
=================================================================
System uname: Linux-3.7.5-hardened-r1-x86_64-Intel-R-_Core-TM-2_Duo_CPU_T7700_@_2.40GHz-with-gentoo-2.1
KiB Mem:     1019984 total,    289184 free
KiB Swap:     524284 total,    524284 free
Timestamp of tree: Unknown
ld GNU gold (GNU Binutils 2.22) 1.11
ccache version 3.1.9 [enabled]
app-shells/bash:          4.2_p37
dev-lang/python:          2.7.3-r2, 3.2.3
dev-util/ccache:          3.1.9
dev-util/pkgconfig:       0.28
sys-apps/baselayout:      2.1-r1
sys-apps/openrc:          0.11.8
sys-apps/sandbox:         2.5
sys-devel/autoconf:       2.69
sys-devel/automake:       1.11.6
sys-devel/binutils:       2.22-r1
sys-devel/gcc:            4.6.3
sys-devel/gcc-config:     1.7.3
sys-devel/libtool:        2.4-r1
sys-devel/make:           3.82-r4
sys-kernel/linux-headers: 3.6 (virtual/os-headers)
sys-libs/glibc:           2.15-r3
Repositories: gentoo
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=native -frecord-gcc-switches   -fno-unwind-tables -fno-asynchronous-unwind-tables -fpeel-loops         -ftracer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/php/apache2-php5.3/ext-active/ /etc/php/apache2-php5.4/ext-active/ /etc/php/cgi-php5.3/ext-active/ /etc/php/cgi-php5.4/ext-active/ /etc/php/cli-php5.3/ext-active/ /etc/php/cli-php5.4/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe -march=native -frecord-gcc-switches         -fno-unwind-tables -fno-asynchronous-unwind-tables -fpeel-loops         -ftracer"
DISTDIR="/usr/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs ccache collision-protect compressdebug config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="pl_PL.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,--sort-common"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_EXTRA_OPTS="-O --inplace"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
USE="acl acpi amd64 bash-completion caps cli cracklib crypt cxx dri gpm hardened iconv idn ipv6 justify mmx mmxext modules mudflap multilib ncurses nls nptl openmp pax_kernel pcre readline session sse sse2 sse3 ssse3 threads unicode urandom vhosts vim-syntax xattr" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="authz_host dir mime unique_id" APACHE2_MPMS="itk" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" NGINX_MODULES_HTTP="access auth_basic browser charset fastcgi gzip gzip_static headers_more limit_conn limit_req proxy realip referer rewrite userid" PHP_TARGETS="php5-3 php5-4" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python3_2" RUBY_TARGETS="ruby18 ruby19" USERLAND="GNU" VIDEO_CARDS="fbdev glint intel mach64 mga nouveau nv r128 radeon savage sis tdfx trident vesa via vmware dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
USE_PYTHON="2.7 3.2"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS
Comment 1 Anthony Basile gentoo-dev 2013-03-21 17:15:49 UTC
You should not be having problems with a block size of 1024.  Does this only happen with a hardened kernel?  Have you tested with vanilla?
Comment 2 Marcin Mirosław 2013-03-21 20:22:55 UTC
I use hardened-sources everywhere; that is why I put hardened-sources in the title. I can reproduce the problem with sys-kernel/vanilla-sources-3.7.10 as well.
It looks like I can reproduce the problem this way:
- do some writes on /usr/portage
- touch /forcefsck
- init 6

After one or two tries I see:
# ls -lah /usr/portage/|grep lost
drwx------    2 root    root    1,0K 03-21 21:16 lost+found
drwx------    2 root    root    1,0K 03-21 21:16 lost+found
drwx------    2 root    root    1,0K 03-21 21:16 lost+found
drwx------    2 root    root    1,0K 03-21 21:16 lost+found

Now I have to run fsck manually. I suspect this is related to the block size, because I don't have this problem on other filesystems formatted with the default block size.
I've prepared an image of the damaged filesystem (using e2image -r); compressed with bzip2 it is about 6MB. If it would be useful, I can upload it.
Comment 3 Anthony Basile gentoo-dev 2013-03-21 23:40:51 UTC
(In reply to comment #2)
> I've hardened-sources everywhere, this is why I put hardened-sources in
> title. I can reproduce problem with sys-kernel/vanilla-sources-3.7.10 as
> well.
> It looks I can reproduce problem in this way:
> - do some writes on /usr/portage
> - touch /forcefsck
> - init 6
> 
> After one, two tries I can see:
> # ls -lah /usr/portage/|grep lost
> drwx------    2 root    root    1,0K 03-21 21:16 lost+found
> drwx------    2 root    root    1,0K 03-21 21:16 lost+found
> drwx------    2 root    root    1,0K 03-21 21:16 lost+found
> drwx------    2 root    root    1,0K 03-21 21:16 lost+found
> 
> Now I've got to run fsck manually. I suspect this is related to block size
> because I don't have such problem on other filesystems formated with default
> block size.
> I've prepared image of damaged filesystem (using e2image -r), after bziping
> it has ~6MB size. If it could be useful I can upload this file.

At this point I'm beginning to think you have a hardware issue.  You can try using smartctl to identify where the issue is.
Comment 4 Marcin Mirosław 2013-03-22 09:44:49 UTC
Maybe this is a hardware problem, maybe not; I can't confirm either. As I mentioned earlier, I'm doing all tests in a virtualized environment. The host has ECC RAM and SAS disks (RAID1) and runs hardened-sources-3.8.0 with qemu-1.2.2-r3.
I just ran some tests to narrow down the steps to reproduce. I created 3 LVs:
  ACTIVE            '/dev/vg-rup/1k' [5,00 GiB] inherit
  ACTIVE            '/dev/vg-rup/2k' [5,00 GiB] inherit
  ACTIVE            '/dev/vg-rup/4k' [5,00 GiB] inherit

I created an ext4 filesystem on each of them with block sizes of 1k, 2k and 4k respectively, and mounted them on /dane/1k, /dane/2k and /dane/4k. Next I ran:
PORTDIR=/dane/1k/ emerge --sync ; PORTDIR=/dane/2k/ emerge --sync ; PORTDIR=/dane/4k/ emerge --sync
then: touch /forcefsck && init 6
After reboot I have:
ls -lah /dane/1k/|grep lost
drwx------    2 root root 1,0K 03-22 10:32 lost+found
drwx------    2 root root 1,0K 03-22 10:32 lost+found
drwx------    2 root root 1,0K 03-22 10:32 lost+found
drwx------    2 root root 1,0K 03-22 10:32 lost+found
rup dane # ls -lah /dane/2k/|grep lost
drwx------    2 root root 2,0K 03-22 10:32 lost+found
drwx------    2 root root 2,0K 03-22 10:32 lost+found
rup dane # ls -lah /dane/4k/|grep lost
drwx------    2 root root 4,0K 03-22 10:32 lost+found

This is very interesting: 4 "lost+found" entries on the 1k fs, 2 on the 2k fs and just 1 on the 4k fs. Now it's time to run fsck:
# LC_ALL=en_US fsck.ext4 -fy /dev/vg-rup/1k
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'lost+found' in / (2) is a link to directory /lost+found (3434).
Clear? yes

Entry 'lost+found' in / (2) is a link to directory /lost+found (3434).
Clear? yes

Entry 'lost+found' in / (2) is a link to directory /lost+found (3434).
Clear? yes

Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vg-rup/1k: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vg-rup/1k: 164449/327680 files (0.2% non-contiguous), 490403/5242880 blocks
rup dane # LC_ALL=en_US fsck.ext4 -fy /dev/vg-rup/2k
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'lost+found' in / (2) is a link to directory /lost+found (16578).
Clear? yes

Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vg-rup/2k: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vg-rup/2k: 164449/327680 files (0.2% non-contiguous), 324604/2621440 blocks
rup dane # LC_ALL=en_US fsck.ext4 -fy /dev/vg-rup/4k
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vg-rup/4k: 164449/327680 files (0.0% non-contiguous), 247199/1310720 blocks

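It seems the number of errors is correlated with the block size: fsck cleared 3 bogus lost+found entries on the 1k fs, 1 on the 2k fs and none on the 4k fs.
It seems to me that what matters is creating many files on the filesystem and deleting the "lost+found" directory, as emerge --sync does (PORTAGE_RSYNC_OPTS includes --delete, and lost+found is not part of the synced tree).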
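If that hypothesis is right, one untested way to avoid the trigger would be to stop rsync from deleting lost+found. The PORTAGE_RSYNC_OPTS shown in the emerge --info output pass --delete. rsync does not delete excluded files unless --delete-excluded is also given, so adding an exclude should leave the directory in place. This is a make.conf sketch that keeps the reporter's existing extra options. It is an assumption and was not tried in this bug.

```shell
# /etc/portage/make.conf (sketch): keep the existing extra rsync options
# and also exclude lost+found so that rsync --delete leaves it alone.
PORTAGE_RSYNC_EXTRA_OPTS="-O --inplace --exclude=/lost+found"
```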
Comment 5 Marcin Mirosław 2013-03-22 10:30:32 UTC
I can reproduce it in a simpler way:
emerge --sync
umount /dane/2k
fsck -f -C0 -T -t -A -p /dev/vg-rup/1k
mount /dane/2k
# ls -lah /dane/1k/|grep lost
drwx------    2 root root 1,0K 03-22 11:10 lost+found
drwx------    2 root root 1,0K 03-22 11:10 lost+found
drwx------    2 root root 1,0K 03-22 11:10 lost+found
drwx------    2 root root 1,0K 03-22 11:10 lost+found

Using the above commands I reproduced the problem on a bare-metal host with gentoo-sources-3.6.11.
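The sequence above can be written as a small script that uses the same filesystem in every step. In this sketch, `DEV` and `MNT` default to the reporter's test LV and mount point, `mount "$MNT"` assumes an /etc/fstab entry, and the fsck call is simplified to `fsck.ext4 -f -p`. `run`, `reproduce` and `DRY_RUN` are hypothetical names. With DRY_RUN set, the commands are only printed.

```shell
#!/bin/sh
# Sketch of the reproducer: sync, unmount, forced preen fsck, remount,
# then count lost+found entries. Set DRY_RUN=1 to only print the commands.
DEV=${DEV:-/dev/vg-rup/1k}
MNT=${MNT:-/dane/1k}

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reproduce() {
    run env PORTDIR="$MNT" emerge --sync   # many writes; rsync --delete removes lost+found
    run umount "$MNT"
    run fsck.ext4 -f -p "$DEV"             # forced preen pass with the installed e2fsck
    run mount "$MNT"                       # assumes an fstab entry for MNT
    if [ -z "$DRY_RUN" ]; then
        # Number of lost+found entries in the root directory; more than 1 is corruption.
        ls -a "$MNT" | grep -cxF 'lost+found'
    fi
}
```

Calling `reproduce` as root runs one iteration. According to the reports above, one or two iterations were enough with e2fsprogs 1.42.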
Comment 6 Anthony Basile gentoo-dev 2013-03-22 11:56:05 UTC
(In reply to comment #5)
> I can reproduce it in simpler way:
> emerge --sync
> umount /dane/2k
> fsck -f -C0 -T -t -A -p /dev/vg-rup/1k
> mount /dane/2k
> # ls -lah /dane/1k/|grep lost
> drwx------    2 root root 1,0K 03-22 11:10 lost+found
> drwx------    2 root root 1,0K 03-22 11:10 lost+found
> drwx------    2 root root 1,0K 03-22 11:10 lost+found
> drwx------    2 root root 1,0K 03-22 11:10 lost+found
> 
> Using above commands I reproduced problem on bare metal host with
> gentoo-sources-3.6.11.

Thanks, this is really interesting!  I'm going to try to reproduce it later.
Comment 7 Marcin Mirosław 2013-03-22 12:54:54 UTC
There is an error in the test I pasted earlier (I ran `mount /dane/2k` but then `ls /dane/1k`), but everything else is accurate. The corruption appears after fscking, so it looks like e2fsprogs corrupts the filesystem. The e2fsprogs changelog has many entries about fsck corrupting filesystems; I probably hit one of them. I don't think this is related to the kernel.
With e2fsprogs-1.42.7 I couldn't reproduce the corruption.
I'm voting for faster stabilisation of e2fsprogs :)
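To check whether a machine still runs an e2fsck older than 1.42.7, the installed version can be compared against that release. This sketch assumes GNU sort for `-V`. `version_ge` and `e2fsck_version` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: compare the installed e2fsck version against 1.42.7, the release
# with which the reporter could no longer reproduce the corruption.
# Assumes GNU sort (-V); both helper names are hypothetical.

# Succeeds when version $1 >= version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the version field of a banner like "e2fsck 1.42 (29-Nov-2011)".
e2fsck_version() {
    e2fsck -V 2>&1 | awk 'NR == 1 { print $2 }'
}
```

For example: `version_ge "$(e2fsck_version)" 1.42.7 || echo "e2fsck is older than 1.42.7" >&2`.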
Comment 8 SpanKY gentoo-dev 2013-12-03 08:26:22 UTC
A newer version is stable now.