I often have problems with ext4 corruption. It seems to happen when the ext4 filesystem has blocksize=1024 (I use it to hold the Portage tree).

# LC_ALL=EN_US ls -lah /usr/portage/|grep lost
ls: cannot access /usr/portage/lost+found: No such file or directory
ls: cannot access /usr/portage/lost+found: No such file or directory
d????????? ? ? ? ? ? lost+found
d????????? ? ? ? ? ? lost+found

From dmesg:
[305658.001883] EXT4-fs error (device dm-1): ext4_lookup:1383: inode #243: comm rsync: deleted inode referenced: 7622
[305658.010207] EXT4-fs error (device dm-1): ext4_lookup:1383: inode #243: comm rsync: deleted inode referenced: 7622
[305658.018504] EXT4-fs error (device dm-1): ext4_lookup:1383: inode #243: comm rsync: deleted inode referenced: 7622
[305658.030054] EXT4-fs error (device dm-1): ext4_lookup:1383: inode #243: comm rsync: deleted inode referenced: 7622

# tune2fs -l /dev/mapper/vg--mohikanin-portage
tune2fs 1.42 (29-Nov-2011)
Filesystem volume name: portage
Last mounted on: /usr/portage
Filesystem UUID: 8769d20b-aa52-4ad6-aed6-bc097fcd881a
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: journal_data_writeback user_xattr acl
Filesystem state: clean with errors
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 600064
Block count: 2097152
Reserved block count: 0
Free blocks: 1089246
Free inodes: 435308
First block: 1
Block size: 1024
Fragment size: 1024
Reserved GDT blocks: 256
Blocks per group: 8192
Fragments per group: 8192
Inodes per group: 2344
Inode blocks per group: 586
Flex block group size: 16
Filesystem created: Wed Jan 23 10:36:52 2013
Last mount time: Wed Mar 20 12:27:15 2013
Last write time: Wed Mar 20 12:28:22 2013
Mount count: 2
Maximum mount count: -1
Last checked: Sat Mar 16 23:34:08 2013
Check interval: 0 (<none>)
Lifetime writes: 1942 MB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: f3e60d43-a0dc-4d7e-8127-7ffc9a8bc6fe
Journal backup: inode blocks
FS Error count: 4
First error time: Wed Mar 20 12:28:22 2013
First error function: ext4_lookup
First error line #: 1383
First error inode #: 243
First error block #: 0
Last error time: Wed Mar 20 12:28:22 2013
Last error function: ext4_lookup
Last error line #: 1383
Last error inode #: 243
Last error block #: 0

Now it's time to fix it:

# LC_ALL=en_US fsck.ext4 /dev/mapper/vg--mohikanin-portage
e2fsck 1.42 (29-Nov-2011)
portage contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'pgpool2-3.2.2.ebuild' in /dev-db/pgpool2 (243) has deleted/unused inode 7622. Clear<y>? yes
Entry 'lost+found' in / (2) has deleted/unused inode 7622. Clear<y>? yes
Entry 'lost+found' in / (2) has deleted/unused inode 7622. Clear<y>? yes
Pass 3: Checking directory connectivity
/lost+found not found. Create<y>? yes
Pass 3A: Optimizing directories
Duplicate entry 'lost+found' in / (2) found. Clear<y>? yes
Entry 'lost+found' in / (2) has a non-unique filename.
Rename to lost+found~0<y>? yes
Entry 'lost+found' in / (2) has a non-unique filename.
Rename to lost+found~0<y>? yes
Duplicate entry 'lost+found~0' in / (2) found. Clear<y>? yes
Pass 4: Checking reference counts
Unattached inode 7622
Connect to /lost+found<y>? yes
Pass 5: Checking group summary information

portage: ***** FILE SYSTEM WAS MODIFIED *****
portage: 164757/600064 files (2.8% non-contiguous), 1007907/2097152 blocks

Once is not enough:

# LC_ALL=en_US fsck.ext4 /dev/mapper/vg--mohikanin-portage
e2fsck 1.42 (29-Nov-2011)
portage: clean, 164757/600064 files, 1007907/2097152 blocks

# LC_ALL=en_US fsck.ext4 -f /dev/mapper/vg--mohikanin-portage
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry '#7622' in /lost+found (7622) is a link to '.' Clear<y>? yes
Entry 'lost+found' in / (2) is a link to directory /lost+found (7622).
Clear<y>? yes
Pass 3: Checking directory connectivity
/lost+found not found. Create<y>? yes
Pass 4: Checking reference counts
Inode 7622 ref count is 1, should be 2. Fix<y>? yes
Pass 5: Checking group summary information

portage: ***** FILE SYSTEM WAS MODIFIED *****
portage: 164758/600064 files (2.8% non-contiguous), 1007908/2097152 blocks

# LC_ALL=en_US fsck.ext4 -f /dev/mapper/vg--mohikanin-portage
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'lost+found' in / (2) is a link to directory /lost+found (8135).
Clear<y>? yes
Entry 'lost+found' in / (2) is a link to directory /lost+found (8135).
Clear<y>? yes
Entry 'lost+found' in / (2) is a link to directory /lost+found (8135).
Clear<y>? yes
Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information

portage: ***** FILE SYSTEM WAS MODIFIED *****
portage: 164758/600064 files (2.8% non-contiguous), 1007908/2097152 blocks

# LC_ALL=en_US fsck.ext4 -f /dev/mapper/vg--mohikanin-portage
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
portage: 164758/600064 files (2.8% non-contiguous), 1007908/2097152 blocks

So fsck has to be run three times to fix all errors.

Reproducible: Always

This host is a qemu-kvm virtualized host.

# emerge --info
Portage 2.1.11.52 (hardened/linux/amd64, gcc-4.6.3, glibc-2.15-r3, 3.7.5-hardened-r1 x86_64)
=================================================================
System uname: Linux-3.7.5-hardened-r1-x86_64-Intel-R-_Core-TM-2_Duo_CPU_T7700_@_2.40GHz-with-gentoo-2.1
KiB Mem: 1019984 total, 289184 free
KiB Swap: 524284 total, 524284 free
Timestamp of tree: Unknown
ld GNU gold (GNU Binutils 2.22) 1.11
ccache version 3.1.9 [enabled]
app-shells/bash: 4.2_p37
dev-lang/python: 2.7.3-r2, 3.2.3
dev-util/ccache: 3.1.9
dev-util/pkgconfig: 0.28
sys-apps/baselayout: 2.1-r1
sys-apps/openrc: 0.11.8
sys-apps/sandbox: 2.5
sys-devel/autoconf: 2.69
sys-devel/automake: 1.11.6
sys-devel/binutils: 2.22-r1
sys-devel/gcc: 4.6.3
sys-devel/gcc-config: 1.7.3
sys-devel/libtool: 2.4-r1
sys-devel/make: 3.82-r4
sys-kernel/linux-headers: 3.6 (virtual/os-headers)
sys-libs/glibc: 2.15-r3
Repositories: gentoo
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=native -frecord-gcc-switches -fno-unwind-tables -fno-asynchronous-unwind-tables -fpeel-loops -ftracer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/php/apache2-php5.3/ext-active/ /etc/php/apache2-php5.4/ext-active/ /etc/php/cgi-php5.3/ext-active/ /etc/php/cgi-php5.4/ext-active/ /etc/php/cli-php5.3/ext-active/ /etc/php/cli-php5.4/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe -march=native -frecord-gcc-switches -fno-unwind-tables -fno-asynchronous-unwind-tables -fpeel-loops -ftracer"
DISTDIR="/usr/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs ccache collision-protect compressdebug config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="pl_PL.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,--sort-common"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_EXTRA_OPTS="-O --inplace"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
USE="acl acpi amd64 bash-completion caps cli cracklib crypt cxx dri gpm hardened iconv idn ipv6 justify mmx mmxext modules mudflap multilib ncurses nls nptl openmp pax_kernel pcre readline session sse sse2 sse3 ssse3 threads unicode urandom vhosts vim-syntax xattr" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="authz_host dir mime unique_id" APACHE2_MPMS="itk" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" NGINX_MODULES_HTTP="access auth_basic browser charset fastcgi gzip gzip_static headers_more limit_conn limit_req proxy realip referer rewrite userid" PHP_TARGETS="php5-3 php5-4" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python3_2" RUBY_TARGETS="ruby18 ruby19" USERLAND="GNU" VIDEO_CARDS="fbdev glint intel mach64 mga nouveau nv r128 radeon savage sis tdfx trident vesa via vmware dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" USE_PYTHON="2.7 3.2"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS
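The "run a forced check until nothing is left to fix" procedure from this report could be scripted. What follows is only a minimal sketch, assuming the exit status documented in the e2fsck man page (0 for no errors, 1 or 2 for errors corrected, 4 and above for uncorrected errors or operational failures). The function name fsck_until_clean and the FSCK override variable are made up for illustration and are not part of e2fsprogs. The -f flag matters because, as seen in this report, a plain run can report "clean" without actually checking.

```shell
# Run a forced check repeatedly until one pass finds nothing to fix.
# Prints the number of passes that had to repair something.
# FSCK is a hypothetical override so the loop can be tried without a device.
fsck_until_clean() {
    dev=$1
    max=${2:-5}          # give up after this many repairing passes
    fixes=0
    while [ "$fixes" -lt "$max" ]; do
        rc=0
        ${FSCK:-e2fsck} -f -y "$dev" >/dev/null 2>&1 || rc=$?
        if [ "$rc" -eq 0 ]; then
            # Clean pass: report how many earlier passes made repairs.
            echo "$fixes"
            return 0
        elif [ "$rc" -ge 4 ]; then
            # Uncorrected errors or an operational failure: stop looping.
            echo "$fixes"
            return "$rc"
        fi
        # Exit status 1, 2 or 3: errors were corrected, so check again.
        fixes=$((fixes + 1))
    done
    echo "$fixes"
    return 1
}
```

With the filesystem unmounted, it would be called as fsck_until_clean /dev/mapper/vg--mohikanin-portage. A result above 1 means a single fsck run was not enough.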
You should not be having problems with a block size of 1024. Does this only happen with a hardened kernel? Have you tested with vanilla?
I have hardened-sources everywhere, which is why I put hardened-sources in the title. I can reproduce the problem with sys-kernel/vanilla-sources-3.7.10 as well.
It looks like I can reproduce the problem this way:
- do some writes on /usr/portage
- touch /forcefsck
- init 6

After one or two tries I can see:
# ls -lah /usr/portage/|grep lost
drwx------ 2 root root 1,0K 03-21 21:16 lost+found
drwx------ 2 root root 1,0K 03-21 21:16 lost+found
drwx------ 2 root root 1,0K 03-21 21:16 lost+found
drwx------ 2 root root 1,0K 03-21 21:16 lost+found

Now I have to run fsck manually. I suspect this is related to block size because I don't have this problem on other filesystems formatted with the default block size.
I've prepared an image of the damaged filesystem (using e2image -r); after compressing it with bzip2 it is ~6 MB. If it would be useful I can upload this file.
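To spot the symptom without reading the listing by eye, duplicated names in a directory can be filtered out mechanically. This is only a small sketch. The helper name dup_entries is hypothetical, and it assumes entry names contain no newlines.

```shell
# Print every entry name that appears more than once in a directory listing.
# Reads one name per line on stdin; a healthy directory yields no output.
dup_entries() {
    sort | uniq -d
}

# Example call (path taken from this report):
#   ls -a /usr/portage | dup_entries
```

On the damaged filesystem this would be expected to print lost+found once, since uniq -d prints one line per group of repeated names.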
(In reply to comment #2)
> I've hardened-sources everywhere, this is why I put hardened-sources in
> title. I can reproduce problem with sys-kernel/vanilla-sources-3.7.10 as
> well.
> It looks I can reproduce problem in this way:
> - do some writes on /usr/portage
> - touch /forcefsck
> - init 6
>
> After one, two tries I can see:
> # ls -lah /usr/portage/|grep lost
> drwx------ 2 root root 1,0K 03-21 21:16 lost+found
> drwx------ 2 root root 1,0K 03-21 21:16 lost+found
> drwx------ 2 root root 1,0K 03-21 21:16 lost+found
> drwx------ 2 root root 1,0K 03-21 21:16 lost+found
>
> Now I've got to run fsck manually. I suspect this is related to block size
> because I don't have such problem on other filesystems formated with default
> block size.
> I've prepared image of damaged filesystem (using e2image -r), after bziping
> it has ~6MB size. If it could be useful I can upload this file.

At this point I'm beginning to think you have a hardware issue. You can try using smartctl to identify where the issue is.
Maybe this is a hardware problem, maybe not. I can't confirm either option. As I mentioned earlier, I'm doing all tests in a virtualized environment. The host has ECC RAM with SAS disks (RAID1) and is running hardened-sources-3.8.0 with qemu-1.2.2-r3.
I just did some tests to narrow down the steps to reproduce.
I created 3 LVs:
ACTIVE '/dev/vg-rup/1k' [5,00 GiB] inherit
ACTIVE '/dev/vg-rup/2k' [5,00 GiB] inherit
ACTIVE '/dev/vg-rup/4k' [5,00 GiB] inherit

I created ext4 on each of them with block sizes of 1k, 2k and 4k. I mounted the filesystems on /dane/1k, /dane/2k and /dane/4k.
Next I did:
PORTDIR=/dane/1k/ emerge --sync ; PORTDIR=/dane/2k/ emerge --sync ; PORTDIR=/dane/4k/ emerge --sync
then:
touch /forcefsck && init 6

After reboot I have:
ls -lah /dane/1k/|grep lost
drwx------ 2 root root 1,0K 03-22 10:32 lost+found
drwx------ 2 root root 1,0K 03-22 10:32 lost+found
drwx------ 2 root root 1,0K 03-22 10:32 lost+found
drwx------ 2 root root 1,0K 03-22 10:32 lost+found
rup dane # ls -lah /dane/2k/|grep lost
drwx------ 2 root root 2,0K 03-22 10:32 lost+found
drwx------ 2 root root 2,0K 03-22 10:32 lost+found
rup dane # ls -lah /dane/4k/|grep lost
drwx------ 2 root root 4,0K 03-22 10:32 lost+found

This is very interesting: 4 "lost+found" dirs on the 1k fs, and 2 "lost+found" on the 2k fs.
Now it's time to do fsck:
# LC_ALL=en_US fsck.ext4 -fy /dev/vg-rup/1k
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'lost+found' in / (2) is a link to directory /lost+found (3434).
Clear? yes
Entry 'lost+found' in / (2) is a link to directory /lost+found (3434).
Clear? yes
Entry 'lost+found' in / (2) is a link to directory /lost+found (3434).
Clear? yes
Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vg-rup/1k: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vg-rup/1k: 164449/327680 files (0.2% non-contiguous), 490403/5242880 blocks
rup dane # LC_ALL=en_US fsck.ext4 -fy /dev/vg-rup/2k
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'lost+found' in / (2) is a link to directory /lost+found (16578).
Clear? yes
Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vg-rup/2k: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vg-rup/2k: 164449/327680 files (0.2% non-contiguous), 324604/2621440 blocks
rup dane # LC_ALL=en_US fsck.ext4 -fy /dev/vg-rup/4k
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vg-rup/4k: 164449/327680 files (0.0% non-contiguous), 247199/1310720 blocks

It seems the number of errors on the filesystem is correlated with the block size: three extra entries at 1k, one at 2k, none at 4k.
I think the important part is to create many files on the filesystem and delete the "lost+found" directory (like emerge --sync does).
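The setup of the three test filesystems could be sketched roughly like this. It is an illustration only. The helper name make_test_filesystems and the RUN dry-run variable are assumptions, the LV names and the /dane mount points are the ones from this comment, and the mount points are assumed to exist already. Running it for real destroys whatever is on those LVs.

```shell
# Create one ext4 filesystem per block size on existing logical volumes
# named 1k, 2k and 4k, then mount each under /dane.
# Set RUN=echo (hypothetical switch) to only print the commands.
make_test_filesystems() {
    vg=$1
    for bs in 1024 2048 4096; do
        name="$((bs / 1024))k"
        # -b selects the block size in bytes, -q keeps mkfs quiet.
        ${RUN:-} mkfs.ext4 -q -b "$bs" "/dev/$vg/$name"
        ${RUN:-} mount "/dev/$vg/$name" "/dane/$name"
    done
}

# Dry run:  RUN=echo make_test_filesystems vg-rup
```

After that, the emerge --sync and forced fsck steps from this comment apply unchanged.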
I can reproduce it in a simpler way:
emerge --sync
umount /dane/2k
fsck -f -C0 -T -t -A -p /dev/vg-rup/1k
mount /dane/2k
# ls -lah /dane/1k/|grep lost
drwx------ 2 root root 1,0K 03-22 11:10 lost+found
drwx------ 2 root root 1,0K 03-22 11:10 lost+found
drwx------ 2 root root 1,0K 03-22 11:10 lost+found
drwx------ 2 root root 1,0K 03-22 11:10 lost+found

Using the above commands I reproduced the problem on a bare metal host with gentoo-sources-3.6.11.
(In reply to comment #5)
> I can reproduce it in simpler way:
> emerge --sync
> umount /dane/2k
> fsck -f -C0 -T -t -A -p /dev/vg-rup/1k
> mount /dane/2k
> # ls -lah /dane/1k/|grep lost
> drwx------ 2 root root 1,0K 03-22 11:10 lost+found
> drwx------ 2 root root 1,0K 03-22 11:10 lost+found
> drwx------ 2 root root 1,0K 03-22 11:10 lost+found
> drwx------ 2 root root 1,0K 03-22 11:10 lost+found
>
> Using above commands I reproduced problem on bare metal host with
> gentoo-sources-3.6.11.

Thanks, this is totally interesting! I'm going to try to reproduce later.
There is an error in the test pasted earlier (I did `mount /dane/2k` but then `ls /dane/1k`), but everything else is accurate. The corruption appears after running fsck. It looks like e2fsprogs corrupts the filesystem. I can see many entries in the e2fsprogs changelog about fsck corrupting filesystems; I probably hit one of them. I don't think this is related to the kernel.
With e2fsprogs-1.42.7 I couldn't reproduce the corruption. I'm voting for faster stabilisation of e2fsprogs :)
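A quick way to tell whether a machine still runs an older e2fsprogs is to compare version strings. The sketch below assumes GNU sort with -V (version sort). The helper name version_lt is hypothetical. Note that this thread only establishes 1.42 as affected and 1.42.7 as not reproducing the problem, so treating everything below 1.42.7 as suspect is an assumption.

```shell
# Succeed if version $1 is strictly older than version $2.
# Relies on GNU sort -V to order the two version strings.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example call, taking the version from the banner line that e2fsck
# prints (e.g. "e2fsck 1.42 (29-Nov-2011)"):
#   installed=$(e2fsck -V 2>&1 | awk 'NR == 1 { print $2 }')
#   version_lt "$installed" 1.42.7 && echo "e2fsprogs older than 1.42.7"
```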
A newer version is stable now.