My root filesystem is on a LUKS (aes-xts-plain) encrypted volume that sits on an md RAID1 array. When I hibernate the system to encrypted swap, the root filesystem becomes corrupted on writes after a few hibernate/resume cycles. Previously I had the very same configuration, except without the RAID1 mirror as the underlying rootfs device, and it worked just fine. The problems started right after I moved the rootfs to the RAID1 md device.

The behavior is the same on sys-kernel/gentoo-sources-2.6.30-r5 and sys-kernel/gentoo-sources-2.6.31-r6, with both ext3 and ext4. It mainly shows up when emerge tries to install files to the root filesystem, but it is not exactly predictable. It takes several hibernate/resume cycles to appear, but it appears frequently, 2 or 3 times a month.

In the latest case I found over 120 files in /lost+found and the system was unbootable. In some previous cases, some .so files in /lib64 were changed into directories or unknown file types, and the system was usually unbootable. /sbin was not spared either. After fsck and some re-emerges and/or restores, the system was able to operate normally.

Usually the error first shows itself as emerge complaining that it cannot overwrite some file. After that, it is usually impossible to write to the directory where that file resides. dmesg contains dozens of errors like this:

EXT4-fs error (device dm-0): mb_free_blocks: double-free of inode 0's block 13459(bit 5266 in group 1)
EXT4-fs error (device dm-0): mb_free_blocks: double-free of inode 0's block 13460(bit 5267 in group 1)
EXT4-fs error (device dm-0): mb_free_blocks: double-free of inode 0's block 13461(bit 5268 in group 1)

(dm-0 is a link to /dev/mapper/root, which is the LUKS-decrypted device)

Reproducible: Always

Steps to Reproduce:
1. Have a LUKS-encrypted rootfs on md RAID1.
2. Hibernate to LUKS-encrypted swap and resume a few times.
3. emerge some package with files in the rootfs.

Actual Results:
Corrupted rootfs

Expected Results:
Not corrupted rootfs

Portage 2.1.6.13 (default/linux/amd64/10.0, gcc-4.3.4, glibc-2.9_p20081201-r2, 2.6.31-gentoo-r6 x86_64)
=================================================================
System uname: Linux-2.6.31-gentoo-r6-x86_64-AMD_Athlon-tm-_64_X2_Dual_Core_Processor_3800+-with-gentoo-2.0.1
Timestamp of tree: Thu, 10 Dec 2009 19:30:01 +0000
ccache version 2.4 [enabled]
app-shells/bash: 4.0_p35
dev-java/java-config: 2.1.9-r1
dev-lang/python: 2.6.4
dev-python/pycrypto: 2.0.1-r8
dev-util/ccache: 2.4-r7
dev-util/cmake: 2.6.4-r3
sys-apps/baselayout: 2.0.1
sys-apps/openrc: 0.5.3
sys-apps/sandbox: 1.6-r2
sys-devel/autoconf: 2.13, 2.63-r1
sys-devel/automake: 1.7.9-r1, 1.9.6-r2, 1.10.2
sys-devel/binutils: 2.18-r3
sys-devel/gcc-config: 1.4.1
sys-devel/libtool: 2.2.6b
virtual/os-headers: 2.6.27-r2
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O3 -march=k8 -funroll-loops -mfpmath=sse -msse -msse2 -mmmx -m3dnow -fomit-frame-pointer -fprefetch-loop-arrays -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb /usr/share/config /usr/share/xsessions"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-O3 -march=k8 -funroll-loops -mfpmath=sse -msse -msse2 -mmmx -m3dnow -fomit-frame-pointer -fprefetch-loop-arrays -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks fixpackages parallel-fetch protect-owned sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ ftp://mirror.switch.ch/mirror/gentoo/ http://mirror.switch.ch/mirror/gentoo/ ftp://ftp.wh2.tu-dresden.de/pub/mirrors/gentoo ftp://ftp.tu-clausthal.de/pub/linux/gentoo/"
LANG="POSIX"
LDFLAGS="-Wl,-O1"
LINGUAS="cs en"
MAKEOPTS="-j3"
PKGDIR="/var/tmp/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="3dnow 3dnowext X a52 aac aalib acl acpi alsa amd64 apache2 apm asf audiofile bash-completion berkdb bzip2 caps cdparanoia cdrom cjk cli cracklib crypt cups curl cxx dbus dri dvd dvdread encode f77 ffmpeg flac fontconfig foomaticdb fortran fuse gdbm gif gimpprint gpm gstreamer gtk gtk2 iconv imagemagick imap imlib immqt-bc iproute2 jpeg jpeg2k kde lame libsamplerate lm_sensors lzo mad maildir matroska mjpeg mmx mmxext modules mp2 mp3 mpeg mplayer mudflap multilib ncurses network nls nptl nptlonly nsplugin nvidia objc ogg opengl openmp oss pam pcre pdf pdflib perl png ppds pppd python qt3support qt4 rar readline reflection samba sdl session slang sndfile spell spl sse sse2 ssl svg sysfs tcpd threads tiff truetype unicode usb vdpau vim-syntax vim-with-x vorbis wma wmf x264 xcomposite xinerama xorg xpm xv xvid zlib"
ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias"
ELIBC="glibc"
INPUT_DEVICES="keyboard mouse evdev"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
LINGUAS="cs en"
RUBY_TARGETS="ruby18"
USERLAND="GNU"
VIDEO_CARDS="nvidia"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LC_ALL, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Created attachment 212658: dmesg ext3
Created attachment 212659: dmesg ext4
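For anyone comparing the attached logs, it may help to count the double-free errors per device and block group, to see whether they cluster in one region of the filesystem. The following is a minimal Python sketch and not part of the original report. The regular expression is modelled only on the three EXT4 lines quoted in the first comment, so it is an assumption about the message format and may need adjusting for other kernel versions; the name summarise_double_frees is hypothetical.

```python
import re
from collections import Counter

# Assumption: message layout copied from the three sample lines in the report.
DOUBLE_FREE = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\): mb_free_blocks: "
    r"double-free of inode (?P<inode>\d+)'s block (?P<block>\d+)"
    r"\(bit (?P<bit>\d+) in group (?P<group>\d+)\)"
)


def summarise_double_frees(log_text):
    """Count double-free errors, keyed by (device, block group)."""
    counts = Counter()
    for match in DOUBLE_FREE.finditer(log_text):
        counts[(match.group("dev"), int(match.group("group")))] += 1
    return counts
```

Feeding it the saved dmesg text, for example summarise_double_frees(open("dmesg.txt").read()) with dmesg.txt being whatever file the log was saved to, gives one counter entry per affected block group.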
Well this is a major symptom and also very painful to track down. Could easily be a race condition that is only triggered by your whole setup of encrypted root + raid1 + encrypted swap + hibernate and maybe even your particular hardware since board-specific ACPI quirks can affect hibernate results. Not to mention filesystem-eating bugs are not very popular for other people to try to reproduce... How much trouble are you willing to go to in order to track this thing down? If a bug gets filed with the kernel devs and they start suggesting things to try, will you be able to do more experiments or would it be too disruptive to risk more filesystem corruption? (For instance, you might not have time to keep restoring your system, and would rather just skip the RAID1...)
I am willing to help resolve this bug. It's on my home computer and, although it's painful, I can restore root from backup. If no other option is left, I can always drop the RAID1.
Are you using a customised initramfs to set up the crypt-swap on startup? From memory, the major/minor numbers of the cryptswap device should be the same between suspend and resume. To aid debugging, start by trying to capture a lot more shutdown/startup info, preferably on as new a vanilla kernel as possible.
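One way to check the major/minor point is to record the device numbers before hibernating and again after resume, then compare the two snapshots. Below is a minimal Python sketch under that assumption; the helper names (device_numbers, snapshot, changed_devices) are hypothetical, and which device paths to watch depends on the setup.

```python
import os
import stat


def device_numbers(path):
    """Return (major, minor) for the device node at path."""
    st = os.stat(path)  # follows symlinks, e.g. device-mapper links
    if not (stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode)):
        raise ValueError(f"{path} is not a device node")
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def snapshot(paths):
    """Map each path to its current (major, minor) pair."""
    return {path: device_numbers(path) for path in paths}


def changed_devices(before, after):
    """Return the sorted paths whose numbers differ between two snapshots."""
    all_paths = before.keys() | after.keys()
    return sorted(p for p in all_paths if before.get(p) != after.get(p))
```

A snapshot taken before hibernation would have to be written to disk (for example as JSON) and read back after resume; an empty result from changed_devices means the numbers stayed the same.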
I'm using the initrd generated by genkernel, with LVM, MDADM and LUKS set to "yes". I didn't make any changes to the initrd.

grub.conf:
kernel /boot/vmlinuz root=/dev/ram0 crypt_root=/dev/md1 real_root=/dev/mapper/root crypt_swap=/dev/sda2 swap_keydev=/dev/mapper/root swap_key=etc/keys/swap.key real_resume=/dev/mapper/swap acpi_enforce_resources=lax
initrd /boot/initramfs

Swap is encrypted with LUKS aes-xts-plain and unlocked with a key that lives on the rootfs in /etc/keys/swap.key.

I have just installed sys-kernel/vanilla-sources-2.6.32.4. Is it new enough? I will save dmesg after each resume. Should I collect any debug information other than dmesg?

The system always resumes from hibernation without problems (no errors in dmesg), but sometimes, after a few minutes (usually during write activity on the rootfs, like emerge -uD world), it remounts / read-only (I have the error behavior set to remount-ro) and reports filesystem errors. In about half of the cases it eats several files (it really likes /sbin and /lib64) and the system is unable to boot.
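Besides dmesg, a checksum manifest of the directories that tend to get damaged (/sbin and /lib64 in this report) could show which files changed across a hibernate/resume cycle before the kernel notices anything. The following is a minimal Python sketch of that idea, not a tested diagnostic; the function names are hypothetical. Note that files changing while the system is in normal use (for example after an emerge) will also show up as differences.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(roots):
    """Map every non-directory entry under the given roots to a fingerprint."""
    manifest = {}
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    manifest[path] = "symlink:" + os.readlink(path)
                elif os.path.isfile(path):
                    try:
                        manifest[path] = file_digest(path)
                    except OSError as exc:
                        manifest[path] = f"unreadable:{exc.errno}"
                else:
                    manifest[path] = "special"  # device node, FIFO, socket...
    return manifest


def compare_manifests(before, after):
    """Return (missing, changed) path lists; a file that became a
    directory is no longer listed as a file, so it appears as missing."""
    missing = sorted(set(before) - set(after))
    changed = sorted(p for p in before if p in after and before[p] != after[p])
    return missing, changed
```

The manifest from before hibernation would need to be saved somewhere off the root filesystem, so that it cannot be affected by the corruption it is meant to detect.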
Today the first rootfs corruption occurred since I started running sys-kernel/vanilla-sources-2.6.32.4. The system resumed OK, as usual, and I made a backup of /. (I do this backup regularly.) After several hours, ext4 errors appeared in dmesg. dmesg and messages (with time stamps) are attached.
Created attachment 219035: dmesg 2010-02-09
Created attachment 219037: messages 2010-02-09 (cut)
Created attachment 224925: dmesg 2010-03-22
Created attachment 224927: messages 2010-03-22 (cut)
Nothing new with 2.6.32.8. This time it ate /etc/profile.env. dmesg is the same as usual. The error showed up when I ran emerge -uD world and it wrote some files to /lib32.
This seems to be reliably reproducible. Please file it upstream at http://bugzilla.kernel.org and post the bug URL here.