Bug 296472 - encrypted rootfs on MD raid1 + hibernation to encrypted swap = corrupted rootfs and unbootable system
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High critical
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
Reported: 2009-12-10 23:21 UTC by Petr Lanc
Modified: 2010-04-25 18:46 UTC
CC List: 2 users

Attachments
dmesg ext3 (dmesg-ext3.txt, 30.89 KB, text/plain), 2009-12-10 23:25 UTC, Petr Lanc
dmesg ext4 (dmesg-ext4.txt, 31.09 KB, text/plain), 2009-12-10 23:25 UTC, Petr Lanc
dmesg 2010-02-09 (dmesg-100209.txt, 30.15 KB, text/plain), 2010-02-09 23:04 UTC, Petr Lanc
messages 2010-02-09 (cut) (messages-100909.txt, 51.32 KB, text/plain), 2010-02-09 23:05 UTC, Petr Lanc
dmesg 2010-03-22 (dmesg-100322.txt, 30.18 KB, text/plain), 2010-03-23 18:02 UTC, Petr Lanc
messages 2010-03-22 (cut) (messages-100322.txt, 9.27 KB, text/plain), 2010-03-23 18:02 UTC, Petr Lanc

Description Petr Lanc 2009-12-10 23:21:51 UTC
My root filesystem is on a LUKS (aes-xts-plain) encrypted device on top of MD RAID1. When I hibernate the system to encrypted swap, the root filesystem gets corrupted during writes after a few hibernate/resume cycles.

Previously I had the very same configuration, except without the RAID1 mirror under the rootfs, and it worked fine. The problems started right after I moved the rootfs to the RAID1 MD device.

The behavior is the same on sys-kernel/gentoo-sources-2.6.30-r5 and sys-kernel/gentoo-sources-2.6.31-r6, with both ext3 and ext4.

It mostly shows up when emerge installs files to the root filesystem, but it is not exactly predictable. It takes several hibernate/resume cycles to appear, but it appears frequently, 2 or 3 times a month.

In the last case, I found over 120 files in /lost+found and my system was unbootable. In some previous cases, some .so files in /lib64 were turned into directories or unknown file types, and the system was usually unbootable. /sbin was not spared either. After fsck and some re-emerges and/or restores, the system was able to operate normally.

Usually the error first shows up as emerge complaining that it cannot overwrite some file. After that, it usually cannot write to the directory where that file resides. dmesg then contains dozens of errors like this:
 
EXT4-fs error (device dm-0): mb_free_blocks: double-free of inode 0's block 13459(bit 5266 in group 1)
EXT4-fs error (device dm-0): mb_free_blocks: double-free of inode 0's block 13460(bit 5267 in group 1)
EXT4-fs error (device dm-0): mb_free_blocks: double-free of inode 0's block 13461(bit 5268 in group 1)

(dm-0 corresponds to /dev/mapper/root, which is the decrypted LUKS device)
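
When comparing dmesg captures from several incidents, it may help to condense the double-free messages into a per-group summary. The following is a minimal sketch, not part of the original report: the regular expression is written against the three lines quoted above, and the function names are hypothetical. Other kernel versions may word the message differently.

```python
import re
from collections import defaultdict

# Matches the mb_free_blocks double-free lines quoted in this report.
# Assumption: the message layout is exactly as shown above.
EXT4_DOUBLE_FREE = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): mb_free_blocks: "
    r"double-free of inode (?P<inode>\d+)'s block (?P<block>\d+)"
    r"\(bit (?P<bit>\d+) in group (?P<group>\d+)\)"
)


def parse_double_free(lines):
    """Return one dict per matching log line; non-matching lines are skipped."""
    records = []
    for line in lines:
        match = EXT4_DOUBLE_FREE.search(line)
        if match:
            records.append({
                "device": match["device"],
                "inode": int(match["inode"]),
                "block": int(match["block"]),
                "bit": int(match["bit"]),
                "group": int(match["group"]),
            })
    return records


def summarize(records):
    """Map (device, group) to (error count, lowest block, highest block)."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[(rec["device"], rec["group"])].append(rec["block"])
    return {key: (len(v), min(v), max(v)) for key, v in blocks.items()}


if __name__ == "__main__":
    import sys
    # Usage: dmesg | python3 this_script.py
    for key, value in summarize(parse_double_free(sys.stdin)).items():
        print(key, value)
```

A run of consecutive blocks within one block group, as in the excerpt above, would appear as a single summary entry.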

Reproducible: Always

Steps to Reproduce:
1. Have a LUKS-encrypted rootfs on MD RAID1
2. Hibernate to LUKS-encrypted swap and resume a few times
3. Emerge some package that installs files to the rootfs


Actual Results:  
Corrupted rootfs

Expected Results:  
Not corrupted rootfs

Portage 2.1.6.13 (default/linux/amd64/10.0, gcc-4.3.4, glibc-2.9_p20081201-r2, 2.6.31-gentoo-r6 x86_64)
=================================================================                                      
System uname: Linux-2.6.31-gentoo-r6-x86_64-AMD_Athlon-tm-_64_X2_Dual_Core_Processor_3800+-with-gentoo-2.0.1
Timestamp of tree: Thu, 10 Dec 2009 19:30:01 +0000                                                          
ccache version 2.4 [enabled]                                                                                
app-shells/bash:     4.0_p35                                                                                
dev-java/java-config: 2.1.9-r1                                                                              
dev-lang/python:     2.6.4                                                                                  
dev-python/pycrypto: 2.0.1-r8
dev-util/ccache:     2.4-r7
dev-util/cmake:      2.6.4-r3
sys-apps/baselayout: 2.0.1
sys-apps/openrc:     0.5.3
sys-apps/sandbox:    1.6-r2
sys-devel/autoconf:  2.13, 2.63-r1
sys-devel/automake:  1.7.9-r1, 1.9.6-r2, 1.10.2
sys-devel/binutils:  2.18-r3
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.6b
virtual/os-headers:  2.6.27-r2
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O3 -march=k8 -funroll-loops -mfpmath=sse -msse -msse2 -mmmx -m3dnow -fomit-frame-pointer -fprefetch-loop-arrays -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb /usr/share/config /usr/share/xsessions"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-O3 -march=k8 -funroll-loops -mfpmath=sse -msse -msse2 -mmmx -m3dnow -fomit-frame-pointer -fprefetch-loop-arrays -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks fixpackages parallel-fetch protect-owned sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ ftp://mirror.switch.ch/mirror/gentoo/ http://mirror.switch.ch/mirror/gentoo/ ftp://ftp.wh2.tu-dresden.de/pub/mirrors/gentoo ftp://ftp.tu-clausthal.de/pub/linux/gentoo/"
LANG="POSIX"
LDFLAGS="-Wl,-O1"
LINGUAS="cs en"
MAKEOPTS="-j3"
PKGDIR="/var/tmp/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="3dnow 3dnowext X a52 aac aalib acl acpi alsa amd64 apache2 apm asf audiofile bash-completion berkdb bzip2 caps cdparanoia cdrom cjk cli cracklib crypt cups curl cxx dbus dri dvd dvdread encode f77 ffmpeg flac fontconfig foomaticdb fortran fuse gdbm gif gimpprint gpm gstreamer gtk gtk2 iconv imagemagick imap imlib immqt-bc iproute2 jpeg jpeg2k kde lame libsamplerate lm_sensors lzo mad maildir matroska mjpeg mmx mmxext modules mp2 mp3 mpeg mplayer mudflap multilib ncurses network nls nptl nptlonly nsplugin nvidia objc ogg opengl openmp oss pam pcre pdf pdflib perl png ppds pppd python qt3support qt4 rar readline reflection samba sdl session slang sndfile spell spl sse sse2 ssl svg sysfs tcpd threads tiff truetype unicode usb vdpau vim-syntax vim-with-x vorbis wma wmf x264 xcomposite xinerama xorg xpm xv xvid zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="cs en" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="nvidia"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LC_ALL, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Petr Lanc 2009-12-10 23:25:06 UTC
Created attachment 212658 [details]
dmesg ext3
Comment 2 Petr Lanc 2009-12-10 23:25:30 UTC
Created attachment 212659 [details]
dmesg ext4
Comment 3 Wormo (RETIRED) gentoo-dev 2010-01-01 08:11:24 UTC
Well this is a major symptom and also very painful to track down. Could easily be a race condition that is only triggered by your whole setup of encrypted root + raid1 + encrypted swap + hibernate and maybe even your particular hardware since board-specific ACPI quirks can affect hibernate results. Not to mention filesystem-eating bugs are not very popular for other people to try to reproduce...

How much trouble are you willing to go to in order to track this thing down? If a bug gets filed with the kernel devs and they start suggesting things to try, will you be able to do more experiments or would it be too disruptive to risk more filesystem corruption? (For instance, you might not have time to keep restoring your system, and would rather just skip the RAID1...)

Comment 4 Petr Lanc 2010-01-05 18:13:48 UTC
I am willing to help resolve this bug. It's on my home computer and, although it's painful, I can restore root from backup. If no other option is left, I can always drop RAID1.
Comment 5 Daniel Black (RETIRED) gentoo-dev 2010-01-21 05:23:51 UTC
Are you using a customised initramfs to set up the crypt-swap on startup? From memory, the major/minor numbers of the cryptswap should be the same between suspend and resume. To aid debugging, start by trying to capture a lot more shutdown/startup info, preferably on as new a vanilla kernel as possible.
Comment 6 Petr Lanc 2010-01-25 18:57:17 UTC
I'm using an initrd generated by genkernel with LVM, MDADM and LUKS set to "yes". I didn't make any changes to the initrd.

grub.conf:
kernel /boot/vmlinuz root=/dev/ram0 crypt_root=/dev/md1 real_root=/dev/mapper/root crypt_swap=/dev/sda2 swap_keydev=/dev/mapper/root swap_key=etc/keys/swap.key real_resume=/dev/mapper/swap acpi_enforce_resources=lax
initrd /boot/initramfs

Swap is encrypted with LUKS aes-xts-plain and decrypted with a key that is stored on the rootfs in /etc/keys/swap.key.
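
To make the relationship between these boot parameters easier to see, here is a small sketch that splits the kernel arguments from the grub.conf line above into key/value pairs. It only restates what the line already says; the helper names are hypothetical, and the meaning of each parameter is taken from the description in this comment, not from the genkernel documentation.

```python
def parse_kernel_args(args):
    """Split a kernel argument string into a dict; bare flags map to True."""
    params = {}
    for token in args.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def swap_key_on_decrypted_root(params):
    """True when the swap key device is the same mapping as the real root."""
    return params.get("swap_keydev") == params.get("real_root")


# Arguments copied from the grub.conf "kernel" line (kernel image path omitted).
KERNEL_ARGS = (
    "root=/dev/ram0 crypt_root=/dev/md1 real_root=/dev/mapper/root "
    "crypt_swap=/dev/sda2 swap_keydev=/dev/mapper/root "
    "swap_key=etc/keys/swap.key real_resume=/dev/mapper/swap "
    "acpi_enforce_resources=lax"
)

if __name__ == "__main__":
    params = parse_kernel_args(KERNEL_ARGS)
    for key in ("crypt_root", "real_root", "crypt_swap",
                "swap_keydev", "swap_key", "real_resume"):
        print(f"{key:12} {params[key]}")
    print("swap key read from decrypted root:",
          swap_key_on_decrypted_root(params))
```

With this configuration the swap key device and the real root are the same mapping, which suggests that the initrd has to open the root mapping before it can unlock swap and read the resume image.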

I just installed sys-kernel/vanilla-sources-2.6.32.4. Is it new enough?

I will save dmesg after resumes. Should I collect any debug information other than dmesg?

The system always resumes from hibernation without problems (without errors in dmesg), but sometimes, after a few minutes (usually during write activity on the rootfs, like emerge -uD world), it remounts / as ro (I have the error behavior set to remount-ro) and reports errors on the filesystem. In about half of the cases it eats several files (it really likes /sbin and /lib64) and the system is unable to boot.
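
One possible extra data point besides dmesg is a checksum snapshot of the directories that tend to get damaged, taken before hibernation and compared after resume. The sketch below is only an illustration under assumptions: the function names are hypothetical, the directory list is taken from the paths mentioned in this report, and because reads may be served from the page cache, a clean comparison does not prove that the on-disk data is intact.

```python
import hashlib
import os


def snapshot(roots):
    """Map each regular file under the given roots to its SHA-256 digest."""
    digests = {}
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Skip symlinks and special files; only regular files are hashed.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                try:
                    sha = hashlib.sha256()
                    with open(path, "rb") as handle:
                        for chunk in iter(lambda: handle.read(1 << 20), b""):
                            sha.update(chunk)
                    digests[path] = sha.hexdigest()
                except OSError:
                    digests[path] = None  # unreadable file, recorded as such
    return digests


def diff_snapshots(before, after):
    """Return (missing, changed) path lists relative to the first snapshot."""
    missing = sorted(p for p in before if p not in after)
    changed = sorted(p for p in before if p in after and before[p] != after[p])
    return missing, changed


if __name__ == "__main__":
    # Directories named in this report; adjust as needed.
    before = snapshot(["/sbin", "/lib64"])
    input("Hibernate, resume, then press Enter to compare... ")
    missing, changed = diff_snapshots(before, snapshot(["/sbin", "/lib64"]))
    print("missing:", missing)
    print("changed:", changed)
```

A file that has turned into a directory or an unknown file type would be reported as missing, since only regular files are included in a snapshot. Files legitimately replaced by emerge between the two snapshots would be reported as changed, so the comparison is most meaningful when no packages were installed in between.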
Comment 7 Petr Lanc 2010-02-09 23:03:21 UTC
Today the first rootfs corruption occurred since I started running sys-kernel/vanilla-sources-2.6.32.4. The system resumed OK, as usual, and I made a backup of /. (I do this backup regularly.) After several hours, ext4 errors appeared in dmesg. I am attaching dmesg and messages (with time stamps).
Comment 8 Petr Lanc 2010-02-09 23:04:12 UTC
Created attachment 219035 [details]
dmesg 2010-02-09
Comment 9 Petr Lanc 2010-02-09 23:05:20 UTC
Created attachment 219037 [details]
messages 2010-02-09 (cut)
Comment 10 Petr Lanc 2010-03-23 18:02:13 UTC
Created attachment 224925 [details]
dmesg 2010-03-22
Comment 11 Petr Lanc 2010-03-23 18:02:37 UTC
Created attachment 224927 [details]
messages 2010-03-22 (cut)
Comment 12 Petr Lanc 2010-03-23 18:08:07 UTC
Nothing new with 2.6.32.8. This time, it ate /etc/profile.env. dmesg is the same as usual. The error appeared when I ran emerge -uD world and it wrote some files to /lib32.
Comment 13 Mike Pagano gentoo-dev 2010-04-25 18:46:51 UTC
This seems to be reliably reproducible. Please file this upstream at http://bugzilla.kernel.org and copy the bug URL here.