| Summary: | encrypted rootfs on MD raid1 + hibernation to encrypted swap = corrupted rootfs and unbootable system | | |
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Petr Lanc <drakoun> |
| Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
| Status: | RESOLVED UPSTREAM | | |
| Severity: | critical | CC: | bugs_gentoo_org.Tim_OKelly, drakoun |
| Priority: | High | | |
| Version: | unspecified | | |
| Hardware: | AMD64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Package list: | | Runtime testing required: | --- |
| Attachments: | dmesg ext3, dmesg ext4, dmesg 2010-02-09, messages 2010-02-09 (cut), dmesg 2010-03-22, messages 2010-03-22 (cut) | | |
Description
Petr Lanc
2009-12-10 23:21:51 UTC
Created attachment 212658 [details]
dmesg ext3
Created attachment 212659 [details]
dmesg ext4
Well, this is a major symptom and also very painful to track down. It could easily be a race condition that is only triggered by your whole setup of encrypted root + raid1 + encrypted swap + hibernate, and maybe even by your particular hardware, since board-specific ACPI quirks can affect hibernate results. Not to mention that filesystem-eating bugs are not very popular for other people to try to reproduce...

How much trouble are you willing to go to in order to track this thing down? If a bug gets filed with the kernel devs and they start suggesting things to try, will you be able to do more experiments, or would it be too disruptive to risk more filesystem corruption? (For instance, you might not have time to keep restoring your system, and would rather just skip the RAID1...)

I am willing to help resolve this bug. It's on my home computer and, although it's painful, I can restore root from backup. If no other option is left, I can always drop raid1.

Are you using a customised initramfs to set up the crypt-swap on startup? From memory, the major/minor codes on the cryptswap should be the same between suspend and resume. To aid debugging, start by trying to capture a lot more shutdown/startup info, preferably on as new a vanilla kernel as possible.

I'm using an initrd generated by genkernel with LVM, MDADM and LUKS set to "yes". I didn't make any changes to the initrd.

grub.conf:
kernel /boot/vmlinuz root=/dev/ram0 crypt_root=/dev/md1 real_root=/dev/mapper/root crypt_swap=/dev/sda2 swap_keydev=/dev/mapper/root swap_key=etc/keys/swap.key real_resume=/dev/mapper/swap acpi_enforce_resources=lax
initrd /boot/initramfs

Swap is encrypted with LUKS aes-xts-plain and decrypted with a key that lies on the rootfs in /etc/keys/swap.key.

I just installed sys-kernel/vanilla-sources-2.6.32.4. Is it new enough? I will save dmesg after resumes. Should I collect any debug information other than dmesg?
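For readers following the setup, the kernel parameters from the grub.conf quoted above can be split into key/value pairs to make the device dependencies easier to see. This is only a minimal sketch: the helper names `parse_cmdline` and `swap_key_depends_on_root` are hypothetical and not part of genkernel, and the only input is the command line as quoted in this report. It illustrates that the swap key device is the same mapping as the real root, which suggests the root mapping has to be opened before the encrypted swap can be unlocked for resume.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into key/value pairs.

    Tokens without '=' (bare flags) map to an empty string.
    """
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def swap_key_depends_on_root(params: dict) -> bool:
    """Return True when the swap key device is the decrypted root mapping.

    Hypothetical helper: it only compares the two parameter values and
    knows nothing about how the initramfs actually uses them.
    """
    keydev = params.get("swap_keydev")
    return keydev is not None and keydev == params.get("real_root")


# Kernel parameters exactly as quoted in the grub.conf of this report.
CMDLINE = (
    "root=/dev/ram0 crypt_root=/dev/md1 real_root=/dev/mapper/root "
    "crypt_swap=/dev/sda2 swap_keydev=/dev/mapper/root "
    "swap_key=etc/keys/swap.key real_resume=/dev/mapper/swap "
    "acpi_enforce_resources=lax"
)

params = parse_cmdline(CMDLINE)
for name in ("crypt_root", "real_root", "crypt_swap", "swap_keydev", "real_resume"):
    print(f"{name:12s} -> {params[name]}")
print("swap key stored on root mapping:", swap_key_depends_on_root(params))
```

Whether that ordering plays any role in the corruption is not established in this report; the sketch only makes the configuration explicit.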
The system always resumes from hibernation without problems (no errors in dmesg), but sometimes, after a few minutes (usually during write activity on the rootfs, such as emerge -uD world), it remounts / as ro (I have the error behavior set to remount-ro) and reports filesystem errors. In about half of the cases it eats several files (it really likes /sbin and /lib64) and the system is unable to boot.

Today the first rootfs corruption occurred since I started running sys-kernel/vanilla-sources-2.6.32.4. The system resumed OK, as usual, and I made a backup of /. (I do this backup regularly.) After several hours, ext4 errors appeared in dmesg. dmesg and messages (with time stamps) are attached.
Created attachment 219035 [details]
dmesg 2010-02-09
Created attachment 219037 [details]
messages 2010-02-09 (cut)
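Since the corruption shows up as / being remounted read-only some time after resume, one way to notice it early is to check the mount options of the root filesystem. The following is a rough sketch, not something used in this report: the function names are hypothetical, and it assumes the usual whitespace-separated layout of /proc/mounts (device, mount point, filesystem type, options, ...).

```python
def root_mount_options(mounts_text: str) -> list:
    """Return the mount options of the last entry mounted at '/'.

    The last entry is used because an early 'rootfs' line may precede
    the real root filesystem in /proc/mounts.
    """
    options = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "/":
            options = fields[3].split(",")
    return options


def root_is_read_only(mounts_text: str) -> bool:
    """Return True when the root filesystem carries the 'ro' option."""
    return "ro" in root_mount_options(mounts_text)


def check_live_system(path: str = "/proc/mounts") -> bool:
    """Read the mount table from `path` and report whether / is read-only."""
    with open(path) as handle:
        return root_is_read_only(handle.read())
```

Calling `check_live_system()` after each resume, for example from a periodic job, would return True once the root filesystem has been remounted read-only, so the dmesg output could be saved at that moment.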
Created attachment 224925 [details]
dmesg 2010-03-22
Created attachment 224927 [details]
messages 2010-03-22 (cut)
Nothing new with 2.6.32.8. This time it ate /etc/profile.env. dmesg is the same as usual. The error appeared when I ran emerge -uD world and it wrote some files to /lib32.

This seems to be reliably reproducible. Please file this upstream at http://bugzilla.kernel.org and copy the bug URL down here.