--------------
Short version:
--------------
Needing more space on my (faster) HW-Raid-10, I wanted to transfer ~850 GiB to a (slower) SW-Raid-1. Creation succeeded, transfer succeeded, compare (diff) succeeded, even a reboot succeeded; but after power off / boot, all data was lost, and the filesystem offered only an empty "lost+found".

----- cite -----
[PM] ... there is no evidence of a problem with RAID. The filesystem has lost its contents. So it *looks* like an error with "rm" or "mkfs". It possibly isn't that simple but it doesn't look at all like a RAID problem.
NeilBrown
----- /cite -----

[@Neil: Thank you for looking at it and for your comment]

I am continuing with additional experiments, but unfortunately they take a long time ...
I am prepared to provide additional information as requested; proposals on how to debug this strange coincidence are welcome.

Reproducible: Couldn't Reproduce
-------------
Long version:
-------------
Hardware involved:
AMD Phenom(tm) 9950 Quad-Core
8 GiB RAM
ASUS M2N-SLI Deluxe

Source: HW-Raid-10:
# lspci -s 02:00.0 -v
02:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
Subsystem: Adaptec ASR-2405

Destination: SW-Raid-1:
hdparm -i /dev/sdb
. Model=ST31500341AS, FwRev=CC1H, ...
hdparm -i /dev/sdc
. Model=ST31500341AS, FwRev=CC1H, ...

These two are attached to an Adaptec 1220SA:
# lspci -s 03:00.0 -v
03:00.0 RAID bus controller: Silicon Image, Inc. Device 0242 (rev 01)
Subsystem: Adaptec Device 0242

Kernel:
I was running 3.2.16. Having noticed that
. - the problem with the radix-tree iterators was fixed in 3.4.2 and
. - Neil Brown's RAID fix had arrived in [3.4, 3.3.4, or 3.2.17],
I upgraded the kernel to 3.4.3 first.

To be cautious, I wiped the old Raid-1:
. ddrescue -f /dev/zero /dev/sdb -b 4096
. ddrescue -f /dev/zero /dev/sdc -b 4096
. < after reboot: no md any more >
confirmed the TLER settings:
. smartctl -l scterc,70,70 /dev/sdb
. smartctl -l scterc,70,70 /dev/sdc
and built it anew:
mdadm --create --verbose --metadata=1.2 /dev/md/ST-21 --level=mirror --raid-devices=2 /dev/sdb /dev/sdc

$ equery belongs mdadm
. sys-fs/mdadm-3.1.5 (/sbin/mdadm)

On purpose, I gave md some hours to complete syncing from /dev/sdb to /dev/sdc before even starting to partition:
. parted -a optimal /dev/md/ST-21
. mklabel msdos
. mkpart primary ext2 4096 -1
and creating the filesystem:
. mkfs.ext4 -L ST-21-P1 -E lazy_itable_init=0,lazy_journal_init=0 /dev/md/ST-21p1
(-E: to be sure that no pending operations were left open)

$ equery belongs mkfs.ext4
. sys-fs/e2fsprogs-1.42 (/sbin/mkfs.ext4)

Nota bene:
. "E2fsprogs 1.42 (November 29, 2011)
. This release of e2fsprogs has support for file systems > 16TB."
and:
. "E2fsprogs 1.42.4 (June 12, 2012)
. Fixed more 64-bit block number bugs (which could end up corrupting file systems!) in e2fsck, debugfs, and libext2fs."

/etc/fstab:
LABEL=ST-21-P1 /Mammut/ST-21-P1 ext4 defaults,noatime 1 2

fdisk -l :
...
/dev/md127p1 1,4T 21G 1,3T 2% /Mammut/ST-21-P1
...

Because this was data I did not need permanent access to, the Seagate drives were configured to spin down after 10 minutes without access:

equery list hdparm:
[IP-] [ ] sys-apps/hdparm-9.39:0

/etc/config/hdparm:
...
sdb_args="-S120"
sdc_args="-S120"
...

Now I copied the respective directory tree T:
. cp -a /<Raid-10-mountpoint>/T /Mammut/ST-21-P1/
and checked the result with
. diff -r /<Raid-10-mountpoint>/T /Mammut/ST-21-P1/T
which reported no differences.

I'm sorry that I have to be a little imprecise now:
As far as I remember, there was a reboot first, with the copy still readable, then an automatic spin-down. After another reboot at some stage, the copy was not visible while the disks were in stand-by mode; after bringing up the two disks, it was visible again.

Anyway: complete power off overnight, power on the next morning: the copied T was _gone_ !!!, leaving only an (empty) "lost+found" ???

What I get now is the following:

# mdadm -Evvvvs
mdadm: No md superblock detected on /dev/md/mammut:ST-21p1.
mdadm: No md superblock detected on /dev/md/mammut:ST-21.
...
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 16bd66f7:96a400f6:eb91f3c0:f5e58122
Name : mammut:ST-21 (local to host mammut)
Creation Time : Wed Jun 20 19:51:50 2012
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 2930275120 (1397.26 GiB 1500.30 GB)
Array Size : 2930274848 (1397.26 GiB 1500.30 GB)
Used Dev Size : 2930274848 (1397.26 GiB 1500.30 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : a4cef825:a19980d2:285560d9:0c6da2af
Update Time : Fri Jun 22 07:30:30 2012
Checksum : e9b7551c - correct
Events : 19
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing)
...
/dev/sdb:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 16bd66f7:96a400f6:eb91f3c0:f5e58122
Name : mammut:ST-21 (local to host mammut)
Creation Time : Wed Jun 20 19:51:50 2012
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 2930275120 (1397.26 GiB 1500.30 GB)
Array Size : 2930274848 (1397.26 GiB 1500.30 GB)
Used Dev Size : 2930274848 (1397.26 GiB 1500.30 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : baa75e0c:424e949e:b15d863d:a5e31ef8
Update Time : Fri Jun 22 07:30:30 2012
Checksum : 48d361f4 - correct
Events : 19
Device Role : Active device 0
Array State : AA ('A' == active, '.' == missing)
...

# ls -algR /Mammut/ST-21-P1/
/Mammut/ST-21-P1/:
total 24
drwxr-xr-x 3 root 4096 21. Jun 19:06 .
drwxr-xr-x 5 root 4096 21. May 21:37 ..
drwx------ 2 root 16384 20. Jun 19:58 lost+found

/Mammut/ST-21-P1/lost+found:
total 20
drwx------ 2 root 16384 20. Jun 19:58 .
drwxr-xr-x 3 root 4096 21. Jun 19:06 ..

!----------------------------------!
! No /Mammut/ST-21-P1/T any more   !
!----------------------------------!
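
For cross-checking, two of the figures quoted above can be re-derived with plain shell arithmetic. This is only a sketch under stated assumptions: the function names are made up for illustration, the -S conversion assumes hdparm's documented encoding for values 1 to 240 (multiples of 5 seconds) and rejects everything else, and the size conversion assumes that the sizes in the mdadm -E output are counted in 512-byte sectors, which matches the GiB/GB values mdadm prints next to them.

```shell
#!/bin/sh
# Hypothetical helper: convert an hdparm -S value into seconds.
# Only the range 1..240 (units of 5 seconds) is handled; larger values
# use a different encoding and are rejected here.
spindown_seconds() {
    if [ "$1" -ge 1 ] && [ "$1" -le 240 ]; then
        echo $(( $1 * 5 ))
    else
        echo "unsupported -S value: $1" >&2
        return 1
    fi
}

# Hypothetical helpers: convert a count of 512-byte sectors into bytes
# and into whole GiB (rounded down by integer division).
sectors_to_bytes() { echo $(( $1 * 512 )); }
sectors_to_gib()   { echo $(( $1 * 512 / 1073741824 )); }

spindown_seconds 120          # 600 seconds, i.e. the 10 minutes configured above
sectors_to_bytes 2930275120   # 1500300861440 bytes, i.e. 1500.30 GB
sectors_to_gib 2930275120     # 1397, consistent with the reported 1397.26 GiB
```

The numbers agree with the report: -S120 corresponds to the intended 10-minute spin-down, and the "Avail Dev Size" of both members matches the nominal 1.5 TB capacity of the ST31500341AS drives.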
Might be of interest / perhaps related: https://bugs.gentoo.org/show_bug.cgi?id=416353
Information forwarded to https://bugzilla.kernel.org/show_bug.cgi?id=43791 and linux-raid@vger.kernel.org
Please reopen this bug report when you have found a bug to report.