| Summary: | kernel 3.4.3 + e2fsprogs 1.42 + hdparm-9.39 : Raid-1 : complete data loss |
|---|---|
| Product: | Gentoo Linux |
| Reporter: | Manfred Knick <Manfred.Knick> |
| Component: | [OLD] Core system |
| Assignee: | Gentoo Linux bug wranglers <bug-wranglers> |
| Status: | RESOLVED NEEDINFO |
| Severity: | normal |
| Priority: | Normal |
| Version: | unspecified |
| Hardware: | AMD64 |
| OS: | Linux |
| URL: | https://bugzilla.kernel.org/show_bug.cgi?id=43791 |
| See Also: | https://bugzilla.kernel.org/show_bug.cgi?id=43791 |
| Whiteboard: | |
| Package list: | |
| Runtime testing required: | --- |
Description
Manfred Knick
2012-06-25 19:11:04 UTC
-------------
Long version:
-------------
Hardware involved:
AMD Phenom(tm) 9950 Quad-Core
8 GiB RAM
ASUS M2N-SLI Deluxe
Source: HW-Raid-10:
# lspci -s 02:00.0 -v
02:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
Subsystem: Adaptec ASR-2405
Destination: SW-Raid-1:
hdparm -i /dev/sdb
. Model=ST31500341AS, FwRev=CC1H, ...
hdparm -i /dev/sdc
. Model=ST31500341AS, FwRev=CC1H, ...
These two drives are attached to an Adaptec 1220SA:
# lspci -s 03:00.0 -v
03:00.0 RAID bus controller: Silicon Image, Inc. Device 0242 (rev 01)
Subsystem: Adaptec Device 0242
Kernel:
Running on 3.2.16, and having noticed that
. - the problem with the radix-tree iterators had been fixed in 3.4.2, and
. - Neil Brown's RAID fix had arrived in [3.4, 3.3.4, or 3.2.17],
I upgraded the kernel to 3.4.3 first.
To be cautious, I deleted the old Raid-1:
. ddrescue -f /dev/zero /dev/sdb -b 4096
. ddrescue -f /dev/zero /dev/sdc -b 4096
. < after reboot: no md devices any more >
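For reference (an editorial sketch, not part of the original report), the wipe can be double-checked before rebuilding:
# mdadm --examine /dev/sdb /dev/sdc
. ( expected: "mdadm: No md superblock detected on ..." )
# wipefs /dev/sdb /dev/sdc
. ( without options, wipefs only lists any remaining filesystem/RAID signatures; it erases nothing )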
set the drives' TLER (SCT ERC) timeouts to 7 seconds:
. smartctl -l scterc,70,70 /dev/sdb
. smartctl -l scterc,70,70 /dev/sdc
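The resulting ERC values can be read back as follows (an editorial sketch; output format varies by drive):
# smartctl -l scterc /dev/sdb
. ( prints the current SCT Error Recovery Control read/write timeouts, in tenths of a second )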
and built it anew:
mdadm --create --verbose --metadata=1.2 /dev/md/ST-21 --level=mirror --raid-devices=2 /dev/sdb /dev/sdc
$ equery belongs mdadm
. sys-fs/mdadm-3.1.5 (/sbin/mdadm)
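For reference (an editorial sketch, not part of the original report), the initial resync can be watched or blocked on:
# cat /proc/mdstat
. ( shows per-array resync progress )
# mdadm --wait /dev/md/ST-21
. ( returns once any resync/recovery activity has finished )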
Deliberately, I gave md some hours to complete the initial sync from /dev/sdb to /dev/sdc,
before even starting partitioning:
. parted -a optimal /dev/md/ST-21
. mklabel msdos
. mkpart primary ext2 4096 -1
and creating the filesystem:
. mkfs.ext4 -L ST-21-P1 -E lazy_itable_init=0,lazy_journal_init=0 /dev/md/ST-21p1
( -E : to make sure that no deferred initialization was left running in the background )
$ equery belongs mkfs.ext4
. sys-fs/e2fsprogs-1.42 (/sbin/mkfs.ext4)
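The freshly created filesystem can be inspected non-destructively (an editorial sketch, assuming e2fsprogs is installed):
# dumpe2fs -h /dev/md/ST-21p1
. ( prints the superblock summary: label, block counts, feature flags, filesystem state )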
Nota bene:
. "E2fsprogs 1.42 (November 29, 2011)
. This release of e2fsprogs has support for file systems > 16TB."
and:
. "E2fsprogs 1.42.4 (June 12, 2012)
. Fixed more 64-bit block number bugs (which could end up corrupting file systems!) in e2fsck, debugfs, and libext2fs."
/etc/fstab:
. LABEL=ST-21-P1 /Mammut/ST-21-P1 ext4 defaults,noatime 1 2
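Whether the label really resolves to the md partition can be verified like this (an editorial sketch; findfs and blkid ship with util-linux):
# findfs LABEL=ST-21-P1
. ( prints the device node the label resolves to, e.g. /dev/md127p1 )
# blkid /dev/md127p1
. ( shows LABEL, UUID and filesystem type as seen by libblkid )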
df -h :
...
/dev/md127p1 1.4T 21G 1.3T 2% /Mammut/ST-21-P1
...
Because this was data I did not need permanent access to,
the Seagate drives were configured to spin down after 10 minutes without access:
equery list hdparm:
[IP-] [ ] sys-apps/hdparm-9.39:0
/etc/conf.d/hdparm:
...
sdb_args="-S120"
sdc_args="-S120"
...
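For reference (an editorial sketch), the same timeout can be applied and checked by hand; -S 120 means 120 x 5 s = 600 s = 10 minutes:
# hdparm -S 120 /dev/sdb
# hdparm -C /dev/sdb
. ( -C reports the drive's current power state: active/idle or standby )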
Now I copied the respective directory tree T:
. cp -a /<Raid-10-mountpoint>/T /Mammut/ST-21-P1/
and checked the result with
. diff -r /<Raid-10-mountpoint>/T /Mammut/ST-21-P1/T
which reported no differences.
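A stronger check would compare per-file checksums (an editorial sketch; placeholder paths as above):
# ( cd /<Raid-10-mountpoint> && find T -type f -exec md5sum {} + ) > /tmp/T.md5
# ( cd /Mammut/ST-21-P1 && md5sum --quiet -c /tmp/T.md5 )
. ( the second command prints only files whose checksums differ )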
I am sorry that I have to become a little imprecise now:
As far as I remember,
there was a reboot first, with the copy still readable,
then an automatic spin-down.
After another reboot at some stage,
the copy was not visible while the drives were in stand-by;
after spinning the two disks up, it was visible again.
Anyway:
complete power-off during the night -
power-on the next morning:
the copied tree T was _gone_ !!!
but an (empty) "lost+found" was there ???
What I get now is the following:
# mdadm -Evvvvs
mdadm: No md superblock detected on /dev/md/mammut:ST-21p1.
mdadm: No md superblock detected on /dev/md/mammut:ST-21.
...
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 16bd66f7:96a400f6:eb91f3c0:f5e58122
Name : mammut:ST-21 (local to host mammut)
Creation Time : Wed Jun 20 19:51:50 2012
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 2930275120 (1397.26 GiB 1500.30 GB)
Array Size : 2930274848 (1397.26 GiB 1500.30 GB)
Used Dev Size : 2930274848 (1397.26 GiB 1500.30 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : a4cef825:a19980d2:285560d9:0c6da2af
Update Time : Fri Jun 22 07:30:30 2012
Checksum : e9b7551c - correct
Events : 19
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing)
...
/dev/sdb:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 16bd66f7:96a400f6:eb91f3c0:f5e58122
Name : mammut:ST-21 (local to host mammut)
Creation Time : Wed Jun 20 19:51:50 2012
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 2930275120 (1397.26 GiB 1500.30 GB)
Array Size : 2930274848 (1397.26 GiB 1500.30 GB)
Used Dev Size : 2930274848 (1397.26 GiB 1500.30 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : baa75e0c:424e949e:b15d863d:a5e31ef8
Update Time : Fri Jun 22 07:30:30 2012
Checksum : 48d361f4 - correct
Events : 19
Device Role : Active device 0
Array State : AA ('A' == active, '.' == missing)
...
# ls -algR /Mammut/ST-21-P1/
/Mammut/ST-21-P1/:
total 24
drwxr-xr-x 3 root  4096 Jun 21 19:06 .
drwxr-xr-x 5 root  4096 May 21 21:37 ..
drwx------ 2 root 16384 Jun 20 19:58 lost+found
/Mammut/ST-21-P1/lost+found:
total 20
drwx------ 2 root 16384 Jun 20 19:58 .
drwxr-xr-x 3 root  4096 Jun 21 19:06 ..
!----------------------------------!
! No /Mammut/ST-21-P1/T any more !
!----------------------------------!
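For anyone who ends up in the same state, some non-destructive diagnostics (an editorial sketch; device names as above):
# cat /proc/mdstat
# mdadm --detail /dev/md127
. ( confirms both members are present and the array is clean )
# fsck.ext4 -n /dev/md127p1
. ( -n opens the filesystem read-only and answers "no" to every repair question )
# debugfs -R "ls -l /" /dev/md127p1
. ( lists the root directory straight from the on-disk structures )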
Might be of interest / perhaps related: https://bugs.gentoo.org/show_bug.cgi?id=416353
Information forwarded to https://bugzilla.kernel.org/show_bug.cgi?id=43791 and to linux-raid@vger.kernel.org.
Please reopen this bug report when you have found a bug to report.