Bug 321005 - qemu-kvm (0.12.4) + virtio disk corrupts large volumes (>1TB)
Summary: qemu-kvm (0.12.4) + virtio disk corrupts large volumes (>1TB)
Status: VERIFIED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Highest critical
Assignee: Gentoo QEMU Project
URL: https://bugs.launchpad.net/ubuntu/+so...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-05-22 09:23 UTC by masc
Modified: 2010-10-13 17:17 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Description masc 2010-05-22 09:23:20 UTC
See URL; this has been around for a while and persists in 0.12.4.
There's a patch to fix it (http://marc.info/?l=qemu-devel&m=127436114712437); it would be good to see it cherry-picked for 0.12.4-r1.

Reproducible: Always
Comment 1 Stefan Behte (RETIRED) gentoo-dev Security 2010-05-22 10:50:49 UTC
Indeed.
Comment 2 Doug Goldstein (RETIRED) gentoo-dev 2010-06-15 18:40:31 UTC
Thanks. Fixed in 0.12.4-r1
Comment 3 masc 2010-06-16 09:19:23 UTC
verified by filling an ext4-formatted 1.5T drive with rsync and fscking afterwards, no errors.
Comment 4 masc 2010-07-25 11:52:11 UTC
this bug has been re-introduced in -r2 and -r3!
I had to downgrade to -r1 to prevent (more) data corruption.

mkfs.ext4 on 1.5TB drive:

[   37.641845] Buffer I/O error on device vdg, logical block 27791617
[   37.641846] lost page write due to I/O error on vdg
[   37.641851] Buffer I/O error on device vdg, logical block 27791618
[   37.641852] lost page write due to I/O error on vdg
[   37.641854] Buffer I/O error on device vdg, logical block 27791619
[   37.641856] lost page write due to I/O error on vdg
[   37.641858] Buffer I/O error on device vdg, logical block 27791620
[   37.641859] lost page write due to I/O error on vdg
[   37.641861] Buffer I/O error on device vdg, logical block 27791621
[   37.641862] lost page write due to I/O error on vdg
[   37.641865] Buffer I/O error on device vdg, logical block 27791622
[   37.641866] lost page write due to I/O error on vdg
[   37.641868] Buffer I/O error on device vdg, logical block 27791623
[   37.641870] lost page write due to I/O error on vdg
[   37.641872] Buffer I/O error on device vdg, logical block 27791624
[   37.641873] lost page write due to I/O error on vdg
[   37.641875] Buffer I/O error on device vdg, logical block 27791625
[   37.641877] lost page write due to I/O error on vdg
[   37.641879] Buffer I/O error on device vdg, logical block 27791626
[   37.641880] lost page write due to I/O error on vdg
[   37.641949] end_request: I/O error, dev vdg, sector 222333944
[   37.642022] end_request: I/O error, dev vdg, sector 2930270088
[   37.642030] end_request: I/O error, dev vdg, sector 2930270152
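
To see where on the drive the failing requests land, the end_request lines above can be converted to byte offsets. This is only a rough sketch: it assumes the sector numbers are in 512-byte units (the usual convention for these kernel messages), and the names failing_offsets, SECTOR_BYTES and LOG are made up for illustration.

```python
import re

# Assumption: end_request sector numbers are counted in 512-byte units.
SECTOR_BYTES = 512
TB = 10**12

# The three end_request lines from the log above.
LOG = """\
[   37.641949] end_request: I/O error, dev vdg, sector 222333944
[   37.642022] end_request: I/O error, dev vdg, sector 2930270088
[   37.642030] end_request: I/O error, dev vdg, sector 2930270152
"""

PATTERN = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failing_offsets(log_text):
    """Return (device, sector, byte_offset) for each end_request error line."""
    results = []
    for match in PATTERN.finditer(log_text):
        device = match.group(1)
        sector = int(match.group(2))
        results.append((device, sector, sector * SECTOR_BYTES))
    return results


if __name__ == "__main__":
    for device, sector, offset in failing_offsets(LOG):
        print(f"{device}: sector {sector} -> {offset / TB:.3f} TB")
```

Under that assumption, the last two sectors sit at roughly 1.5 TB, i.e. near the end of the 1.5TB drive, while the first one is at roughly 0.11 TB. That alone does not say anything about the cause.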
Comment 5 Doug Goldstein (RETIRED) gentoo-dev 2010-07-27 23:40:21 UTC
The patch is included in the patch ball in -r2 and -r3. It's also included in 0.12.5.

Please retest.
Comment 6 masc 2010-07-28 09:13:26 UTC
behaviour does not occur with 0.12.5 (at least not with mkfs.ext4, will do more thorough tests next weekend).

so far it seems that -r2 introduced a change which interferes with the virtio fix. maybe it's worth investigating so that bad surprises can be prevented in the future.

with this bug, merely booting up the vm, without explicitly writing to a drive, is already enough for data loss/corruption to occur.
Comment 7 masc 2010-07-28 22:30:52 UTC
0.12.5 is perfectly fine, verified as in comment #3
Comment 8 Lionel Bouton 2010-10-12 20:26:11 UTC
It seems there's a corner case left. I had these problems on 4 physical hosts (and commented on the corresponding sf.net bug as "gyver"). I migrated 3 of the 4 hosts to 0.12.5-r1, which fixed the problems and allowed us to use virtio instead of emulated PIIX. I just tried to migrate the 4th one and it failed to solve the read errors in virtio block mode.

I have 3 VMs on this 4th host: 2 are x86, 1 is x86_64. All of them fail to boot with 0.12.5-r1, reporting read errors on /dev/vda. Reconfiguring them to use IDE works (but there are errors reported during the boot and the guest kernels switch to PIO after resetting the ide0 interface).
Booting all these VMs works with 0.11.1-r1.

Two details that might help:
1/
I use DRBD devices for all my virtual disks (on all 4 physical hosts).

2/
The "failing" host has different hardware: the underlying storage is based on a hardware RAID controller, a 3ware 8006-2LP with two SATA disks in RAID-1 mode (all other hosts have plain AHCI SATA controllers and use software RAID). Currently the controller is rebuilding the array after we replaced a failing disk with a brand new one (given there was downtime for maintenance, I used the opportunity to upgrade qemu-kvm). Although there's no read error on the physical host as far as its kernel is concerned, read performance is suffering: 5MB/s at most with a dd if=/dev/vda ...
Comment 9 masc 2010-10-12 20:52:20 UTC
(In reply to comment #8)
> Reconfiguring them to use IDE
> works (but there are errors reported during the boot and the guest kernels
> switch to PIO after resetting the ide0 interface).

if the behaviour also occurs with ide, this might be a different problem.
I'd suggest taking this to qemu-kvm's (new) bugtracker https://bugs.launchpad.net/qemu or discussing it in http://forums.gentoo.org/ first.
Comment 10 Lionel Bouton 2010-10-13 17:17:41 UTC
(In reply to comment #9)
> I'd suggest taking this to qemu-kvm's (new) bugtracker
> https://bugs.launchpad.net/qemu or discussing it in http://forums.gentoo.org/ first.

Done so:
https://bugs.launchpad.net/qemu/+bug/660060