555736 – app-emulation/libvirt-1.2.14-r2 - suspending and then editing the disk image externally to libvirt can cause data corruption

Bug 555736 - app-emulation/libvirt-1.2.14-r2 - suspending and then editing the disk image externally to libvirt can cause data corruption

Summary: app-emulation/libvirt-1.2.14-r2 - suspending and then editing the disk image ...

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages (show other bugs)
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Matthias Maier

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2015-07-24 01:32 UTC by Radek Pilar
Modified:	2015-09-09 13:31 UTC (History)
CC List:	2 users (show)

See Also:
Package list:
Runtime testing required:	---

Attachments
Proposed patch that makes "shutdown" the new default behaviour on libvirt stop (libvirtd.confd-r5.diff,569 bytes, patch) 2015-07-24 18:00 UTC, Radek Pilar	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Radek Pilar 2015-07-24 01:32:41 UTC

If user stops the libvirtd, or shuts down the machine altogether, he/she can easily assume machine is in "sane" state - everything unmounted, all processes stopped. Then, he mistakenly assumes he can mount (or even do something like resize) the guest filesystem. However, after restarting the libvirt hosts resumes and data corruption *will* occur. This is, by my standards, dangerous behaviour and definetely should not be default.

Right now, I can think of two possible solutions:
1) have something sane - "shutdown"? as a default for LIBVIRTD_KVM_SHUTDOWN
2) check mtime of the HDD images, if it changed perform cold boot (however, this won't work for the block devices - on that case, I'd cold boot just to be sure).

Suggestions welcome.

Comment 1 Radek Pilar 2015-07-24 18:00:53 UTC

Created attachment 407546 [details, diff]
Proposed patch that makes "shutdown" the new default behaviour on libvirt stop

Comment 2 Matthias Maier gentoo-dev

2015-07-25 20:05:05 UTC

(In reply to Radek Pilar from comment #0)
> If user stops the libvirtd, or shuts down the machine altogether, he/she can
> easily assume machine is in "sane" state - everything unmounted, all
> processes stopped.

Well, no - He cannot. In fact, the default behavior by upstream is to just
stop the libvirtd process without any further action. It is a speciality of
Gentoo's openrc runscript to provide further options for qemu/kvm guests. [1]

Why do you stop the libvirtd process in order to shut down the guests in the
first place?

> Then, he mistakenly assumes he can mount (or even do
> something like resize) the guest filesystem. However, after restarting the
> libvirt hosts resumes and data corruption *will* occur. This is, by my
> standards, dangerous behaviour and definetely should not be default.

The decision to have "managedsave" as default behavior was made quite a while
ago.

I will discuss with cardoe whether a change to

    LIBVIRTD_KVM_SHUTDOWN="shutdown"
    LIBVIRTD_KVM_RESTART="none"

as configuration default should be made.


[1] Please note that for all other virtualization approaches no action is taken
- neither by Gentoo's runscript, nor by upstreams auxiliary scripts.

Comment 3 Sebastian Pipping gentoo-dev

2015-07-25 20:30:11 UTC

I heard about this ticketd on IRC.
I support protecting from unexpected data loss.

I would like to point out that current users may rely on the suspend-resume approach to have their KVM guests auto-started with minimal downtime (saving the guest reboots).  So if that default changes, personally I would expect a portage news entry with the changes explained and instructions on how to restore previous behaviour.

Comment 4 Radek Pilar 2015-07-25 21:13:23 UTC

(In reply to Matthias Maier from comment #2)
> Why do you stop the libvirtd process in order to shut down the guests in the
first place?

Actually, I did not. I've shutdown the entire machine, booted from liveCD and reorganized/resized some LVM volumes that were mounted inside VMs. (The liveCD boot was because it required modifications to the root mountpoint as well).
After reboot massive data corruption in each LV I've used from liveCD obviously occured.

If I had been shutting down VMs one by one, I'd of course check if the VM is not running, but since I've shut down the whole system, I've expected everything to be, well, shut down.

> > If user stops the libvirtd, or shuts down the machine altogether, he/she can
> > easily assume machine is in "sane" state - everything unmounted, all
> > processes stopped.
> 
> Well, no - He cannot. In fact, the default behavior by upstream is to just
> stop the libvirtd process without any further action. It is a speciality of
> Gentoo's openrc runscript to provide further options for qemu/kvm guests. [1]

Okay, in that case, replace "stops the libvirtd, or shuts down the machine altogether" with "shuts down the machine". In that case, the qemu process will no longer run and VM will cold-boot on the machine boot.

Comment 5 Matthias Maier gentoo-dev

2015-07-25 21:46:25 UTC

> Actually, I did not. I've shutdown the entire machine, booted from liveCD
> and reorganized/resized some LVM volumes that were mounted inside VMs. (The
> liveCD boot was because it required modifications to the root mountpoint as
> well).

As I said, I will discuss to change the default to
    LIBVIRTD_KVM_SHUTDOWN="shutdown"
    LIBVIRTD_KVM_RESTART="none"

Comment 6 Matthias Maier gentoo-dev

2015-07-28 16:57:57 UTC

I have changed the default behavior in 1.2.17-r2.


(In reply to Sebastian Pipping from comment #3)
> So if that default changes, personally I would expect a portage news entry
> with the changes explained and instructions on how to restore previous behaviour.

I will prepare a news entry for the stabilization.


*libvirt-1.2.17-r2 (28 Jul 2015)

  28 Jul 2015; Matthias Maier <tamiko@gentoo.org>
  +files/libvirt-1.2.17-fix_paths_for_apparmor.patch, +files/libvirtd.confd-r6,
  +files/libvirtd.init-r16, +libvirt-1.2.17-r2.ebuild, -files/libvirtd.confd-r5,
  -files/libvirtd.init-r15, -libvirt-1.2.17-r1.ebuild, libvirt-9999.ebuild:
  Change default behavior for kvm guest in openrc runscript, bug #555736; fix
  apparmor configuration, bug #554628; ebuild maintenance

Comment 7 Doug Goldstein (RETIRED) gentoo-dev

2015-07-30 19:30:58 UTC

So this is a bad change all around for a number of reasons. There are several things that have gone into this over the years to make this a better setup. Some of the reasons are legacy reasons due to the behavior of OpenRC. Firstly I'll explain the external commands into the script:

start
-----
starts up libvirtd and any associated guests that are marked for autostart

stop
----
stops libvirtd and any associated VMs (and networks). The default is to perform a managedsave. Which means your VMs are stopped EXTERNAL to their OS. Their state is preserved and no data is lost. When they come back up its as if nothing had changed.

restart
-------
stops libvirtd and starts it along. Along with this it stops the VMs in a managedsave by default. This is what we tell users to do for security issues since it stops QEMU and starts a new instance.

reload
------
stops libvirtd and starts it back up without touching your VMs. Back in the day libvirtd didn't always do this gracefully or be able to control your VMs or know their states fully. I'm not sure how good they are about this still. It depends on how much data can be probed from QEMU via QMP. Historically there has been data that libvirtd just can't get over QMP over the state of the VM after it starts. Which means you need to really restart in some cases.

Why is it called restart and reload?
------------------------------------
Back in the day OpenRC (honestly I think its before it was renamed to OpenRC) had fixed actions which could be executed. The only ones were 'restart' and 'reload' that made any sense. 'restart' was defined as "intrusive restart of services" while 'reload' was defined as "unintrusive restart of services". By those definitions, 'restart' should be the action that restarts libvirtd and all your VMs while 'reload' only restarted libvirtd.

Why is 'shutdown' way worse default than 'managedsave'?
-----------------------------------------------
Consider the scenario that you are running a guest without ACPI support or have it disabled. Calling /etc/init.d/libvirtd stop will now hang on that VM for 500 seconds by default and then call kill -9 $(pidof qemu-for-that-vm). Your disk will not unmount cleanly, you will likely have data loss or corruption. What about systems where an ACPI shutdown pops up a dialog window and does not start the shutdown process (read: any version of Windows). You've now unmounted and/or corrupted that Windows installation.

With 'managedsave' the CPU is stopped external to the OS. The contents of RAM are written to disk and when the machine is started back up it resumes as if nothing had happened.

The original reporter claims that the VMs are being suspended but they are not suspended. A suspend operation is internal to the guest OS. The original reporter mentions that he stops libvirtd and then assumes he can mount and mess with the disk of the VM. That's a terrible assumption. The behavior would be to properly shutdown that guest with libvirt (virsh stop DOMAIN) before doing that work. Many external utilities that work with the guest disk do those operations (and actually rely on libvirtd to be running to verify and read different information about the disks). The issue here is the original reporter is the edge case and not the normal case.

Lastly 'managedsave' is the default in most distros and now we are the special case / different so people switching to Gentoo will get bitten by another difference.

Where do we go from here?
-------------------------
Set the default back to 'managedsave' and update documentation and make it more clear for people in the edge case. Maybe a wiki page.

Potential improvements
----------------------
I previously did the work of splitting up the init scripts to libvirtd and libvirt-guests just like the systemd setup. Not sure why I never committed it (its been well over a year since I did the work). We can bring that into the tree. I however would keep the default as 'managedsave' so that wouldn't address the original report's complaints.

Comment 8 Radek Pilar 2015-07-30 20:37:20 UTC

(In reply to Doug Goldstein from comment #7)
> Why is 'shutdown' way worse default than 'managedsave'?
> -----------------------------------------------
> Consider the scenario that you are running a guest without ACPI support or
> have it disabled. Calling /etc/init.d/libvirtd stop will now hang on that VM
> for 500 seconds by default and then call kill -9 $(pidof qemu-for-that-vm).
> Your disk will not unmount cleanly, you will likely have data loss or
> corruption. What about systems where an ACPI shutdown pops up a dialog
> window and does not start the shutdown process (read: any version of
> Windows). You've now unmounted and/or corrupted that Windows installation.
Of course you should have something that can react on ACPI shutdown event inside a VM, or shut it down manually. Oh, and as far as I know, Windows shuts down on press of power button so it is not an issue here either.

You mentioned "likely have data loss or corruption" - since we live in 21st century and have journalling filesystems, this rarely happens, and if it does, usually just a single files or directories are corrupted/lost.
If you mount/resize/fsck filesystem that is mounted as well in the (now not running) virtual machine, you are *guaranteed* to have a *massive* data loss/corruption across the whole filesystem!

 
> With 'managedsave' the CPU is stopped external to the OS. The contents of
> RAM are written to disk and when the machine is started back up it resumes
> as if nothing had happened.
Except when something happened. Like resize. Or fsck. Or just plain innocent readwrite mount.

 
> The original reporter claims that the VMs are being suspended but they are
> not suspended. A suspend operation is internal to the guest OS.
I've called it this way because I don't think there is a better word to the describe the state. VM CPU clock (=and code excution) is suspended, and RAM contents stored on HDD. I think we all know how it works, especially if it is described in the same config file.

> The original
> reporter mentions that he stops libvirtd and then assumes he can mount and
> mess with the disk of the VM. That's a terrible assumption. 
I've shut down the whole machine. In fact, forget that I ever mentioned stopping libvirtd service has the same effect. I've shut down the whole machine and made an assumption that, since machine is effectively powered down and there is no process running, I can work with disks/partitions on the machine as I wish without breaking anything and after cold boot everything will run from the clean state. I don't think that's a bad assumption. Do you?
 
> Lastly 'managedsave' is the default in most distros and now we are the
> special case / different so people switching to Gentoo will get bitten by
> another difference.
Or it just means it should be fixed in other distros too.

> Set the default back to 'managedsave' and update documentation and make it
> more clear for people in the edge case. Maybe a wiki page.
If the default is *unsafe* managedsave, then there probably should be big fat warning somewhere when installing the package. Or some check that will

Comment 9 Doug Goldstein (RETIRED) gentoo-dev

2015-07-30 21:39:27 UTC

(In reply to Radek Pilar from comment #8)
> (In reply to Doug Goldstein from comment #7)
> > Why is 'shutdown' way worse default than 'managedsave'?
> > -----------------------------------------------
> > Consider the scenario that you are running a guest without ACPI support or
> > have it disabled. Calling /etc/init.d/libvirtd stop will now hang on that VM
> > for 500 seconds by default and then call kill -9 $(pidof qemu-for-that-vm).
> > Your disk will not unmount cleanly, you will likely have data loss or
> > corruption. What about systems where an ACPI shutdown pops up a dialog
> > window and does not start the shutdown process (read: any version of
> > Windows). You've now unmounted and/or corrupted that Windows installation.
> Of course you should have something that can react on ACPI shutdown event
> inside a VM, or shut it down manually. Oh, and as far as I know, Windows
> shuts down on press of power button so it is not an issue here either.
> 
> You mentioned "likely have data loss or corruption" - since we live in 21st
> century and have journalling filesystems, this rarely happens, and if it
> does, usually just a single files or directories are corrupted/lost.
> If you mount/resize/fsck filesystem that is mounted as well in the (now not
> running) virtual machine, you are *guaranteed* to have a *massive* data
> loss/corruption across the whole filesystem!
> 

Except if you use write back then you will in fact have data loss. So it is likely to happen in that case. You are making a fatal assumption that the VM is not running when it is in managedsave mode. It is in fact "running", when it comes back up it will not have even noticed the down time. If you want to mess with a VM's disk external to the VM something that is definitely not a normal use case then you should properly shut down the machine with "virsh stop DOMAIN". You should likely also use libguestfs which will enforce this for you.


>  
> > With 'managedsave' the CPU is stopped external to the OS. The contents of
> > RAM are written to disk and when the machine is started back up it resumes
> > as if nothing had happened.
> Except when something happened. Like resize. Or fsck. Or just plain innocent
> readwrite mount.

Again. Not a normal use case for someone to externally to the VM to mess with the disk. Additionally you are still making the assumption the machine is off when it is in fact not.


> 
>  
> > The original reporter claims that the VMs are being suspended but they are
> > not suspended. A suspend operation is internal to the guest OS.
> I've called it this way because I don't think there is a better word to the
> describe the state. VM CPU clock (=and code excution) is suspended, and RAM
> contents stored on HDD. I think we all know how it works, especially if it
> is described in the same config file.
> 
> > The original
> > reporter mentions that he stops libvirtd and then assumes he can mount and
> > mess with the disk of the VM. That's a terrible assumption. 
> I've shut down the whole machine. In fact, forget that I ever mentioned
> stopping libvirtd service has the same effect. I've shut down the whole
> machine and made an assumption that, since machine is effectively powered
> down and there is no process running, I can work with disks/partitions on
> the machine as I wish without breaking anything and after cold boot
> everything will run from the clean state. I don't think that's a bad
> assumption. Do you?

It is a bad assumption. You aren't suppose to mess with the contents of these VMs under the hood.


>  
> > Lastly 'managedsave' is the default in most distros and now we are the
> > special case / different so people switching to Gentoo will get bitten by
> > another difference.
> Or it just means it should be fixed in other distros too.
> 
> > Set the default back to 'managedsave' and update documentation and make it
> > more clear for people in the edge case. Maybe a wiki page.
> If the default is *unsafe* managedsave, then there probably should be big
> fat warning somewhere when installing the package. Or some check that will

We've had things this way for literally years and you are the first person to complain. Which again proves you are the edge case and not the normal case.

Comment 10 Radek Pilar 2015-07-30 22:36:13 UTC

(In reply to Doug Goldstein from comment #9)
> Except if you use write back then you will in fact have data loss. So it is
> likely to happen in that case. You are making a fatal assumption that the VM
> is not running when it is in managedsave mode. It is in fact "running", when
> it comes back up it will not have even noticed the down time. If you want to
> mess with a VM's disk external to the VM something that is definitely not a
> normal use case then you should properly shut down the machine with "virsh
> stop DOMAIN". You should likely also use libguestfs which will enforce this
> for you.
Of course I assume the VM is not running. There is no qemu process, the guest itself might not be running and the disk might be in another machine! The only way to find out if the VMs are suspended or actually shut down will be to look at the /var/lib/libvirt/qemu/save/ (probably) directory. Of course I (or actually, noone I know) had thought that shutting the machine down will suspend the VMs instead of shut them down.

> Again. Not a normal use case for someone to externally to the VM to mess
> with the disk. Additionally you are still making the assumption the machine
> is off when it is in fact not.
See paragraph above.

> It is a bad assumption. You aren't suppose to mess with the contents of
> these VMs under the hood.
Under the hood? Excuse me? It is a standard ext4 filesystem, on standard LVM logical volume on standard underlying device. Using resize2fs in a standard way seems appropriate.

> We've had things this way for literally years and you are the first person
> to complain. Which again proves you are the edge case and not the normal
> case.
So what? "Eats your data when used in a way we have not thought about" is fine?

Or let me put it this way... Is there any other software that works in a same way? 

Or, image that you have - for example - database server. You shut it down / shut down the computer running the DB server. Then you copy data files to another machine, run the DB server, make some changes, stop that machine. Then copy data files back to the original machine, boot it and get data corrupted because in /var/lib/some/obscure/path/ there was a file affecting the DB server runtime. Would it be okay?

Comment 11 Doug Goldstein (RETIRED) gentoo-dev

2015-07-31 03:05:28 UTC

(In reply to Radek Pilar from comment #10)
> (In reply to Doug Goldstein from comment #9)
> > Except if you use write back then you will in fact have data loss. So it is
> > likely to happen in that case. You are making a fatal assumption that the VM
> > is not running when it is in managedsave mode. It is in fact "running", when
> > it comes back up it will not have even noticed the down time. If you want to
> > mess with a VM's disk external to the VM something that is definitely not a
> > normal use case then you should properly shut down the machine with "virsh
> > stop DOMAIN". You should likely also use libguestfs which will enforce this
> > for you.
> Of course I assume the VM is not running. There is no qemu process, the
> guest itself might not be running and the disk might be in another machine!
> The only way to find out if the VMs are suspended or actually shut down will
> be to look at the /var/lib/libvirt/qemu/save/ (probably) directory. Of
> course I (or actually, noone I know) had thought that shutting the machine
> down will suspend the VMs instead of shut them down.
> 
> > Again. Not a normal use case for someone to externally to the VM to mess
> > with the disk. Additionally you are still making the assumption the machine
> > is off when it is in fact not.
> See paragraph above.
> 
> > It is a bad assumption. You aren't suppose to mess with the contents of
> > these VMs under the hood.
> Under the hood? Excuse me? It is a standard ext4 filesystem, on standard LVM
> logical volume on standard underlying device. Using resize2fs in a standard
> way seems appropriate.
> 
> > We've had things this way for literally years and you are the first person
> > to complain. Which again proves you are the edge case and not the normal
> > case.
> So what? "Eats your data when used in a way we have not thought about" is
> fine?
> 
> Or let me put it this way... Is there any other software that works in a
> same way? 
> 
> Or, image that you have - for example - database server. You shut it down /
> shut down the computer running the DB server. Then you copy data files to
> another machine, run the DB server, make some changes, stop that machine.
> Then copy data files back to the original machine, boot it and get data
> corrupted because in /var/lib/some/obscure/path/ there was a file affecting
> the DB server runtime. Would it be okay?

Honestly we can do this back and forth for a while. At the end of the day we need sane defaults for a majority of users and the majority of use cases. We have a documented ability to change the defaults. The point is there were issues with the default of "shutdown" that I have done my best to explain. Issues that are far more common than your use case of mounting and messing with disk images external to the VM. It has been the default for over 5 years. Here's the commit that introduced "managedsave" as the default.

https://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/app-emulation/libvirt/files/libvirtd.confd-r1?hideattic=0&revision=1.1&view=markup

@tamiko: Let's switch this back please. Radek now knows how to configure his setup for how he wants it to behave. While I agree Gentoo is about choice (and we are preserving user choice by having it configurable), its also about sane defaults.

I just tested this setting with a Windows 2008 Server VM. I then ran /etc/init.d/libvirtd stop and when I went to boot the VM back up I had to run chkdsk. The reason for this was that Windows popped up a dialog box asking me to say why I was shutting it down.

Comment 12 Matthias Maier gentoo-dev

2015-07-31 03:15:38 UTC

(In reply to Doug Goldstein from comment #11)

@tamiko: Let's switch this back please.

Agreed. I will revbump with the old default values.
After that we can work on a migration to an extra libvirt-guests service (which will simultaneously resolve #551854.

Comment 13 Matthias Maier gentoo-dev

2015-07-31 03:24:59 UTC

*libvirt-1.2.17-r4 (31 Jul 2015)

  31 Jul 2015; Matthias Maier <tamiko@gentoo.org> +libvirt-1.2.17-r4.ebuild,
  -files/libvirtd.confd-r6, -files/libvirtd.init-r16, -libvirt-1.2.17-r3.ebuild,
  libvirt-9999.ebuild:
  on behalf of cardoe; revert all changes made to the openrc runscripts; as per
  discussions on bug #555736 and on bug #551854

Comment 14 Matthias Maier gentoo-dev

2015-07-31 03:33:33 UTC

To clarify:

Cardoe and I have decided to split out the guest managed from the libvirtd runscipt into its own "libvirt-guests" runsript (as is already the case for the systemd unit files):

/etc/init.d/libvirtd
  - solely manages the libvirtd daemon

/etc/init.d/libvirt-guests
  - solely takes care of running guests upon restart/shutdown of the host


With that I believe that such an unfortunate situation as you ran into can be avoided: A user now has to *consciously* activate the libvirt-guests runscript to get any guest handling at all. The default of this will be "managedsave".


I'm sorry for the mess - my changes were a bit premature.

Things to do:

 [ ] Refactor init scripts

 [ ] Update GENTOO.readme portion to explain the new behavior

 [ ] write a portage news item.

Comment 15 Matthias Maier gentoo-dev

2015-08-20 01:16:11 UTC

commit 75a2af9e00d8701cb3a6f201976ed9ba27bcc7d0
Author: Matthias Maier <tamiko@gentoo.org>
Date:   Wed Aug 19 20:07:43 2015 -0500

    app-emulation/libvirt: Update init scripts for 1.2.18-r2 and 9999
    
    update init scripts for 1.2.18-r2 und 9999. Drop keywords for testing due
    to major runscript change.
    
    Bugs: 551854
    Bugs: 555736
    Bugs: 558034
    
    Package-Manager: portage-2.2.20.1

commit aed11adf8c5bdff71154e1c5e0c375951d237f36
Author: Doug Goldstein <cardoe@gentoo.org>
Date:   Tue Aug 18 16:08:56 2015 -0500

    app-emulation/libvirt: break apart OpenRC init script
    
    This breaks apart the startup/shutdown of libvirtd from the
    startup/shutdown of the libvirt domains. Additionally this expands out
    the domain support to work for all the libvirtd backends instead of just
    QEMU.
    
    Signed-off-by: Doug Goldstein <cardoe@gentoo.org>
    Signed-off-by: Matthias Maier <tamiko@gentoo.org>

Comment 16 Doug Goldstein (RETIRED) gentoo-dev

2015-09-09 13:31:49 UTC

Everything that will be fixed for this ticket is fixed.