Bug 809257 - emerge --depclean should not unmerge packages that concurrently running merges need
Status: UNCONFIRMED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core - Interface (emerge)
Hardware: All Linux
Importance: Low enhancement
Assignee: Portage team
Reported: 2021-08-20 17:31 UTC by Michael Jones
Modified: 2022-11-15 19:57 UTC

Description Michael Jones 2021-08-20 17:31:47 UTC
Suppose a user has two terminals open and runs one command in each:

1. emerge some-package-with-deps
2. emerge --depclean

If emerge --depclean is started while step 1 is still in progress, it will unmerge packages that were installed by the currently running merge operation.

It would be better for emerge to record somewhere that some packages are depended upon by packages currently scheduled for installation. emerge --depclean should then check that record and not unmerge those dependencies.

Reproducible: Always
Comment 1 cyrillic 2022-01-20 19:48:33 UTC
Kinda like driving with one foot on the gas and one foot on the brake :)
Comment 2 Brian Dolbec (RETIRED) gentoo-dev 2022-01-20 19:58:48 UTC
Emerge uses the info available to it when it is initiated.  It is NOT a multi-instance aware package manager.  So, it cannot know what it WILL NEED in another instance.

I doubt that this will ever be implemented.  It is certainly not a bug.  At best it is a feature enhancement.
Comment 3 Michael Jones 2022-01-20 22:14:44 UTC
Portage knows that there is a package being installed, so portage should know that uninstalling the dependencies for that package will break things.

It's a bug, not a feature enhancement, certainly. Users expect their tools to do at least a minimum amount of effort to not "obviously" break things.
Comment 4 Zac Medico gentoo-dev 2022-01-20 23:47:37 UTC
A simple "fix" would be to disallow any concurrent operations while an emerge instance is running. However, that's heavy-handed, because there are lots of cases where it's perfectly harmless to have more than one emerge instance run concurrently.
Comment 5 Brian Dolbec (RETIRED) gentoo-dev 2022-01-21 01:25:02 UTC
(In reply to Michael Jones from comment #3)
> Portage knows that there is a package being installed, so portage should
> know that uninstalling the dependencies for that package will break things.
> 
> It's a bug, not a feature enhancement, certainly. Users expect their tools
> to do at least a minimum amount of effort to not "obviously" break things.

Portage "knowing" about pkgs being installed for all running instances ASSUMES that portage/emerge communicates what it is doing/intending to do to all other instances.

This is NOT the case.  Each instance loads and initializes with the current state and trees.  It does not query for other potentially running instances.  So, to run concurrent instances REQUIRES the user to be aware of potential conflicts.  Certainly running depclean is very much a conflict.

Zac some years ago floated the idea of portage becoming a service always running in the background.  With clients requesting actions.   This sounds like what you are thinking is currently happening.  This is not how it works.
Comment 6 Michael Jones 2022-01-21 03:00:57 UTC
> Zac some years ago floated the idea of portage becoming a service always running in the background.  With clients requesting actions.   This sounds like what you are thinking is currently happening.  This is not how it works.

That isn't what I was thinking, nor what I was proposing. Though I don't think it's a bad idea.

Portage writes plenty of information to disk as part of its normal operations, including cross-portage instance locks to synchronize install steps, and access to the package database.

Portage also records what it was going to merge, to support the --resume flag.

So have emerge --depclean check the package dataset to see if there is a package pending install with an active portage instance (check the pid of the instance that journaled that intent) and don't remove its dependencies.
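
A minimal sketch of that idea, not Portage code: the journal directory, file layout, and every function name below are hypothetical, and Portage does not currently write such a file. Each emerge instance records the packages it still needs under its own PID, and --depclean skips anything journaled by a PID that is still alive.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical journal location; Portage has no such directory today.
JOURNAL_DIR = Path(tempfile.gettempdir()) / "emerge-pending-demo"


def record_pending(packages, pid=None, journal_dir=JOURNAL_DIR):
    """Write the packages this emerge instance still needs, keyed by PID."""
    pid = os.getpid() if pid is None else pid
    journal_dir.mkdir(parents=True, exist_ok=True)
    path = journal_dir / f"{pid}.json"
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(sorted(packages)))
    os.replace(tmp, path)  # atomic rename so readers never see a partial file
    return path


def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def protected_packages(journal_dir=JOURNAL_DIR):
    """Collect packages journaled by emerge instances that are still running."""
    protected = set()
    for path in journal_dir.glob("*.json"):
        if pid_alive(int(path.stem)):
            protected.update(json.loads(path.read_text()))
        else:
            path.unlink(missing_ok=True)  # stale entry from a dead instance
    return protected


def filter_depclean(candidates, journal_dir=JOURNAL_DIR):
    """Drop removal candidates that a running merge still depends on."""
    protected = protected_packages(journal_dir)
    return [pkg for pkg in candidates if pkg not in protected]
```

The PID check is only a heuristic, since PIDs can be reused after a crash. A real implementation would probably pair the journal with a lock held by the owning process.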
Comment 7 Alec Warner (RETIRED) archtester gentoo-dev Security 2022-01-21 16:21:18 UTC
(In reply to Michael Jones from comment #6)
> > Zac some years ago floated the idea of portage becoming a service always running in the background.  With clients requesting actions.   This sounds like what you are thinking is currently happening.  This is not how it works.
> 
> That isn't what I was thinking, nor what I was proposing. Though I don't
> think it's a bad idea.
> 
> Portage writes plenty of information to disk as part of its normal
> operations, including cross-portage instance locks to synchronize install
> steps, and access to the package database.
> 
> Portage also records what it was going to merge, to support the --resume
> flag.
> 
> So have emerge --depclean check the package dataset to see if there is a
> package pending install with an active portage instance (check the pid of
> the instance that journaled that intent) and don't remove its dependencies.

Currently we allow users to run concurrent emerge processes. This is not safe (for all cases) but is safe for many people (and we do have some built-in locks to prevent really obvious race conditions.) A lot of this just depends on timing and what commands are actually executed. You have located a case where it's not safe.

Could we make your specific case (running emerge <pkg> and emerge --depclean simultaneously) safe? Plausibly (as you say, by writing the in-progress deptree somewhere and reading it in on --depclean.) But it's in fact one of dozens of race conditions in concurrent emerge usage, which is why our first answer was "yes, please don't do that." It also doesn't fix the broader problem for *portage*, which is that running concurrent emerge processes is just prone to various race conditions. Fixing one race condition is mostly papering over the underlying issue: we have multiple writers on various resources with minimal locking and serialization.

The fix for all of those races is to add locking and serialization ...which is why everyone always brings up 'emerge as a daemon' because a daemon is a simple way to serialize package commands.

We could also do as Zac said and add an 'emerge lock'. Depending on how lock waiting and release were implemented, we might be able to use it to perform some rudimentary serialization. You see this in other package managers like apt and dpkg.
Comment 8 Florian Schmaus gentoo-dev 2022-11-15 19:57:29 UTC
I just ran into this. Can we, instead of disallowing *any* concurrent operations, as Zac proposed in comment #4, simply defer/disallow any concurrent operations while an "emerge --depclean" is running? Furthermore, a newly started "emerge --depclean" could wait for currently running emerges to finish, while blocking new ones until it has finished.

This would have prevented me running into the issue and is probably easier to implement compared to the currently suggested schemes in this bug, while still allowing parallel emerge jobs to run.
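
A rough sketch of such a scheme, assuming a POSIX system and a hypothetical lock file (the path and the emerge_lock helper are made up for illustration, not existing Portage API): normal emerge runs take a shared flock, so they can run in parallel, while --depclean takes an exclusive one, so it waits for running merges and keeps new ones out while it holds the lock.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock file; a real one would live in a Portage-owned directory.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "emerge-demo.lock")


@contextmanager
def emerge_lock(exclusive, lock_path=LOCK_PATH, blocking=True):
    """Hold a shared (normal merge) or exclusive (--depclean) advisory lock."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    if not blocking:
        mode |= fcntl.LOCK_NB  # raise BlockingIOError instead of waiting
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, mode)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the flock
```

A merge would wrap its work in `with emerge_lock(exclusive=False):` and depclean in `with emerge_lock(exclusive=True):`. One caveat: flock makes no fairness promise, so a depclean that is still waiting does not by itself hold back newly started merges. Only once it owns the lock are new ones blocked, and getting the "block new ones while waiting" behaviour would need something extra on top.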

FWIW, some background on how I ran into this: I have an automated daily task that updates my system, which also runs "emerge --depclean" at the end. I assume that many people have similar tasks. Since this task is running in the background, I am mostly unaware of its presence. If I now emerge a package at an unfavorable point in time, the package may fail to build, since a dependency may be removed by the --depclean running in parallel (see bug #881415).