Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 480428 - Please make COLLISION_IGNORE apply only to unowned files
Summary: Please make COLLISION_IGNORE apply only to unowned files
Status: CONFIRMED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: Portage team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-08-09 20:54 UTC by Michał Górny
Modified: 2013-08-12 19:09 UTC (History)
1 user (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
1. Fix owner search not to modify the list (0001-Fix-vartree._owners_db.iter_owners-not-to-modify-arg.patch,911 bytes, patch)
2013-08-09 20:54 UTC, Michał Górny
Details | Diff
2. Apply COLLISION_IGNORE only to unowned (0002-Accept-COLLISION_IGNORE-only-on-unowned-files.patch,3.11 KB, patch)
2013-08-09 20:55 UTC, Michał Górny
Details | Diff
3. Do not limit collision search to 20 files (0003-Don-t-stop-collision-search-after-20-files.patch,1.03 KB, patch)
2013-08-09 20:55 UTC, Michał Górny
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-08-09 20:54:02 UTC
As described on IRC and in the patches. The current code applies COLLISION_IGNORE both to owned and unowned files, allowing for multiple packages to package the same file. This is a serious QA violation and definitely not the intent of this variable.

I believe that COLLISION_IGNORE should only make it possible to have smoother updates for people who use FEATURES=collision-protect. That is, to make it possible for developers and users to check for collisions, omitting the specific ones that we expect.

I'm attaching three patches.

1. makes the ownership checking function not consume the list passed to it. Otherwise, 'collisions' is already empty when we want to check COLLISION_IGNORE.

2. moves the COLLISION_IGNORE check later and does it only on unowned files. That is, if any of the files are owned, FEATURES=collision-protect and FEATURES=protect-owned are always going to prevent the merge.

3. removes the 20-file limit from ownership checking. In its current design, it may cause package to be merged with FEATURES=protect-owned despise of actual collision with file being owned by a package (if it was behind file 20.). It should be re-implemented in the ownership matching logic so that it stops after 20 matched files or something like that. Or -- preferably -- the logic would use a generator.
Comment 1 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-08-09 20:54:37 UTC
Created attachment 355540 [details, diff]
1. Fix owner search not to modify the list
Comment 2 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-08-09 20:55:04 UTC
Created attachment 355542 [details, diff]
2. Apply COLLISION_IGNORE only to unowned
Comment 3 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-08-09 20:55:20 UTC
Created attachment 355544 [details, diff]
3. Do not limit collision search to 20 files
Comment 4 Zac Medico gentoo-dev 2013-08-10 06:45:53 UTC
(In reply to Michał Górny from comment #0)
> Or -- preferably -- the logic would use a generator.

The get_owners method is implemented an iter_owners method that returns a generator, so perhaps we could use that instead.

(In reply to Michał Górny from comment #2)
> Created attachment 355542 [details, diff] [details, diff]
> 2. Apply COLLISION_IGNORE only to unowned

Searching for owners of things like *.py[co] files is basically a waste of time, since file collisions will be triggered for the corresponding .py files when relevant. So, may change significantly impact performance for common cases, without really adding any significant QA value.

(In reply to Michał Górny from comment #3)
> Created attachment 355544 [details, diff] [details, diff]
> 3. Do not limit collision search to 20 files

We need to benchmark the performance impact that this will have for large numbers of files. For example, looking up all the owners of a gentoo-sources package or something else that installs thousands of files. The `portageq owners` command is suitable for this benchmark, since it uses exactly the same search method.
Comment 5 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-08-10 07:58:29 UTC
(In reply to Zac Medico from comment #4)
> (In reply to Michał Górny from comment #2)
> > Created attachment 355542 [details, diff] [details, diff] [details, diff]
> > 2. Apply COLLISION_IGNORE only to unowned
> 
> Searching for owners of things like *.py[co] files is basically a waste of
> time, since file collisions will be triggered for the corresponding .py
> files when relevant. So, may change significantly impact performance for
> common cases, without really adding any significant QA value.

What if a developer remembers removes colliding .py file but forgots about .pyc, and the compiled file gets installed in some codepath? The point of QA checks is not to guess, and the case of having collisions is a corner case that usually happens only once and doesn't need to have great performance. Especially if we're going to stop the search after 20 files.
Comment 6 Rick Farina (Zero_Chaos) gentoo-dev 2013-08-12 13:16:52 UTC
Why exactly are we making crazy changes like this instead of just using "protect-owned" instead of "collision-protect"?
Comment 7 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-08-12 13:21:05 UTC
(In reply to Rick Farina (Zero_Chaos) from comment #6)
> Why exactly are we making crazy changes like this instead of just using
> "protect-owned" instead of "collision-protect"?

Because protect-owned does not protect unowned files, and I prefer better QA than easier life.

Beside that, right now COLLISION_IGNORE applies to protect-owned. It literally allows multiple packages to install the same file as long as it matches the list. That's what I want to change.
Comment 8 Zac Medico gentoo-dev 2013-08-12 16:50:45 UTC
(In reply to Michał Górny from comment #7)
That's nice, but your patches disturb a very delicate balance. If we "Apply COLLISION_IGNORE only to unowned" then we are likely to pollute the file list with irrelevant files so that the 20 file limit for owners lookup is no longer sufficient. If we "Do not limit collision search to 20 files", then we will hit severe performance problems in certain cases.
Comment 9 Zac Medico gentoo-dev 2013-08-12 16:58:33 UTC
We might reach some sort of compromise by using a different variable than COLLISION_IGNORE for files that should have their owners looked up.
Comment 10 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-08-12 17:03:57 UTC
Would a patch that would stop owner search after, say, 10 matches be sufficient?
Comment 11 Zac Medico gentoo-dev 2013-08-12 17:12:40 UTC
(In reply to Michał Górny from comment #10)
> Would a patch that would stop owner search after, say, 10 matches be
> sufficient?

No, because the search may have to grind through 99.9% of /var/db/pkg, or only 0.1%, depending on how unlucky/lucky you happen to be on a given run. Limiting it to 10 matches would have have no significance if you have to search through 99.9% of /var/db/pkg to find those matches.
Comment 12 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-08-12 17:25:59 UTC
How likely is that? Considering current list of COLLISION_IGNORE, in the worst case you'd have to check at most ~66% of files until you hit a owned collision. And that's very unlikely, in a realistic case a limit of 10 files would mean no more than 30-40 checked collisions.

Introducing another variable will just make the current one obsolete and empty. It's no gain for anyone.
Comment 13 Zac Medico gentoo-dev 2013-08-12 17:40:30 UTC
(In reply to Michał Górny from comment #12)
Maybe I misunderstood your patch idea described in comment #10. The owner search is already limited to 20 files, so you are suggesting to reduce it from 20 files to 10 files now? I think 20 is small enough as it is.
Comment 14 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-08-12 18:12:57 UTC
(In reply to Zac Medico from comment #13)
> (In reply to Michał Górny from comment #12)
> Maybe I misunderstood your patch idea described in comment #10. The owner
> search is already limited to 20 files, so you are suggesting to reduce it
> from 20 files to 10 files now? I think 20 is small enough as it is.

No, I mean limiting it to 10 matched (owned) files.
Comment 15 Zac Medico gentoo-dev 2013-08-12 18:36:42 UTC
(In reply to Michał Górny from comment #14)
> No, I mean limiting it to 10 matched (owned) files.

You can't know if any given file is owned unless you search through *all* of /var/db/pkg though, and that amount that you would have to search to find any owners is *completely* random.
Comment 16 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-08-12 18:45:19 UTC
True. Still, the search is necessary for QA reasons and only happens whenever there is a collision, which either means something is seriously broken (and then it needs to be caught) or we're doing a migration which is a one-time wait.
Comment 17 Zac Medico gentoo-dev 2013-08-12 19:09:22 UTC
I'd be more comfortable with forcing a search for owners of *.py[co] files if I had an assurance that all atable ebuilds in the tree had already been fixed to include *.py[co] files in CONTENTS.