Summary: | sys-apps/portage-9999: can't resolve upgrade to LLVM 17 snapshots | ||
---|---|---|---|
Product: | Portage Development | Reporter: | Sam James <sam> |
Component: | Core | Assignee: | Portage team <dev-portage> |
Status: | CONFIRMED --- | ||
Severity: | normal | CC: | esigra, mgorny, zhuyifei1999 |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=528836 | ||
Whiteboard: | Reverted for now | ||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 155723, 463976, 622270 | ||
Attachments: |
emerge -p -uvDU @world (portage-9999, bad)
emerge -p -uvDU @world (3.0.48.1-r1, good) emerge -p -uvDU @world --debug (3.0.48.1-r1, good) emerge -p -uvDU @world --debug (9999, bad) diff -ruN emerge-good-debug.log emerge-bad-debug.log /etc/portage/package.accept_keywords/llvm portage-bug908717-logs.tar.xz reproducer reproducer 2 POC patch |
Description
Sam James
![]() ![]() ![]() ![]() Created attachment 864098 [details]
emerge -p -uvDU @world (3.0.48.1-r1, good)
Created attachment 864099 [details]
emerge -p -uvDU @world --debug (3.0.48.1-r1, good)
Created attachment 864100 [details]
emerge -p -uvDU @world --debug (9999, bad)
Created attachment 864101 [details]
diff -ruN emerge-good-debug.log emerge-bad-debug.log
44afa8445dc46464200fe46c1e09e0c7475067bf ('depgraph: Don't ignore downgrades as missed_updates ', https://github.com/gentoo/portage/commit/44afa8445dc46464200fe46c1e09e0c7475067bf) is the first bad commit, interestingly (I was worried it'd be the subslot one). Created attachment 864108 [details]
/etc/portage/package.accept_keywords/llvm
Created attachment 864109 [details] portage-bug908717-logs.tar.xz First backtrack: handles first slot conflicts rebuilds, not surprised. Second backtrack: prune slot rebuilds. What surprises me here is that this backtrack causes perf to pull in the original installed version of clang. I would think that would recompute the slot conflict rebuilds but no, that doesn't happen until 4th backtrack. This backtrack will fail due to the slot conflict as both versions of clang:17 is pulled in. What's also interesting is that I installed clang-17.0.0_pre20230609, and perf[clang] on my own machine and couldn't reproduce this. The runtime_pkg_mask stays empty so the second backtrack does not happen. Ok the sequence of events that caused this to happen is clear to me now. To reproduce this you need both lldb and perf[clang] installed. During _solve_non_slot_operator_slot_conflicts, it tries to evaluate which versions of clang:17 to keep. - (dev-util/perf-6.3:0/0::gentoo, installed) pulls in sys-devel/clang-17.0.0_pre20230609. At this point it seems unaware that it needs to be rebuilt by the slot ABI upgrade, so this pull should be void. (i.e. it sells like a bug to me) - (dev-util/lldb-17.0.0_pre20230615:0/17.0.0_pre20230615::gentoo, ebuild scheduled for merge) pulls in sys-devel/clang-17.0.0_pre20230615. This is an upgrade with a strict version requirement, and lldb is non-slotted. This is correct. Slot conflict resolution then goes to be like: Conflict: (/, sys-devel/clang:17) keep: (sys-devel/clang-17.0.0_pre20230615:17/17.0.0_pre20230615::gentoo, ebuild scheduled for merge) keep: (sys-devel/clang-17.0.0_pre20230609-1:17/17.0.0_pre20230609::gentoo, installed) This in turn pulls in the runtime dependencies of clang: Conflict: (/, sys-devel/clang-runtime:17) keep: (sys-devel/clang-runtime-17.0.0_pre20230615:17/17::gentoo, ebuild scheduled for merge) keep: (sys-devel/clang-runtime-17.0.0_pre20230609:17/17::gentoo, installed) This then causes sys-devel/clang-runtime, among a few other packages, to be inserted into the feedback of the backtrack to have a slot conflict, and the first of the feedback is inserted into runtime_pkg_mask. Each individual "keep" is entered as a backtrack attempt. The patch "depgraph: Don't ignore downgrades as missed_updates" then sees that there was a runtime_pkg_mask entry for sys-devel/clang-runtime-17.0.0_pre20230609. This entry implies that portage had an attempted merge on sys-devel/clang-runtime-17.0.0_pre20230609. This triggers the logic to perform a backtrack, pruning rebuilds. I should also note, I *think* the backtrack failing in the end has something to do with prune rebuild causing the dependency to fail, and it doesn't enter slot_operator_update_probe to know that perf needs to be rebuilt, so this backtrack fails.
backtrack 2 3 and 4 are each individual attempts to mask each individual of the triple in the backtrack feedback:
> 'slot conflict': [[((<Package ('ebuild', '/', 'sys-devel/clang-runtime-17.0.0_pre20230615', 'merge', 'gentoo')>,
> {(<Package ('installed', '/', 'sys-devel/clang-17.0.0_pre20230609', 'nomerge', 'installed')>,
> '~sys-devel/clang-runtime-17.0.0_pre20230609')}),),
> ((<Package ('ebuild', '/', 'sys-devel/clang-runtime-17.0.0_pre20230609', 'merge', 'gentoo')>,
> {(<Package ('ebuild', '/', 'sys-devel/clang-17.0.0_pre20230615', 'merge', 'gentoo')>,
> '~sys-devel/clang-runtime-17.0.0_pre20230615')}),
> (<Package ('installed', '/', 'sys-devel/clang-runtime-17.0.0_pre20230609', 'nomerge', 'installed')>,
> {(<Package ('ebuild', '/', 'sys-devel/clang-17.0.0_pre20230615', 'merge', 'gentoo')>,
> '~sys-devel/clang-runtime-17.0.0_pre20230615')}))],
backtrack 2 masks Package ('installed', '/', 'sys-devel/clang-runtime-17.0.0_pre20230609', 'nomerge', 'installed')
backtrack 3 masks Package ('ebuild', '/', 'sys-devel/clang-runtime-17.0.0_pre20230609', 'merge', 'gentoo')
backtrack 4 masks Package ('ebuild', '/', 'sys-devel/clang-runtime-17.0.0_pre20230615', 'merge', 'gentoo')
Both 2 and 3 performs an upgrade, whereas 4 aborts the upgrade. 2 and 3 both fail because perf pulls in the old version (since perf's slot rebuild is pruned)... and 4 is what we get in the end, the upgrade unable to be "figured out".
So there are several parts of this that I think are bugs:
1. slot conflict doesn't know about slot ABI rebuilds and pulls in old versions as slot conflict resolution.
2. slot rebuild pruning is way too trigger happy. It prunes everything, even if that will result in a dependency failure in the next backtrack.
3. after slot rebuild pruning, the package doesn't get selected for rebuild before dependency failure.
> backtrack 2 3 and 4 are each individual attempts to mask each individual of
> the triple in the backtrack feedback:
>
> > 'slot conflict': [[((<Package ('ebuild', '/', 'sys-devel/clang-runtime-17.0.0_pre20230615', 'merge', 'gentoo')>,
> > {(<Package ('installed', '/', 'sys-devel/clang-17.0.0_pre20230609', 'nomerge', 'installed')>,
> > '~sys-devel/clang-runtime-17.0.0_pre20230609')}),),
> > ((<Package ('ebuild', '/', 'sys-devel/clang-runtime-17.0.0_pre20230609', 'merge', 'gentoo')>,
> > {(<Package ('ebuild', '/', 'sys-devel/clang-17.0.0_pre20230615', 'merge', 'gentoo')>,
> > '~sys-devel/clang-runtime-17.0.0_pre20230615')}),
> > (<Package ('installed', '/', 'sys-devel/clang-runtime-17.0.0_pre20230609', 'nomerge', 'installed')>,
> > {(<Package ('ebuild', '/', 'sys-devel/clang-17.0.0_pre20230615', 'merge', 'gentoo')>,
> > '~sys-devel/clang-runtime-17.0.0_pre20230615')}))],
>
> backtrack 2 masks Package ('installed', '/',
> 'sys-devel/clang-runtime-17.0.0_pre20230609', 'nomerge', 'installed')
> backtrack 3 masks Package ('ebuild', '/',
> 'sys-devel/clang-runtime-17.0.0_pre20230609', 'merge', 'gentoo')
> backtrack 4 masks Package ('ebuild', '/',
> 'sys-devel/clang-runtime-17.0.0_pre20230615', 'merge', 'gentoo')
>
> Both 2 and 3 performs an upgrade, whereas 4 aborts the upgrade. 2 and 3 both
> fail because perf pulls in the old version (since perf's slot rebuild is
> pruned)... and 4 is what we get in the end, the upgrade unable to be
> "figured out".
Just checked runtime_pkg_mask, this does not seem to be correct. It always masks sys-devel/clang-runtime-17.0.0_pre20230609.
Created attachment 864145 [details]
reproducer
> 3. after slot rebuild pruning, the package doesn't get selected for rebuild
> before dependency failure.
It seems at at least in my reproducer, and in Sam's stage3 tarball, this part eventually figures things out, so this is probably not really that bad.
(In reply to YiFei Zhu from comment #12) > Created attachment 864145 [details] > reproducer I didn't add asserts for the merge list, and since it resolves after a few backtracks anyways it's not as useful. I'll attach a test that distinguishes good and bad. Created attachment 864154 [details]
reproducer 2
Created attachment 864155 [details, diff]
POC patch
> 1. slot conflict doesn't know about slot ABI rebuilds and pulls in old
> versions as slot conflict resolution.
This patch addresses this part but I'm not very confident in the logic that it won't cause regressions. However, it passes all tests, including the one I attached (which this is a trivial solution of, actually). I'd have to construct some non-trivial tests in which a shot rebuild will cause more slot conflicts to truly test how this patch behaves.
And I think I want more opinions on this approach, if it even makes sense.
(In reply to YiFei Zhu from comment #17) > > 1. slot conflict doesn't know about slot ABI rebuilds and pulls in old > > versions as slot conflict resolution. > See also bug 528836. (In reply to YiFei Zhu from comment #16) > Created attachment 864155 [details, diff] [details, diff] > POC patch The code is definitely above my paygrade but your argumentation seems sound and I don't think there's really any way to know better than to try it. So +1 to merging it from me. Worst case, we can revert. (In reply to Michał Górny from comment #19) > (In reply to YiFei Zhu from comment #16) > > Created attachment 864155 [details, diff] [details, diff] [details, diff] > > POC patch > > The code is definitely above my paygrade but your argumentation seems sound > and I don't think there's really any way to know better than to try it. So > +1 to merging it from me. Worst case, we can revert. +1. I think we need to rip the bandaid off. And work on improving our test coverage for depgraph bits. Also, what I _might_ do is cherry-pick the Perl fix (https://github.com/gentoo/portage/commit/224207c7d1988a354d004507bb7ecfb90b4ef097) first, release that, and then we can commit a tidied up version of this. Not sure. (In reply to Sam James from comment #20) > (In reply to Michał Górny from comment #19) > > (In reply to YiFei Zhu from comment #16) > > > Created attachment 864155 [details, diff] [details, diff] [details, diff] [details, diff] > > > POC patch > > > > The code is definitely above my paygrade but your argumentation seems sound > > and I don't think there's really any way to know better than to try it. So > > +1 to merging it from me. Worst case, we can revert. > > +1. I think we need to rip the bandaid off. And work on improving our test > coverage for depgraph bits. > > Also, what I _might_ do is cherry-pick the Perl fix > (https://github.com/gentoo/portage/commit/ > 224207c7d1988a354d004507bb7ecfb90b4ef097) first, release that, and then we > can commit a tidied up version of this. Not sure. I think this alternative is better, yeah. The more I think about this code (the one I uploaded) the more I hate it. I described why just now on IRC: > 11:02:33 <zhuyifei1999_> uh so, what my patch does it it causes the slot operator rebuilds to be processed first before performing a backtrack on slot conflict masks. This works well if, by processing slot operator rebuilds, slot conflicts are gone > 11:03:24 <zhuyifei1999_> however, if by processing slot operator rebuilds, there are still slot conflicts, when they are solved it will cause another backtrack to prune slot rebuilds > 11:03:54 <zhuyifei1999_> and that may prune more than it should and cause it to break exactly like how we see in the bug > 11:04:06 <zhuyifei1999_> this is my "brain simulation". I didn't test it I think it's just more bandaid that makes the problem harder to appear but can appear anyways. |