728986 – Gentoo Prefix bootstrap fails on Stage 3 due to circular dependencies (virtual/libcrypt-1-r1:0 & dev-lang/python-3.7.7-r2:3.7)


Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo/Alt
Classification:	Unclassified
Component:	Prefix Support
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Gentoo Prefix


Reported:	2020-06-21 10:03 UTC by Sammy Pfeiffer
Modified:	2020-06-30 07:42 UTC

See Also:	720048

Attachments
Overlay to break the circular dependency during the install (libcryptbootstrap.tar, 20.00 KB, application/x-tar) 2020-06-22 19:53 UTC, devourer

Description Sammy Pfeiffer 2020-06-21 10:03:02 UTC

I can't confirm whether it also happens on x86 (that build is blocked by https://bugs.gentoo.org/722784), but it does happen on amd64.

Error:

 * Error: circular dependencies:

(virtual/libcrypt-1-r1:0/1::gentoo, ebuild scheduled for merge) depends on
 (sys-libs/glibc-2.31-r5:2.2/2.2::gentoo, ebuild scheduled for merge) (runtime)
  (dev-lang/python-3.7.7-r2:3.7/3.7m::gentoo, ebuild scheduled for merge) (buildtime)
   (virtual/libcrypt-1-r1:0/1::gentoo, ebuild scheduled for merge) (buildtime_slot_op)

 * Note that circular dependencies can often be avoided by temporarily
 * disabling USE flags that trigger optional dependencies.

The following USE changes are necessary to proceed:
 (see "package.use" in the portage(5) man page for more details)
# required by sys-apps/portage-2.3.101-r2::gentoo[python_targets_python3_7,-build]
# required by app-admin/perl-cleaner-2.28::gentoo
# required by dev-lang/perl-5.30.3-r1::gentoo
# required by virtual/perl-Data-Dumper-2.174.0::gentoo
>=dev-lang/python-3.7.7-r2:3.7 ssl



Full log here: https://dev.azure.com/12719821/e566c963-8f77-4f01-b7bc-ae2d91b1334f/_apis/build/builds/2188/logs/41

If you want to land in a Docker image that reproduces this exact issue:
docker pull awesomebytes/gentoo_prefix_ci_stage3:2188
docker run -it awesomebytes/gentoo_prefix_ci_stage3:2188 /bin/bash

(with EPREFIX being /tmp/gentoo)

Comment 1 Fabian Groffen gentoo-dev 2020-06-21 10:35:01 UTC

It seems that

pkg_setup() {
    # see bug 682570
    [[ -z ${BOOTSTRAP_RAP} ]] && python-any-r1_pkg_setup
}

is no longer enough.

Comment 2 Fabian Groffen gentoo-dev 2020-06-21 10:37:56 UTC

This is complicated: the Python deps are added in BDEPEND, so it seems we can only force this by merging one of the packages without its deps.

@heroxbd: this is a RAP thing, do you see a way to work around this problem?

Comment 3 devourer 2020-06-22 19:53:28 UTC

Created attachment 645764
Overlay to break the circular dependency during the install

While the devs sort out a clean solution, you can bypass that particular block by changing the requirements on virtual/libcrypt to break the circular dependency during the install.
My solution has been to create a local repository (attached here) and to change the RDEPEND of virtual/libcrypt from

  elibc_glibc? ( sys-libs/glibc[crypt(+),static-libs(+)?] )

to

  !bootstrap? ( elibc_glibc? ( sys-libs/glibc[crypt(+),static-libs(+)?] ) )

Bear in mind that I have no idea if this has other side effects.

I won't pretend to understand much of the bootstrap script, but I also had to edit the flags to actually have ssl enabled. On my machine, the script seems to attempt the merge a second time if the first one failed (here, because the ssl flag was missing for python), and that second attempt tried to install stuff in ${EPREFIX}/tmp/.
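
For anyone who would rather rebuild such an overlay than unpack the attachment, the idea can be sketched roughly as below. This is a minimal sketch, not the content of the attached tarball: the repository name "localfix", the helper names make_overlay and wrap_rdepend, and the var/db/repos location under ${EPREFIX} are all assumptions. The virtual/libcrypt ebuild itself would still have to be copied from ::gentoo into the new repository, edited, and given a Manifest.

```shell
# Create a bare local repository under the given prefix.
# "localfix" and the var/db/repos location are assumptions for this sketch.
make_overlay() {
    local eprefix="$1"
    local name="${2:-localfix}"
    local repo="${eprefix}/var/db/repos/${name}"

    mkdir -p "${repo}/metadata" "${repo}/profiles" \
             "${repo}/virtual/libcrypt" \
             "${eprefix}/etc/portage/repos.conf" || return 1

    # Minimal metadata Portage expects from a repository.
    echo "${name}" > "${repo}/profiles/repo_name"
    echo "masters = gentoo" > "${repo}/metadata/layout.conf"

    # Register the repository with Portage.
    printf '[%s]\nlocation = %s\n' "${name}" "${repo}" \
        > "${eprefix}/etc/portage/repos.conf/${name}.conf"
}

# Wrap a dependency string so it only applies when USE=bootstrap is off.
wrap_rdepend() {
    printf '!bootstrap? ( %s )\n' "$1"
}
```

Calling wrap_rdepend 'elibc_glibc? ( sys-libs/glibc[crypt(+),static-libs(+)?] )' prints the modified RDEPEND string quoted above.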

Comment 4 anb 2020-06-25 05:32:46 UTC

Hi,

I got the same issue. Using attachment 645764 provided by devourer@noot-noot.org, I could get things rolling, but then it stopped on a file collision error:

-----
 * Messages for package sys-apps/coreutils-8.32-r1:

 * Package 'sys-apps/coreutils-8.32-r1' has internal collisions between
 * non-identical files (located in separate directories in the
 * installation image (${D}) corresponding to merged directories in the
 * target filesystem (${ROOT})):
 *
 * 	/home/test/.gentoo/tmp/usr/bin/basename
 * 		/home/test/.gentoo/tmp/bin/basename
 * 		/home/test/.gentoo/tmp/usr/bin/basename
 * 			Differences: type, mode
 *

...

 *
 * 	/home/test/.gentoo/tmp/usr/bin/yes
 * 		/home/test/.gentoo/tmp/bin/yes
 * 		/home/test/.gentoo/tmp/usr/bin/yes
 * 			Differences: type, mode
 *
 * Package 'sys-apps/coreutils-8.32-r1' NOT merged due to internal
 * collisions between non-identical files. If necessary, refer to your
 * elog messages for the whole content of the above message.
-----

while ${EPREFIX}/tmp/bin is a symlink to ${EPREFIX}/tmp/usr/bin.

Comment 5 devourer 2020-06-25 07:18:37 UTC

Hi anb,

This is also the error I had after applying the overlay. In my case, this was caused by Portage's behavior when failing to merge. I was fairly unclear in my previous message. I'll try to explain what happened so that you can see if you have the same issue:

While trying to merge the packages a first time, Portage will complain about a missing USE flag (ssl) for Python, indicating that it is necessary to proceed (see the second error in the message of this bug report).

This does not actually stop the bootstrap: shortly afterwards, Portage attempts to merge all these packages again, but this time it targets the system in ${EPREFIX}/tmp. I have no idea why or how the target system changes; the first attempt was doing the correct thing and targeting the system in ${EPREFIX}.

Thus, if you clear the missing USE flag error, the first attempt will proceed and merge everything successfully.

I did try writing to a package.use file to clear that particular error, but it didn't work (I don't remember if I tried doing it in both ${EPREFIX} and ${EPREFIX}/tmp, nor which one I did try it in).

The solution I ended up with is to edit bootstrap-prefix.sh, find the "-ssl" keyword in there (it's among many other disabled flags), and remove the minus sign so that the flag is actually enabled.
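
The edit described above could be scripted along these lines. It is only a sketch: enable_ssl_flag is a hypothetical helper, and it assumes the flag appears as a standalone "-ssl" token delimited by whitespace or double quotes, which I have not checked against the actual script. It rewrites the file in place, so keep a copy of the original first.

```shell
# Turn a standalone "-ssl" USE token into "ssl" in the given file.
# Only matches -ssl delimited by whitespace, a double quote, or line
# start/end, so other tokens such as -sslv3 are left untouched.
enable_ssl_flag() {
    local file="$1"
    local tmp="${file}.ssl-edit"

    sed -E 's/(^|[[:space:]"])-ssl([[:space:]"]|$)/\1ssl\2/g' "${file}" > "${tmp}" || return 1

    # Write back through the original file so its mode (e.g. +x) is preserved.
    cat "${tmp}" > "${file}" && rm -f "${tmp}"
}
```

For example, on a made-up line such as USE="-acl -ssl -nls" this leaves USE="-acl ssl -nls".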

Comment 6 anb 2020-06-25 15:06:57 UTC

Hi devourer,

Thanks for the details. I've been able to bootstrap the prefix now. Here are the places where I applied workarounds:

- coreutils complained about a file collision because ${EPREFIX}/tmp/bin was a symlink to ${EPREFIX}/tmp/usr/bin. Removing the symlink and copying ${EPREFIX}/tmp/usr/bin over in its place fixed it.

- rsync failed because ${EPREFIX}/tmp/etc/init.d/rsyncd had a shebang of "/sbin/openrc-run" while openrc was not installed. I created a dummy script at "${EPREFIX}/tmp/sbin/openrc-run" with the following content:

---
#!/bin/bash
# Placeholder for openrc-run: only echoes its arguments, starts no service.
echo "$@"
---
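
The two workarounds above could be scripted roughly as follows. This is a sketch under assumptions: unsymlink_bin and install_openrc_stub are hypothetical names, and the argument is the temporary root (${EPREFIX}/tmp in this report).

```shell
# Replace a bin -> usr/bin symlink with a real copy of the directory.
unsymlink_bin() {
    local tmproot="$1"

    # Nothing to do unless bin is actually a symlink.
    [ -L "${tmproot}/bin" ] || return 0

    rm "${tmproot}/bin" || return 1
    cp -a "${tmproot}/usr/bin" "${tmproot}/bin"
}

# Install a placeholder openrc-run that only echoes its arguments.
install_openrc_stub() {
    local tmproot="$1"

    mkdir -p "${tmproot}/sbin" || return 1
    printf '%s\n' '#!/bin/bash' 'echo "$@"' > "${tmproot}/sbin/openrc-run" || return 1
    chmod +x "${tmproot}/sbin/openrc-run"
}
```

Both would be called with the temporary root, e.g. unsymlink_bin "${EPREFIX}/tmp".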

I think these are corner cases that only happen when bootstrapping a prefix, and the hacks landed in "${EPREFIX}/tmp", so they won't affect the real prefix later. It would be good to know the steps in each bootstrap stage and the purpose of "${EPREFIX}/tmp", as I often find it confusing that things exist in both locations (${EPREFIX} vs ${EPREFIX}/tmp).

Comment 7 Fabian Groffen gentoo-dev 2020-06-25 15:30:03 UTC

(In reply to anb from comment #6)
> I think these are corner cases that only happen when bootstrapping a prefix,
> and the hacks landed in "${EPREFIX}/tmp", so they won't affect the real
> prefix later. It would be good to know the steps in each bootstrap stage and
> the purpose of "${EPREFIX}/tmp", as I often find it confusing that things
> exist in both locations (${EPREFIX} vs ${EPREFIX}/tmp).

stage1: install bare tools, sufficient to install and run portage, into /tmp
stage2: install more tools without dependencies using portage into /tmp, to build a full system
stage3: build @system in /, using the somewhat proper/sane tools in /tmp

Since some of the steps from stage3 sometimes pick up parts from the host system, and a minimal USE-flag combination is used in stage3 to avoid cycles and extra dependencies, an emerge -e @system is performed afterwards to ensure all packages are installed and re-installed properly.

As a result, everything in /tmp is considered unusable cruft.  It may be compiled for a different architecture (32-bit instead of 64-bit), and it probably depends on host libs (e.g. /usr/lib/curses.so).  Also, stuff installed in /tmp may not have been installed by Portage, so there is no record of owned files there; hence, it's really a messy place to pull ourselves out of the mud.  The bootstrap hence rm -Rf's /tmp as soon as it can and continues solely with the sane(r) tools from /.
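
Reading / and /tmp above as ${EPREFIX} and ${EPREFIX}/tmp, the lookup order during stage3 can be pictured with a small sketch. It only illustrates the description above and is not code from bootstrap-prefix.sh; the tool_origin helper and the list of directories it checks are assumptions.

```shell
# Report which tree a tool would be taken from during stage3:
# the real prefix first, then the temporary tree, otherwise the host.
tool_origin() {
    local eprefix="$1"
    local tool="$2"
    local dir

    for dir in usr/bin bin; do
        if [ -x "${eprefix}/${dir}/${tool}" ]; then
            echo "prefix"
            return 0
        fi
    done

    for dir in tmp/usr/bin tmp/bin; do
        if [ -x "${eprefix}/${dir}/${tool}" ]; then
            echo "tmp"
            return 0
        fi
    done

    # Not found in either tree: it would come from the host system.
    echo "host"
}
```

Anything reported as "tmp" or "host" is what the final emerge -e @system is meant to replace.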

Comment 8 devourer 2020-06-25 15:36:08 UTC

My understanding is that the script starts by compiling a few tools for a sane environment (to avoid having to depend on those available on the platform) such as wget, bash, and a few others (stage 1), then it somehow creates a very minimal Gentoo install in ${EPREFIX}/tmp (stage 2), which is then used to create the complete system in ${EPREFIX} (stage 3).

This would make having collisions (and, by implication, installation of software) in ${EPREFIX}/tmp during stage 3 something that should not happen. Are you sure that your system in ${EPREFIX} works fine? If you, for example, move ${EPREFIX}/tmp to ${EPREFIX}/tmp_back without changing anything else, does it still work as intended?
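
Before moving ${EPREFIX}/tmp aside as suggested, one cheap check is to look for symlinks in the prefix that still point into it. This is a sketch only: find_tmp_links is a hypothetical helper, it catches absolute symlink targets and nothing else, and it says nothing about binaries linked against libraries in ${EPREFIX}/tmp.

```shell
# List symlinks outside ${eprefix}/tmp whose target lies inside it.
find_tmp_links() {
    local eprefix="$1"
    local link target

    # Skip the temporary tree itself, then inspect every remaining symlink.
    find "${eprefix}" -path "${eprefix}/tmp" -prune -o -type l -print |
    while IFS= read -r link; do
        target=$(readlink "${link}")
        case "${target}" in
            "${eprefix}/tmp"/*) printf '%s -> %s\n' "${link}" "${target}" ;;
        esac
    done
}
```

An empty result from find_tmp_links "${EPREFIX}" is a hint, not proof, that the prefix no longer leans on the temporary tree.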

Comment 9 Benda Xu gentoo-dev 2020-06-29 23:48:55 UTC

(In reply to Fabian Groffen from comment #2)
> This is complicated: the Python deps are added in BDEPEND, so it seems we can
> only force this by merging one of the packages without its deps.
> 
> @heroxbd: this is a RAP thing, do you see a way to work around this problem?

It turned out that the solution is surprisingly simple :)

Comment 10 Larry the Git Cow gentoo-dev 2020-06-29 23:53:03 UTC

The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/proj/prefix.git/commit/?id=e77fd01734f21ec2e9c985c28ba4eb30c1b2bc9d

commit e77fd01734f21ec2e9c985c28ba4eb30c1b2bc9d
Author:     Benda Xu <heroxbd@gentoo.org>
AuthorDate: 2020-06-29 23:34:25 +0000
Commit:     Benda Xu <heroxbd@gentoo.org>
CommitDate: 2020-06-29 23:52:25 +0000

    scripts/bootstrap-prefix.sh: do not skip USE=ssl in stage3.
    
    USE=-ssl has been introduced in d830d32f64280bb 10 years ago to
    simplify bootstrap logic, when cryptography was not crucial.
    
    Now the assumptions do not hold anymore and USE=-ssl causes more
    cursed situations than it cures.
    
    This results in cleaner and more correct code.  As a by-product, it
    fixes Bug 728986.
    
    This has been tested on prefix-standalone, call for more tests.
    
    Reported-By: Sammy Pfeiffer, devourer, anb
    Closes: https://bugs.gentoo.org/728986
    
    Signed-off-by: Benda Xu <heroxbd@gentoo.org>

 scripts/bootstrap-prefix.sh | 4 ----
 1 file changed, 4 deletions(-)

Comment 11 Benda Xu gentoo-dev 2020-06-30 00:26:30 UTC

(In reply to devourer from comment #5)
 
> The solution I ended up with is to edit bootstrap-prefix.sh, find the "-ssl"
> keyword in there (it's among many other disabled flags), and remove the
> minus sign so that the flag is actually enabled.

devourer, you have figured out the solution ahead of me :) Kudos, bro!

Comment 12 Sammy Pfeiffer 2020-06-30 05:31:13 UTC

The CI for amd64 is bootstrapping correctly again.

Thank you for your work!

(x86 still bugged)

Comment 13 Michael Haubenwallner (RETIRED) gentoo-dev 2020-06-30 07:42:31 UTC

(In reply to Larry the Git Cow from comment #10)
>     This has been tested on prefix-standalone, call for more tests.

FWIW, I've retriggered these CI builds:
https://dev.azure.com/ssi-gentoo/prefix-ci/_build?definitionId=8