>=sys-apps/portage-2.2_rc68: emerge hangs if bashrcng is enabled. bashrcng comes from the layman gechi overlay: http://gechi-overlay.sourceforge.net/ <source type="svn">https://gechi-overlay.svn.sourceforge.net/svnroot/gechi-overlay/overlay/</source>

Reproducible: Always

Steps to Reproduce:
1. layman -a gechi
2. emerge =app-portage/bashrcng-shmfs-1.2.6
3. eselect bashrcng set 1.1.4
4. eselect bashrcng enable shmfs
5. emerge any-atom

Actual Results: emerge hangs.

Expected Results: emerge builds fine.
Created attachment 244521 [details] emerge --info
Created attachment 244527 [details] emerge data tail from emerge -f --nodeps sys-apps/portage >strace_stdout.txt 2>trace_stderror.txt
Created attachment 244529 [details] output from strace
Created attachment 244531 [details] output of `ps aux | grep ebuild | grep -v grep`, captured after pressing Ctrl-C during the previous strace command.
upstream related bug: http://sourceforge.net/tracker/?func=detail&aid=3052915&group_id=176946&atid=879268
What exactly does this thing do?
Apparently bashrcng is corrupting $PORTAGE_BUILDDIR/.ipc_in, which is a FIFO that emerge uses to listen for inter-process communication via the ebuild-ipc helper. So, how and why is bashrcng corrupting this FIFO? I can guess that the "how" is that it is copying it to a separate filesystem, which gives it a new inode, so emerge ends up listening on the old inode that has been lost. So, the main questions are why bashrcng does this and what alternatives there are to its current behavior.
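The inode theory above is easy to demonstrate. The following is a minimal sketch (not bashrcng's actual code): copying a FIFO — as a plugin relocating the build dir to shmfs might — creates a brand-new inode, so a process still attached to the original FIFO is left listening on an orphaned file. Paths and names here are made up for the illustration; `stat -c %i` assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch: copying a FIFO yields a new inode, orphaning any listener
# that still holds the original one open.
set -e
tmp=$(mktemp -d)
cd "$tmp"

mkfifo .ipc_in                     # the FIFO emerge would listen on
orig_inode=$(stat -c %i .ipc_in)

mkdir other_fs                     # stand-in for a separate filesystem
cp -a .ipc_in other_fs/.ipc_in     # copy preserves the FIFO type...
new_inode=$(stat -c %i other_fs/.ipc_in)

echo "original inode: $orig_inode"
echo "copied inode:   $new_inode"  # ...but not the inode
[ "$orig_inode" != "$new_inode" ] && echo "inode changed: listener orphaned"
```

A listener blocked on the original inode never sees data written to the copy, which matches the observed hang.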
Well, portage-2.2_rc69 here, no bashrcng-shmfs installed, yet it hangs on the ebuild.sh prerm call while executing this command:

/usr/bin/python2.6 /usr/lib64/portage/bin/ebuild-ipc.py exit 0

emerge --info
Upon CTRL+C:

>>> Installing (1 of 18) virtual/libiconv-0
 * checking 0 files for package collisions
>>> Safely unmerging already-installed instance...
^C

Exiting on signal 2
Traceback (most recent call last):
  File "/usr/lib64/portage/bin/ebuild-ipc.py", line 76, in <module>
    sys.exit(ebuild_ipc_main(sys.argv[1:]))
  File "/usr/lib64/portage/bin/ebuild-ipc.py", line 73, in ebuild_ipc_main
    return ebuild_ipc.communicate(args)
  File "/usr/lib64/portage/bin/ebuild-ipc.py", line 45, in communicate
    return self._communicate(args)
  File "/usr/lib64/portage/bin/ebuild-ipc.py", line 52, in _communicate
    output_file = open(self.ipc_in_fifo, 'wb')
KeyboardInterrupt
(In reply to comment #9)
> File "/usr/lib64/portage/bin/ebuild-ipc.py", line 52, in _communicate
>   output_file = open(self.ipc_in_fifo, 'wb')
> KeyboardInterrupt

Does `lsof | grep .ipc_in` show the emerge process listening on $PORTAGE_BUILDDIR/.ipc_in, and does it say that the inode has been deleted or anything odd like that? The IPC is well tested in my stage builds and it works flawlessly here.
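For anyone reproducing this, a quick way to see what the "inode has been deleted" condition looks like — this is a hypothetical illustration, not the portage code: a process holding a FIFO open after the FIFO is unlinked shows " (deleted)" in /proc (which lsof reports similarly).

```shell
#!/bin/sh
# Sketch: unlinking a FIFO out from under a process that has it open
# leaves that process attached to a deleted inode.
set -e
tmp=$(mktemp -d)
cd "$tmp"

mkfifo .ipc_in
exec 3<> .ipc_in          # O_RDWR on a FIFO does not block on Linux
rm .ipc_in                # simulate the FIFO being replaced/removed

readlink /proc/$$/fd/3    # prints ".../.ipc_in (deleted)"
exec 3<&-                 # close the fd again
```

If lsof showed emerge's fd in that state, it would confirm the FIFO was replaced behind emerge's back.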
> Does `lsof | grep .ipc_in` show the emerge process listening on
> $PORTAGE_BUILDDIR/.ipc_in, and does it say that the inode has been deleted
> or anything odd like that? The IPC is well tested in my stage builds and it
> works flawlessly here.

I have similar problems emerging net-libs/libwww-5.4.0-r7.

>>> Installing (2 of 23) net-libs/libwww-5.4.0-r7

hangs. lsof prints this:

lsof | grep .ipc_in
emerge  10943  root  5r  FIFO  8,5  0t0  2242486 /var/tmp/binpkgs/net-libs/libwww-5.4.0-r7/.ipc_in
(In reply to comment #11)
> I have similar problems emerging net-libs/libwww-5.4.0-r7.
> >>> Installing (2 of 23) net-libs/libwww-5.4.0-r7
> hangs.

On IRC we found that it was the pkg_prerm phase that was hanging. While it was hung, we checked the content of /var/tmp/binpkgs/net-libs/libwww-5.4.0-r7/temp/environment: the pkg_prerm function was defined, but it contained only a return statement and nothing more. We removed /var/db/pkg/net-libs/libwww-5.4.0-r7/libwww-5.4.0-r7.ebuild in order to disable the pkg_prerm phase. After that the problem no longer occurred for this package, so now we don't know what was wrong with the environment of the old instance.
(In reply to comment #12)
We do know that the output of `ps | grep ebuild` looked like this when it was hung:

15850 pts/0  SN+  0:00 /bin/bash /usr/lib64/portage/bin/ebuild.sh prerm
15871 pts/0  SN+  0:00 /bin/bash /usr/lib64/portage/bin/ebuild.sh prerm
15884 pts/0  SN+  0:00 /usr/bin/python2.6 /usr/lib64/portage/bin/ebuild-ipc.py exit 0
Is it just "shmfs-plugin" that's not working? Because I use bashrcng with bashrcng-lafilefixer plugin and it works like a charm.
(In reply to comment #13)
> We do know that the output of `ps | grep ebuild` looked like this when it
> was hung:
>
> 15850 pts/0  SN+  0:00 /bin/bash /usr/lib64/portage/bin/ebuild.sh prerm
> 15871 pts/0  SN+  0:00 /bin/bash /usr/lib64/portage/bin/ebuild.sh prerm
> 15884 pts/0  SN+  0:00 /usr/bin/python2.6 /usr/lib64/portage/bin/ebuild-ipc.py exit 0

With the shmfs plugin enabled it hangs, but my `ps aux | grep ebuild` output is quite different:

root 30462 0.2 0.0 12004 2364 pts/0 S+ 17:31 0:00 /bin/bash /usr/lib64/portage/bin/ebuild.sh setup
root 30537 0.0 0.0 12136 1796 pts/0 S+ 17:31 0:00 /bin/bash /usr/lib64/portage/bin/ebuild.sh setup
root 30559 0.4 0.1 60056 6840 pts/0 S+ 17:31 0:00 /usr/bin/python2.6 /usr/lib64/portage/bin/ebuild-ipc.py exit 0

It seems to hang on "setup" instead of prerm.
(In reply to comment #15)
> By enabling shmfs plugin it hangs but my "ps aux | grep ebuild" output is
> quite different:

Right, the people who aren't using bashrcng with the shmfs plugin really have a separate issue and they should file a new bug.
Since bug 335777 was fixed (in 2.2_rc75 and 2.1.9), ebuild-ipc will time out after 40 seconds instead of hanging indefinitely.
Since portage-2.1.9.6 and 2.2_rc82 the behavior may be a little different, as described in bug #336142, comment #25.
(In reply to comment #12)
> After that the problem no longer occurred for the package, so now we don't
> know what was wrong with the environment of the old instance.

Now I've found that in some rare cases bash unset statements can fail (seems like some sort of memory corruption). This can cause stale PORTAGE_BUILDDIR settings from /var/db/pkg/*/*/environment.bz2 to leak into the pkg_prerm environment and interfere with ebuild-ipc. There is a workaround for this issue in the following two commits:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=4319c6525684013f76cf4294d417a3250b690e34
http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=0d410db2aedbb5ec6c61a342e4bc56f825831ca5
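To check whether a merged package's saved environment carries such a leaked setting, one can grep the compressed environment file. The sketch below fabricates a throwaway environment.bz2 so it can run anywhere; on a real system you would point it at /var/db/pkg/&lt;category&gt;/&lt;package&gt;/environment.bz2 instead (the example path and variable contents are invented for illustration).

```shell
#!/bin/sh
# Sketch: inspect a saved ebuild environment for a stale PORTAGE_BUILDDIR.
set -e
tmp=$(mktemp -d)

# Fabricate a saved environment like the ones under /var/db/pkg/*/*/ .
printf 'declare -x PORTAGE_BUILDDIR="/var/tmp/portage/net-libs/libwww-5.4.0-r7"\n' \
    > "$tmp/environment"
bzip2 "$tmp/environment"

# The actual check: does the saved environment define PORTAGE_BUILDDIR?
bzcat "$tmp/environment.bz2" | grep PORTAGE_BUILDDIR
```

If the variable is present there, restoring that environment for pkg_prerm could point ebuild-ipc at the wrong build directory, matching the symptom described above.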
The last workaround does not seem to work for me.
(In reply to comment #20)
> the last workaround seem not working for me.

Right, that was a separate issue from the bashrcng problem, but with similar symptoms.
sys-apps/portage-2.2_rc97 no longer seems affected by this bug on an amd64 Gentoo box, while it still is on another x86 Gentoo box of mine, which runs in a chroot on the aforementioned amd64 box. Please ask me for any kind of documentation you need.
I guess we can consider this fixed by the ability to disable USE=ipc in the portage ebuild.
Just a few questions.

First: in my amd64 environment (the working one) the ipc USE flag is enabled, while in my embedded system the ipc USE flag is needed. What should I do (if possible) to get the ipc USE flag working inside the chroot? At the moment, a mount command from the amd64 environment shows these virtual directories enabled (/mnt/chroot/root32 is the chroot path):

/dev on /mnt/chroot/root32/dev type none (rw,bind)
/dev/pts on /mnt/chroot/root32/dev/pts type none (rw,bind)
/dev/shm on /mnt/chroot/root32/dev/shm type none (rw,bind)
/proc on /mnt/chroot/root32/proc type none (rw,bind)
/proc/bus/usb on /mnt/chroot/root32/proc/bus/usb type none (rw,bind)
/sys on /mnt/chroot/root32/sys type none (rw,bind)

A second question: the ipc feature is documented as "Use inter-process communication between portage and running ebuilds." What does that mean exactly? Is it useful just for commands such as genlop -c, or is it important for other portage operations (for example, to avoid possible corruption when two emerge instances are running at the same time)?
(In reply to comment #24)
> what should I do (if possibile) to have the ipc use flag working inside the
> chroot?

Either stop using the bashrcng shmfs-plugin, or fix it so that it doesn't interfere with the $PORTAGE_BUILDDIR/.ipc_in fifo.

> a second question:
> the ipc feature is documented as: "Use inter-process communication between
> portage and running ebuilds."
>
> what does it means exactly? it is useful just for commands such as genlop -c
> or it is important for other portage operation (for example to avoid possible
> corruptions when two emerge instances are running in the same time?)

I've added some documentation here:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=d2d2f43ca1908b2827790bf3e844a0d90f13fede

* There is a new ipc (inter-process communication) USE flag which is enabled
  by default. This allows portage to communicate with running ebuild
  processes, for things like best_version, has_version, and die calls in
  nested processes. This flag should remain enabled unless it is found to be
  incompatible with a specific profile or environment.
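For readers wondering what "communicate with running ebuild processes" looks like mechanically, here is a deliberately simplified sketch of FIFO-based request/reply IPC in the style described in this bug. It is not portage's actual wire protocol; the file names mirror the .ipc_in FIFO discussed above, and the "emerge" and "helper" roles are stand-ins.

```shell
#!/bin/sh
# Simplified sketch of FIFO-based IPC: a background "daemon" (standing in
# for emerge) reads one command from .ipc_in and answers on .ipc_out.
set -e
tmp=$(mktemp -d)
cd "$tmp"
mkfifo .ipc_in .ipc_out

# Background "emerge" side: wait for one command, send one reply.
( read -r cmd < .ipc_in
  echo "ack: $cmd" > .ipc_out ) &

# Foreground "ebuild helper" side: send a command, wait for the reply.
# Opening a FIFO for writing blocks until the reader has it open, so the
# two sides rendezvous without an explicit handshake.
echo "exit 0" > .ipc_in
read -r reply < .ipc_out
echo "$reply"            # prints "ack: exit 0"
wait
```

This also makes the failure mode in this bug concrete: if .ipc_in is replaced with a new inode between the two sides opening it, the writer and reader end up on different FIFOs and both block forever.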
Also, it's worth noting that USE=ipc fixes bug 278895 and bug 315615.
Sorry, my mistake: ipc doesn't work when bashrcng-shmfs is enabled, neither in the main system nor in the chroot...
sys-apps/portage-2.2_rc67 was deleted from the portage tree today. Please restore it because of this bug: no other version of sys-apps/portage-2.2* works with bashrcng-shmfs.
You can disable ipc like this:

mkdir /etc/portage/profile
echo "sys-apps/portage ipc" >> /etc/portage/profile/package.use.mask
I know, but I think it's not a good thing to work without ipc.
When the "ipc" USE flag is disabled, portage behaves like older portage (such as portage-2.1.8.x and portage-2.2_rc67). So, there's no point in using older releases when you can get the same behavior with the latest portage.