emerge --sync gives me this message every time: ... >>> Updating Portage cache: 100% /etc/portage/postsync.d/q-reinitialize: line 1: 810714 Segmentation fault /usr/bin/q -r * spawn failed of /etc/portage/bin/post_sync However, if I run /etc/portage/bin/post_sync from the command line there is no problem. I'll post the rest of my discoveries after posting this (bugzilla just yelled at me that the post is too long).
Created attachment 111201 [details] debugging-results.txt Here is the output after I looked for the problem a bit. It has some interesting gdb output... and well, it is my original post (before bugzilla yelled at me that it is too long).
Created attachment 111202 [details] emerge--info.txt And here is emerge --info.
Your CFLAGS looked sane and yet something has gone really wrong for you and I can't reproduce it. Best I can say for now is to simply try rebuilding the portage-utils. If this is happening on every --sync for you then clearly the best option is to remove the +x bit in the q-reinitialize script.
(In reply to comment #3) > Your CFLAGS looked sane and yet something has gone really wrong for you > and I can't reproduce it. Best I can say for now is to simply try > rebuilding the portage-utils. Could it be amd64 or kernel related? It is reproducible on two separate machines here -- the one I reported for (attachment #111202 [details]) and a dual opteron. Both are running vanilla 2.6.20 but the problem was present earlier. By the way, I had already tried rebuilding portage-utils (sort of). I don't know how diligently you looked at attachment #111201 [details], but I compiled portage-utils using "make debug" in its $S and with -O0 (so that gdb gives readable output) and it still didn't work. > If this is happening on every --sync for you then > clearly the best option is to remove the +x bit in the q-reinitialize script. Oh, well, thanks for the advice, but it's not *that* bad. It's actually a good reminder that there is something that needs fixing.
(In reply to comment #4)
> Oh, well, thanks for the advice, but it's not *that* bad. It's actually a good
> reminder that there is something that needs fixing.

I guess so, but the hard part is telling what needs fixing. It's not reproducible here, and it only happens to you when q is called from portage. We can see from the backtrace that applets is clearly getting messed up, but why, I can't say. I did try on an amd64-multilib host.

miranda bin # emerge --sync
>>> Starting rsync with rsync://64.127.121.98/gentoo-portage...
Welcome to owl.gentoo.org
Server Address : 64.127.121.98
Contact Name : mirror-admin@gentoo.org
Hardware : 4 x Intel(R) Xeon(TM) CPU 2.40GHz, 1024MB RAM
Please note: common gentoo-netiquette says you should not sync more than once a day. Users who abuse the rsync.gentoo.org rotation may be added to a temporary ban list.
MOTD brought to you by motd-o-matic, version 0.3
receiving file list ... done
./
metadata/
metadata/timestamp.chk
deleting .ebuild.x
Number of files: 144717
Number of files transferred: 1
Total file size: 164787760 bytes
Total transferred file size: 32 bytes
Literal data: 32 bytes
Matched data: 0 bytes
File list size: 3368401
File list generation time: 1.319 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 193
Total bytes received: 3368924
sent 193 bytes received 3368924 bytes 1347646.80 bytes/sec
total size is 164787760 speedup is 48.91
>>> Updating Portage cache: 100%
miranda bin # echo $?
0
miranda bin # wc -l /usr/portage/.ebuild.x
22925 /usr/portage/.ebuild.x
miranda bin # qdepends -k CFLAGS app-portage/portage-utils-0.1.24
app-portage/portage-utils-0.1.24: -march=k8 -fomit-frame-pointer -O2 -pipe
I hope I'm not missing the obvious but.... (In reply to comment #5) > sent 193 bytes received 3368924 bytes 1347646.80 bytes/sec > total size is 164787760 speedup is 48.91 > > >>> Updating Portage cache: 100% > > miranda bin # echo $? > 0 ... in the end, shouldn't you be getting a message like q: Updating ebuild cache ... q: Finished 22925 entries in 0.255503 second You sure you have +x on q-reinitialize? I promise to take a better look at this tonight.
(In reply to comment #6)
> I hope I'm not missing the obvious but....

Indeed, I was. My q-reinitialize was older and did not have the -q. Anyway, I just tried:
- unpacking a stage3 tarball in a chroot
- emerge portage portage-utils
- chmod +x /etc/portage/postsync.d/q*
- emerge --sync

Could not reproduce the problem. Just for fun I symlinked /bin/bash in /etc/portage/postsync.d and am trying this and that, but without much success. The problem *is* present but I cannot trace it. I am playing with gdb, but any pointers about what I could try are appreciated.
Alright, gdb paid off (sort of). I haven't pinpointed the problem, but it has to do with my huge INSTALL_MASK. I also have the feeling that 1024 is a magic number around there. There is something wrong in the make.conf parser, and even though I have no idea why it only gets triggered when called from portage, there is no doubt a problem there. I confirmed it in both 64-bit and 32-bit chroots. So, if you want "steps to reproduce":
1. emerge -u portage portage-utils
2. chmod +x /etc/portage/postsync.d/q-reinitialize
3. add the INSTALL_MASK from attachment #111202 [details] to /etc/make.conf
4. emerge --sync
5. watch it blow up

I'll see if splitting the mask into multiple lines does any good (or I could simply drop it), but regardless, that would only be a temporary workaround.
Alright, this thing is reproducible without running it from portage. It's just that the critical length is different. I put the following in make.conf:

INSTALL_MASK=""
INSTALL_MASK="${INSTALL_MASK} 123456789 123456789 "... (100 chars total)
repeat the line above and adjust until the problem is triggered

and here is some output for different values of INSTALL_MASK:

chutz@possum ~ $ q >/dev/null ; portageq envvar INSTALL_MASK | wc -m
1105
chutz@possum ~ $ q >/dev/null ; portageq envvar INSTALL_MASK | wc -m
q: Unknown applet 'q'
1106
chutz@possum ~ $ q >/dev/null ; portageq envvar INSTALL_MASK | wc -m
q: Unknown applet 'q'
1107
chutz@possum ~ $ q >/dev/null ; portageq envvar INSTALL_MASK | wc -m
Segmentation fault
1108

From a chroot the critical length was much lower: less than 80 characters. Anyway, at the risk of becoming annoying, I thought I'd tell you how to easily reproduce the problem.
INSTALL_MASK=$(perl -e 'print "A " x 16384') q -r

And I can't reproduce. Even putting your exact same install mask settings in make.conf, I still can't reproduce. However, note that INSTALL_MASK is defined with a size of 1024:

char install_mask[1024] = "";

on line 101 of main.c. So if you want to set a software breakpoint on initialize_portage_env(), that would probably help track it down a bit further.
... WHAT! I cannot reproduce this thing *anywhere* anymore. Not in a chroot, not in a real root on either of the machines. And it's not like I've done anything. I just don't know what to think. The only thing that changed since the last time I tried is the date. I'll try to break this curse by setting it WORKSFORME as that's exactly what is happening right now. I'd like to see that spirit try to prove *that* wrong.
The curse is broken, the problem is back :-D It is still hard for me to reproduce the exact conditions, but the problem is a simple overflow related to the maximum length of INSTALL_MASK. More precisely, the problem appears to be in strincr_var. Say you have two lines in /etc/make.conf like:

INSTALL_MASK="some dummy looooong value (>1024 bytes)"
INSTALL_MASK="foo foo bar bar"

When the first line is parsed, strincr_var properly sets vars_to_read[1].value to the dummy long value, truncated at 1024 bytes. When the next line is parsed, strincr_var is called with an already full vars_to_read[1].value. Nevertheless, it immediately appends a space and the full value of the line currently being parsed, without checking that there is room left. This is not correct. Furthermore, it *appends* the value of the line, even though it should overwrite the old value. I am pretty sure that I only had a single line in make.conf when I reported this, but how about fixing the problems one at a time.
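For illustration only, here is a minimal sketch of an append that refuses to write past the end of a fixed-size buffer. It is not the portage-utils code: the helper name append_word_bounded and its signature are made up for this example, and it only shows the kind of length check that the append described above is missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from portage-utils): append src to the
 * NUL-terminated string in dst, separated by a single space when dst is
 * not empty. dstsize is the total size of the dst buffer in bytes.
 * Returns false and leaves dst untouched if the result would not fit.
 */
static bool append_word_bounded(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);

    size_t used = strlen(dst);
    size_t sep = (used > 0) ? 1 : 0;    /* separator only if dst is non-empty */
    size_t need = strlen(src);

    /* used + sep + need characters plus the terminating NUL must fit. */
    if (used >= dstsize || sep + need >= dstsize - used)
        return false;

    if (sep)
        dst[used++] = ' ';
    memcpy(dst + used, src, need + 1);  /* need + 1 copies the NUL as well */
    return true;
}
```

With a check like this, a second INSTALL_MASK line that does not fit is rejected instead of overwriting whatever happens to follow the buffer in memory. Whether to reject, truncate, or grow the buffer is a separate design choice.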
Ok now we are starting to get somewhere. I can reproduce undesired behavior. Still no segv however.

(echo INSTALL_MASK=\"$(perl -e 'print "A" x 1024')\" ; echo INSTALL_MASK=\"\${INSTALL_MASK} $(perl -e 'print "A" x 1024')\" ; echo INSTALL_MASK=\"\${INSTALL_MASK} $(perl -e 'print "A" x 1024')\") >> /etc/make.conf

Then

DEBUG=1 q

For me I can see PORTDIR= got overwritten. I'll see what I can do..
Created attachment 111448 [details, diff] q-bug-168334.diff Give this a spin please.. I added a sanity check and raised the default size of install_mask quite a bit. Then raised the size of some other buffers such as binhost,features.. cvs -d:pserver:anonymous@anoncvs.gentoo.org:/var/cvsroot -q co -R gentoo-projects/portage-utils
I pushed this patch to cvs. So simply cvs co/up and see if all is fine.. It should be.
I'll try the patch as soon as I get home (in 6 hours I guess). Until then... I recall that INSTALL_MASK was getting appended to even without having ${INSTALL_MASK}. This may be a separate problem, but thought I'd make it clear. Steps to reproduce:

echo INSTALL_MASK=foo >> make.conf
echo INSTALL_MASK=bar >> make.conf
env DEBUG=1 q >/dev/null

and look at INSTALL_MASK="foo bar"
(In reply to comment #15)
> I pushed this patch to cvs. So simply cvs co/up and see if all is fine.. It
> should be.

OK, I can confirm that it doesn't segfault now. Which is great and I thank you for that. However, I cannot say that it works, because the parser certainly doesn't behave normally:
- it can only increment INSTALL_MASK and doesn't care about the presence of ${INSTALL_MASK} (short of removing it)
- it doesn't support ${OTHERVAR} either (as taken from a comment)
- CONFIG_PROTECT is defined as _Q_STR (same as ARCH) while INSTALL_MASK is _Q_ISTR (same as FEATURES). I am pretty sure that CONFIG_PROTECT should be _Q_ISTR as well.
- personally, I believe that if the parser is properly written, there should be no need for _Q_STR *and* _Q_ISTR

Is there any good reason not to use "portageq envvar" and avoid reading the profiles altogether?
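To make the first point concrete, here is a rough sketch of shell-like assignment semantics for a single variable: a plain assignment overwrites the old value, while a ${NAME} self-reference on the right-hand side is replaced by the old value. This is only an assumption about what "normal" behaviour could look like, not how q is implemented. The function name assign_var is hypothetical, and references to other variables are deliberately left unexpanded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch (not portage-utils code): assign rhs to the variable
 * called name, whose current NUL-terminated value is stored in value, a
 * buffer of valsize bytes. Every "${name}" in rhs is replaced by the old
 * value; without such a reference the old value is simply overwritten.
 * References to other variables are copied literally.
 * Returns false and leaves value untouched if the result would not fit.
 */
static bool assign_var(char *value, size_t valsize,
                       const char *name, const char *rhs)
{
    assert(value != NULL && name != NULL && rhs != NULL && valsize > 0);

    char ref[128];
    int reflen = snprintf(ref, sizeof ref, "${%s}", name);
    if (reflen < 0 || (size_t)reflen >= sizeof ref)
        return false;

    /* Build the result in a scratch buffer so value stays intact on error. */
    char *out = malloc(valsize);
    if (out == NULL)
        return false;

    size_t len = 0;
    const char *p = rhs;
    while (*p != '\0') {
        const char *chunk;
        size_t n;

        if (strncmp(p, ref, (size_t)reflen) == 0) {
            chunk = value;          /* self-reference: splice in old value */
            n = strlen(value);
            p += reflen;
        } else {
            chunk = p;              /* ordinary character */
            n = 1;
            p += 1;
        }
        if (n >= valsize - len) {   /* no room for n bytes plus the NUL */
            free(out);
            return false;
        }
        memcpy(out + len, chunk, n);
        len += n;
    }
    out[len] = '\0';
    memcpy(value, out, len + 1);
    free(out);
    return true;
}
```

Under these assumptions, INSTALL_MASK=foo followed by INSTALL_MASK=bar leaves "bar", and only INSTALL_MASK="${INSTALL_MASK} baz" extends the previous value.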
(In reply to comment #17)
> (In reply to comment #15)
> > I pushed this patch to cvs. So simply cvs co/up and see if all is fine.. It
> > should be.
>
> OK, I can confirm that it doesn't segfault now. Which is great and I thank you
> for that.

good

> However, I cannot say that it works, because the parser certainly doesn't
> behave normally

Well, that's somewhat debatable: "normally" in the terms of q, or normally in the terms of Python's shlex code, or normally in the terms of a shell..

> - it can only increment INSTALL_MASK and doesn't care about the presence of
> ${INSTALL_MASK} (short of removing it)

I'll look into that when I get a chance.

> - it doesn't support ${OTHERVAR} either (as taken from a comment)

That's expected and I don't expect that to change anytime soon.

> - CONFIG_PROTECT is defined as _Q_STR (same as ARCH) while INSTALL_MASK is
> _Q_ISTR (same as FEATURES). I am pretty sure that CONFIG_PROTECT should be
> _Q_ISTR as well.

Yeah ok. _Q_ISTR was coded after the original _Q_STR code.

> - personally, I believe that if the parser is properly written, there should be
> no need for _Q_STR *and* _Q_ISTR

It's not exactly easy to write a bash parser in C.

> Is there any good reason not to use "portageq envvar" and avoid reading the
> profiles altogether?

Yeah, we don't want to shell out and make Python calls.
This is released in 0.1.25

Bug #168334 ; q -r dies with a segfault after emerge --sync
Bug #168442 ; does not properly parse the profile location
Bug #170795 ; add a -E/--eclass option to qgrep
Bug #170797 ; add a -s/--skip-comments option to qgrep
Bug #171024 ; opening '/usr/portage/.metadata.x' failed
Bug #171374 ; Misc enhancements for qgrep
Bug #172240 ; -A/-B options for qgrep (context lines)
Bug #172338 ; qgrepping through installed ebuilds (in the VDB)
Bug #173005 ; Colorized output for qgrep.
Closing