Bug#: 168334
Status: RESOLVED
Resolution: FIXED
Assigned To: portage-utils <portage-utils@gentoo.org>
Reporter: Georgi Georgiev <chutz@gg3.net>

Filename           Description            Type        Creator          Created                Size
bug.txt            debugging-results.txt  text/plain  Georgi Georgiev  2007-02-25 15:38 0000  7.88 KB
einfo.txt          emerge--info.txt       text/plain  Georgi Georgiev  2007-02-25 15:39 0000  5.94 KB
q-bug-168334.diff  q-bug-168334.diff      patch       solar            2007-02-27 17:56 0000  1.01 KB



Description:   Opened: 2007-02-25 15:36 0000
emerge --sync gives me this message every time:
...
>>> Updating Portage cache:   100%
/etc/portage/postsync.d/q-reinitialize: line 1: 810714 Segmentation fault      /usr/bin/q -r
 * spawn failed of /etc/portage/bin/post_sync

However, if I run /etc/portage/bin/post_sync from the command line there is no
problem.

I'll post the rest of my discoveries after posting this (bugzilla just yelled
at me that the post is too long).

------- Comment #1 From Georgi Georgiev 2007-02-25 15:38:21 0000 -------
Created an attachment (id=111201)
debugging-results.txt

Here is the output after I looked for the problem a bit. It has some
interesting gdb output... and well, it is my original post (before bugzilla
yelled at me that it is too long).

------- Comment #2 From Georgi Georgiev 2007-02-25 15:39:41 0000 -------
Created an attachment (id=111202)
emerge--info.txt

And here is emerge --info.

------- Comment #3 From solar 2007-02-25 17:35:10 0000 -------
Your CFLAGS looked sane and yet something has gone really wrong for you 
and I can't reproduce it. Best I can say for now is to simply try 
rebuilding the portage-utils. If this is happening on every --sync for you then
clearly the best option is to remove the +x bit in the q-reinitialize script.

------- Comment #4 From Georgi Georgiev 2007-02-26 00:57:48 0000 -------
(In reply to comment #3)
> Your CFLAGS looked sane and yet something has gone really wrong for you 
> and I can't reproduce it. Best I can say for now is to simply try 
> rebuilding the portage-utils.

Could it be amd64 or kernel related? It is reproducible on two separate
machines here: the one I reported for (attachment #111202) and a dual
Opteron. Both are running vanilla 2.6.20, but the problem was present earlier.

By the way, I had already tried rebuilding portage-utils (sort of). I don't
know how closely you looked at attachment #111201, but I compiled
portage-utils using "make debug" in its $S and with -O0 (so that gdb gives
readable output) and it still didn't work.

> If this is happening on every --sync for you then
> clearly the best option is to remove the +x bit in the q-reinitialize script. 

Oh, well, thanks for the advice, but it's not *that* bad. It's actually a good
reminder that there is something that needs fixing.

------- Comment #5 From solar 2007-02-26 01:25:33 0000 -------
(In reply to comment #4)
> Oh, well, thanks for the advice, but it's not *that* bad. It's actually a good
> reminder that there is something that needs fixing.

I guess, but that is the hard part to tell. It's not reproducible here, and it
only happens for you when called from portage. We can see from the backtrace
that applets is clearly getting messed up, but I can't say why.

I did try on an amd64-multilib host. 

miranda bin # emerge --sync
>>> Starting rsync with rsync://64.127.121.98/gentoo-portage...
Welcome to owl.gentoo.org

Server Address : 64.127.121.98
Contact Name   : mirror-admin@gentoo.org
Hardware       : 4 x Intel(R) Xeon(TM) CPU 2.40GHz, 1024MB RAM


Please note: common gentoo-netiquette says you should not sync more
than once a day.  Users who abuse the rsync.gentoo.org rotation
may be added to a temporary ban list.


MOTD brought to you by motd-o-matic, version 0.3

receiving file list ... done
./
metadata/
metadata/timestamp.chk
deleting .ebuild.x

Number of files: 144717
Number of files transferred: 1
Total file size: 164787760 bytes
Total transferred file size: 32 bytes
Literal data: 32 bytes
Matched data: 0 bytes
File list size: 3368401
File list generation time: 1.319 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 193
Total bytes received: 3368924

sent 193 bytes  received 3368924 bytes  1347646.80 bytes/sec
total size is 164787760  speedup is 48.91

>>> Updating Portage cache:  100%

miranda bin # echo $?      
0

miranda bin # wc -l /usr/portage/.ebuild.x 
22925 /usr/portage/.ebuild.x

miranda bin # qdepends -k CFLAGS app-portage/portage-utils-0.1.24
app-portage/portage-utils-0.1.24: -march=k8 -fomit-frame-pointer -O2 -pipe

------- Comment #6 From Georgi Georgiev 2007-02-26 01:45:08 0000 -------
I hope I'm not missing the obvious but....

(In reply to comment #5)
> sent 193 bytes  received 3368924 bytes  1347646.80 bytes/sec
> total size is 164787760  speedup is 48.91
> 
> >>> Updating Portage cache:  100%
> 
> miranda bin # echo $?      
> 0

... in the end, shouldn't you be getting a message like
q: Updating ebuild cache ...
q: Finished 22925 entries in 0.255503 second

You sure you have +x on q-reinitialize?

I promise to take a better look at this tonight.

------- Comment #7 From Georgi Georgiev 2007-02-26 09:49:06 0000 -------
(In reply to comment #6)
> I hope I'm not missing the obvious but....

Indeed, I was. My q-reinitialize was older and did not have the -q.

Anyway, I just tried 
- unpacking a stage3 tarball in a chroot
- emerge portage portage-utils
- chmod +x /etc/portage/postsync.d/q*
- emerge --sync
Could not reproduce the problem.

Just for fun I symlinked /bin/bash in /etc/portage/postsync.d and am trying
this and that but without much success. The problem *is* present but I cannot
trace it.

I am playing with gdb, but any pointers about what I could try are appreciated.

------- Comment #8 From Georgi Georgiev 2007-02-26 10:16:09 0000 -------
Alright, gdb paid off (sort of).

I haven't pinpointed the problem, but it has to do with my huge INSTALL_MASK. I
also have the feeling that 1024 is a magic number around there. There is
something wrong in the make.conf parser, and even though I have no idea why it
only gets triggered when called from portage, there is no doubt a problem
there.

I confirmed it in both 64-bit and 32-bit chroots.

So, if you want a "steps to reproduce".

1. emerge -u portage portage-utils
2. chmod +x /etc/portage/postsync.d/q-reinitialize
3. add the INSTALL_MASK from attachment #111202 to /etc/make.conf
4. emerge --sync
5. watch it blow up

I'll see if splitting the mask in multiple lines will do any good (or I could
simply drop it), but regardless... that would only be a temporary workaround.

------- Comment #9 From Georgi Georgiev 2007-02-26 11:51:43 0000 -------
Alright, this thing is reproducible without running it from portage. It's just
that the critical length is different.

I put in make.conf the following
INSTALL_MASK=""
INSTALL_MASK="${INSTALL_MASK} 123456789 123456789 "... (100 chars total)
Repeat the line above and adjust it until the problem is triggered. Here is
some output for different lengths of INSTALL_MASK:

chutz@possum ~ $ q >/dev/null ; portageq envvar INSTALL_MASK | wc -m
1105
chutz@possum ~ $ q >/dev/null ; portageq envvar INSTALL_MASK | wc -m
q: Unknown applet 'q'
1106
chutz@possum ~ $ q >/dev/null ; portageq envvar INSTALL_MASK | wc -m
q: Unknown applet 'q'
1107
chutz@possum ~ $ q >/dev/null ; portageq envvar INSTALL_MASK | wc -m
Segmentation fault
1108

From a chroot the critical length was much lower -- less than 80 characters.

Anyway, at the risk of becoming annoying, I thought I'd tell you how to easily
reproduce the problem.

------- Comment #10 From solar 2007-02-26 15:46:21 0000 -------
INSTALL_MASK=$(perl -e 'print "A " x 16384') q -r
And I can't reproduce.
Even putting your exact same install mask settings in make.conf, I still can't
reproduce.

However, note that install_mask is defined with a size of 1024:
char install_mask[1024] = "";  (line 101 of main.c)

So if you want to set a breakpoint on initialize_portage_env(), that would
probably help track it down a bit further.

------- Comment #11 From Georgi Georgiev 2007-02-26 16:05:20 0000 -------
... WHAT! I cannot reproduce this thing *anywhere* anymore. Not in a chroot,
not in a real root on either of the machines. And it's not like I've done
anything. I just don't know what to think. The only thing that changed since
the last time I tried is the date.

I'll try to break this curse by setting it WORKSFORME as that's exactly what is
happening right now. I'd like to see that spirit try to prove *that* wrong.

------- Comment #12 From Georgi Georgiev 2007-02-27 01:24:34 0000 -------
The curse is broken, the problem is back :-D

It is still hard for me to reproduce the exact conditions, but the problem is a
simple overflow related to the maximum length of INSTALL_MASK.
More precisely, the problem appears to be in strincr_var.

Say you have two lines in /etc/make.conf like:
INSTALL_MASK="some dummy looooong value (>1024 bytes)"
INSTALL_MASK="foo foo bar bar"
What happens when the first line is parsed is that strincr_var properly sets
vars_to_read[1].value to the dummy long value, truncated at 1024 bytes.

What happens when the next line is parsed is that strincr_var is called with
an already full vars_to_read[1].value. Nevertheless, it immediately appends a
space and the full value of the line currently being parsed. This is not
correct. Furthermore, it *appends* the value of the line, even though it should
overwrite the old value.

I am pretty sure that I only had a single line in make.conf when I reported
this, but let's fix the problems one at a time.
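
To illustrate the kind of check that is missing, here is a minimal sketch of a
bounded append. It is not the actual strincr_var code from portage-utils; the
name bounded_append and its signature are made up for this example. It refuses
to append when the separator plus the new value would not fit, instead of
writing past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the real strincr_var): append src to dst,
 * separated by a single space, without writing past dst_size bytes.
 * Returns 0 on success, -1 if the result would not fit. On failure
 * dst is left unchanged.
 */
static int bounded_append(char *dst, size_t dst_size, const char *src)
{
	assert(dst != NULL && src != NULL && dst_size > 0);

	size_t used = strlen(dst);
	size_t need = strlen(src);
	/* A separator is only needed between two non-empty values. */
	size_t sep = (used > 0 && need > 0) ? 1 : 0;

	/* Room left for characters, keeping one byte for the NUL. */
	if (used >= dst_size || need + sep > dst_size - 1 - used)
		return -1;

	if (sep)
		dst[used++] = ' ';
	memcpy(dst + used, src, need + 1); /* copies the trailing NUL too */
	return 0;
}
```

With a check like this, a second INSTALL_MASK line arriving after the buffer is
already full would be rejected (or could be reported) rather than spilling into
whatever happens to sit next to the buffer in memory.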

------- Comment #13 From solar 2007-02-27 17:41:59 0000 -------
Ok now we are starting to get somewhere. I can reproduce undesired behavior.
Still no segv however.

(echo INSTALL_MASK=\"$(perl -e 'print "A" x 1024')\" ; \
 echo INSTALL_MASK=\"\${INSTALL_MASK} $(perl -e 'print "A" x 1024')\" ; \
 echo INSTALL_MASK=\"\${INSTALL_MASK} $(perl -e 'print "A" x 1024')\") >> /etc/make.conf

Then 
DEBUG=1 q

Here I can see that PORTDIR= got overwritten. I'll see what I can do.

------- Comment #14 From solar 2007-02-27 17:56:38 0000 -------
Created an attachment (id=111448)
q-bug-168334.diff

Give this a spin, please.
I added a sanity check and raised the default size of install_mask quite a bit.
I also raised the size of some other buffers, such as binhost and features.

cvs -d:pserver:anonymous@anoncvs.gentoo.org:/var/cvsroot -q co -R gentoo-projects/portage-utils

------- Comment #15 From solar 2007-02-27 23:58:20 0000 -------
I pushed this patch to cvs. So simply cvs co/up and see if all is fine.. It
should be.

------- Comment #16 From Georgi Georgiev 2007-02-28 03:55:06 0000 -------
I'll try the patch as soon as I get home (in 6 hours I guess). Until then...
I recall that INSTALL_MASK was getting appended to even without having
${INSTALL_MASK}. This may be a separate problem, but thought I'd make it clear.
Steps to reproduce:

echo INSTALL_MASK=foo >> make.conf
echo INSTALL_MASK=bar >> make.conf
env DEBUG=1 q >/dev/null
look at INSTALL_MASK="foo bar"

------- Comment #17 From Georgi Georgiev 2007-02-28 10:14:41 0000 -------
(In reply to comment #15)
> I pushed this patch to cvs. So simply cvs co/up and see if all is fine.. It
> should be.

OK, I can confirm that it doesn't segfault now. Which is great and I thank you
for that.

However, I cannot say that it works, because the parser certainly doesn't
behave normally
- it can only increment INSTALL_MASK and doesn't care about the presence of
  ${INSTALL_MASK} (short of removing it)
- it doesn't support ${OTHERVAR} either (as taken from a comment)
- CONFIG_PROTECT is defined as _Q_STR (same as ARCH) while INSTALL_MASK is
  _Q_ISTR (same as FEATURES). I am pretty sure that CONFIG_PROTECT should be
  _Q_ISTR as well.
- personally, I believe that if the parser is properly written, there should be
  no need for _Q_STR *and* _Q_ISTR

Is there any good reason not to use "portageq envvar" and avoid reading the
profiles altogether?
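
For reference, the shell-like behavior I would expect within a single file can
be sketched as below. This is only an illustration, not code from
portage-utils: assign_var, VAR_MAX and the return convention are hypothetical,
and it does not model how Portage stacks incremental variables across profiles.
A new assignment replaces the old value, and only an explicit ${NAME} on the
right-hand side pulls the old value back in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define VAR_MAX 4096 /* illustrative upper bound, not a portage-utils constant */

/*
 * Hypothetical sketch: assign rhs to the variable called name, whose
 * current value lives in value (a buffer of size bytes). Every literal
 * "${name}" in rhs expands to the old value; everything else is copied.
 * Returns 0 on success, -1 if the result does not fit (value unchanged).
 */
static int assign_var(char *value, size_t size, const char *name, const char *rhs)
{
	char out[VAR_MAX];
	char ref[128];
	size_t n = 0;
	size_t old_len = strlen(value);

	assert(size > 0 && size <= sizeof(out));
	if (old_len >= size)
		return -1;

	/* Build the self-reference token, e.g. "${INSTALL_MASK}". */
	int len = snprintf(ref, sizeof(ref), "${%s}", name);
	if (len < 0 || (size_t)len >= sizeof(ref))
		return -1;
	size_t ref_len = (size_t)len;

	while (*rhs != '\0') {
		if (strncmp(rhs, ref, ref_len) == 0) {
			/* Expand the old value, checking the remaining room first. */
			if (old_len > size - 1 - n)
				return -1;
			memcpy(out + n, value, old_len);
			n += old_len;
			rhs += ref_len;
		} else {
			if (n >= size - 1)
				return -1;
			out[n++] = *rhs++;
		}
	}
	out[n] = '\0';
	memcpy(value, out, n + 1); /* replace the old value in one step */
	return 0;
}
```

Under these assumptions, assigning "foo" and then "bar" leaves "bar", while
assigning "${INSTALL_MASK} baz" afterwards gives "bar baz".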

------- Comment #18 From solar 2007-02-28 15:13:28 0000 -------
(In reply to comment #17)
> (In reply to comment #15)
> > I pushed this patch to cvs. So simply cvs co/up and see if all is fine.. It
> > should be.
> 
> OK, I can confirm that it doesn't segfault now. Which is great and I thank you
> for that.


good

> However, I cannot say that it works, because the parser certainly doesn't
> behave normally

Well, that's somewhat debatable: "normally" in terms of q, in terms of Python's
shlex code, or in terms of a shell?

> - it can only increment INSTALL_MASK and doesn't care about the presence of
>   ${INSTALL_MASK} (short of removing it)

I'll look into that when I get a chance. 


> - it doesn't support ${OTHERVAR} either (as taken from a comment)

That's expected and I don't expect that to change anytime soon.

> - CONFIG_PROTECT is defined as _Q_STR (same as ARCH) while INSTALL_MASK is
>   _Q_ISTR (same as FEATURES). I am pretty sure that CONFIG_PROTECT should be
>   _Q_ISTR as well.

Yeah ok. _Q_ISTR was coded after the orig _Q_STR code.


> - personally, I believe that if the parser is properly written, there should be
>   no need for _Q_STR *and* _Q_ISTR

It's not exactly easy to write a bash parser in C.

> Is there any good reason not to use "portageq envvar" and avoid reading the
> profiles altogether?

Yeah we don't want to shell out and make python calls.

------- Comment #19 From solar 2007-04-05 18:42:15 0000 -------
This is released in 0.1.25

Bug #168334 ; q -r dies with a segfault after emerge --sync
Bug #168442 ; does not  properly parse the profile location
Bug #170795 ; add a -E/--eclass option to qgrep
Bug #170797 ; add a -s/--skip-comments option to qgrep
Bug #171024 ; opening '/usr/portage/.metadata.x' failed
Bug #171374 ; Misc enhancements for qgrep
Bug #172240 ; -A/-B options for qgrep (context lines) 
Bug #172338 ; qgrepping through installed ebuilds (in the VDB) 
Bug #173005 ; Colorized output for qgrep.

------- Comment #20 From solar 2007-04-05 18:43:01 0000 -------
Closing
