Hi Az,

You're going to just love this little mystery. First, to set the scene: this is in the land of ppc64, with a 64-bit kernel and 64-bit user space. The sandbox versions have been 1.2.11 and 1.2.12, and the kernels I've seen this on are 2.6.11 and 2.6.12.5. It has been recreated on both power4 and power5, so we're fairly sure it's not hardware related.

This was initially found in bug #108060, but I'm creating a new bug to keep things clearer, or at least so I hope. As initially reported, people would be doing emerge system and, poof, at emerge readline their systems would reboot.

I was finally able to narrow this down significantly. It can be recreated as easily as starting sandbox and running a small C program inside it. I deeply suspect it has something to do with sandbox and signals. The program executes just fine outside of the sandbox (with FEATURES="-sandbox -usersandbox" things are fine too). Thankfully one can run gdb inside of the sandbox, and the program happily fails in gdb, so I've been able to get more data out of it. Have a look at the C program I'll attach and the output I'll post, and let me know what you think.

Last, this is marked critical since the average bear can't miss the problem and can't install. Granted, there's at least a workaround, but not everyone will know to apply it.

I'd appreciate any help or advice you might have. I think rangerpb has this set up on the ppc64 box in Oregon, so you could have access to a box and debug directly if you like. Hop into #gentoo-ppc64 if you like.
Created attachment 71099: recreation program. When run in sandbox, it reliably reboots the system.
The output when the program is run in gdb is:

2 sigint getpd = 1 before set sig after set sig after 1st kill 0

Output when run outside of sandbox:

2 sigint getpd = 17257 before set sig after set sig handler after 1st kill 1 handler
boggle ok .. it's late ... kill to pid 1?!?!?
OK, I backed off from glibc 2.3.5 to the older "stable" glibc 2.3.4, and things appear to be fine with this same version of sandbox. I am deeply curious now whether sandbox does anything to wrap getpid, or how exactly the sandbox voodoo works, as it seems fairly obvious that sandbox has uncovered some kind of bug in glibc.
Does not affect ppc64 with 32-bit userland.
FYI, I think that this problem resembles Bug #92794. My test results are below.

Test procedure:
1. extract the stage3 file for testing
2. chroot into it
3. execute `as -v`
4. execute the Attachment 71099 program
5. execute the Attachment 71099 program in sandbox

[glibc-2.3.5-r2 nptl environment + 2.6.12-gentoo-r10 kernel]
3. # as -v
   GNU assembler version 2.16.1 (powerpc64-unknown-linux-gnu) using BFD version 2.16.1
4. # ./a.out
   2 sigint getpd = 6386 before set sig after set sig handler after 1st kill 1 handler
5. The machine was rebooted.

[glibc-2.3.5-r2 nptl environment + 2.6.13-gentoo-r4 kernel]
3. # as -v
   Segmentation fault
4. # ./a.out
   2 sigint getpd = 6396 before set sig after set sig handler after 1st kill 1 handler
5. # ./a.out
   2 sigint getpd = 6406 before set sig after set sig handler after 1st kill 1 handler
6. (special test: execute `as -v` in sandbox)
   # as -v
   GNU assembler version 2.16.1 (powerpc64-unknown-linux-gnu) using BFD version 2.16.1

[glibc-2.3.5-r2 nptlonly environment + 2.6.12-gentoo-r10/2.6.13-gentoo-r4 kernel]
3. # as -v
   GNU assembler version 2.16.1 (powerpc64-unknown-linux-gnu) using BFD version 2.16.1
4. # ./a.out
   2 sigint getpd = 10423 before set sig after set sig handler after 1st kill 1 handler
5. # ./a.out
   2 sigint getpd = 10440 before set sig after set sig handler after 1st kill 1 handler

[glibc-2.3.5-r2 linuxthreads environment + 2.6.12-gentoo-r10/2.6.13-gentoo-r4 kernel]
3. # as -v
   GNU assembler version 2.16.1 (powerpc64-unknown-linux-gnu) using BFD version 2.16.1
4. # ./a.out
   2 sigint getpd = 10409 before set sig after set sig handler after 1st kill 1 handler
5. # ./a.out
   2 sigint getpd = 10411 before set sig after set sig handler after 1st kill 1 handler
stage3 files for testing:

glibc-2.3.5-r2 nptl environment
http://dev.gentoo.org/~nigoro/ppc64/stage3-ppc64-64ul-20051019-nptl.tar.bz2

glibc-2.3.5-r2 nptlonly environment
http://dev.gentoo.org/~nigoro/ppc64/stage3-ppc64-64ul-20051023-nptlonly.tar.bz2

glibc-2.3.5-r2 linuxthreads environment
http://dev.gentoo.org/~nigoro/ppc64/stage3-ppc64-64ul-20051023-lt.tar.bz2
Well, there is nothing we do on the sandbox side that I can think of: no wrapper or anything for getpid(). Not sure if we should set the process group or something? Either way, I assume it's a glibc bug on the ppc64 side.
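One way to check from inside the sandbox whether the libc getpid() and the kernel disagree is to compare the wrapper against the raw system call. The following is only a minimal sketch, assuming Linux and the syscall() interface from glibc; the function name pid_sources_agree is made up for illustration and is not part of sandbox or of attachment 71099.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for the syscall() declaration */
#endif
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the libc getpid() wrapper and the
 * raw getpid system call report the same pid, 0 otherwise. */
int pid_sources_agree(void)
{
    pid_t libc_pid = getpid();                   /* goes through libc */
    pid_t raw_pid = (pid_t)syscall(SYS_getpid);  /* asks the kernel directly */

    printf("getpid() = %ld, syscall(SYS_getpid) = %ld\n",
           (long)libc_pid, (long)raw_pid);
    return libc_pid == raw_pid;
}
```

If this returned 0 inside the sandbox but 1 outside it, that would point at the libc side rather than the kernel. It does not by itself say why the two differ.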
More of a bug in glibc/kernel after all; userspace shouldn't cause the kernel to reboot like that.
(In reply to comment #6)
> FYI, I think that this problem resembles Bug #92794.

I do not see why this is a dup of bug #92794. Tom's test results clearly show that getpid() returns pid 1, which is init, and signalling that will make the box reboot. So some mechanism on the glibc/kernel side for ppc64 gets the wrong pid, and as the older glibc works for him, I guess it's an issue with glibc-2.3.5.

Tom, have you tried the latest masked snapshot of glibc yet?
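For anyone who wants to poke at this without rebooting the box, the same getpid()/kill() sequence can be guarded so that it never signals pid 1. This is a sketch only and not the attached program: the names guarded_self_sigint and on_sigint are hypothetical, and the attachment's real source may differ.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for sigaction() and kill() */
#endif
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t handler_calls = 0;

/* Counts how many times SIGINT was delivered to this process. */
static void on_sigint(int signo)
{
    (void)signo;
    handler_calls++;
}

/* Hypothetical helper: sends SIGINT to ourselves, but refuses if getpid()
 * claims we are pid 1 (init) or something even less plausible.
 * Returns the number of handler invocations, or -1 on refusal or error. */
int guarded_self_sigint(void)
{
    struct sigaction sa, old;
    sigset_t unblock;
    pid_t pid = getpid();
    int result;

    printf("getpid = %ld\n", (long)pid);
    if (pid <= 1) {
        fprintf(stderr, "refusing to signal pid %ld\n", (long)pid);
        return -1;
    }

    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGINT, &sa, &old) != 0)
        return -1;

    /* Make sure SIGINT is not blocked, so it is delivered before kill() returns. */
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGINT);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);

    handler_calls = 0;
    result = (kill(pid, SIGINT) == 0) ? (int)handler_calls : -1;

    sigaction(SIGINT, &old, NULL);   /* restore the previous disposition */
    return result;
}
```

In a healthy single-threaded environment the call should return 1. In the failing sandbox case described above, where getpid() reports 1, it would return -1 instead of signalling init.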
*** Bug 118354 has been marked as a duplicate of this bug. ***
Can anyone confirm this with glibc-2.3.6?
I cannot reproduce those reboots with glibc-2.3.6-r4. Marking as FIXED. Please reopen if the reboots still occur.