Bug 197521 - frequent crashes with hardened-sources-2.6.23 (possibly PAX related)
Summary: frequent crashes with hardened-sources-2.6.23 (possibly PAX related)
Status: RESOLVED CANTFIX
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Hardened
Hardware: All Linux
Importance: High critical
Assignee: The Gentoo Linux Hardened Kernel Team (OBSOLETE)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-10-30 15:05 UTC by Wolfram Schlich (RETIRED)
Modified: 2009-07-20 18:32 UTC
CC List: 11 users

See Also:
Package list:
Runtime testing required: ---


Attachments
- bzip2 compressed System.map of crashing kernel #1 (System.map-2.6.23-hardened.bz2, 302.18 KB, application/octet-stream), 2007-10-31 00:20 UTC, Wolfram Schlich (RETIRED)
- bzip2 compressed System.map of crashing kernel #2 (System.map-2.6.23-hardened.jaheira.bz2, 253.15 KB, application/octet-stream), 2007-10-31 10:23 UTC, Wolfram Schlich (RETIRED)
- bzip2 compressed System.map of crashing kernel #3 (System.map-2.6.23-hardened-gw13.bz2, 326.41 KB, application/octet-stream), 2007-10-31 11:03 UTC, Wolfram Schlich (RETIRED)
- System.map-2.6.23-hardened from io (System.map-2.6.23-hardened.bz2, 294.84 KB, application/octet-stream), 2007-11-01 18:22 UTC, Georg Weiss
- sched.o of io (sched.o, 56.31 KB, application/octet-stream), 2007-11-01 20:14 UTC, Georg Weiss
- sched.o of 2.6.23-hardened-gw12 (sched.o, 58.14 KB, application/octet-stream), 2007-11-01 20:21 UTC, Georg Weiss
- 2.6.23-hardened-1-oops.jpeg (2.6.23-hardened-1-oops.jpeg, 87.68 KB, image/jpeg), 2007-11-13 08:10 UTC, Krzysztof Pawlik (RETIRED)
- dsc00011.jpg - oops in schedule() (dsc00011.jpg, 138.41 KB, image/jpeg), 2007-11-13 09:00 UTC, Krzysztof Pawlik (RETIRED)
- System.map-2.6.23-hardened-r1.gz (System.map-2.6.23-hardened-r1.gz, 167.88 KB, application/octet-stream), 2007-11-13 09:02 UTC, Krzysztof Pawlik (RETIRED)
- sched.o.bz2 (sched.o.bz2, 22.42 KB, application/octet-stream), 2007-11-13 13:00 UTC, Krzysztof Pawlik (RETIRED)
- memory.o.bz2 (memory.o.bz2, 12.04 KB, application/octet-stream), 2007-11-13 13:00 UTC, Krzysztof Pawlik (RETIRED)
- config-2.6.23-hardened-r7-gw15 of "adminsrv" (config-2.6.23-hardened-r7-gw15.gz, 10.08 KB, application/octet-stream), 2008-02-19 15:29 UTC, Georg Weiss
- System.map-2.6.23-hardened-r7-gw15 of "adminsrv" (System.map-2.6.23-hardened-r7-gw15.gz, 351.23 KB, application/octet-stream), 2008-02-19 15:33 UTC, Georg Weiss
- crash.config (2.6.23-hardened-r7-pax_crash.config, 27.06 KB, text/plain), 2008-02-26 00:24 UTC, Daniel Schröder
- System.map, .config, memory.o and sched.o (kernel_items-2.6.23-hardened-r8.tar.gz, 297.94 KB, application/octet-stream), 2008-03-19 16:27 UTC, Nick P
- Crash screenshot (crash-alpha.png, 41.04 KB, image/png), 2008-04-05 19:25 UTC, Wolfram Schlich (RETIRED)
- System.map of crash-alpha.png (System.map-2.6.24.4-grsec-2.1.11-200803262003, 857.45 KB, text/plain), 2008-04-06 22:02 UTC, Wolfram Schlich (RETIRED)
- System.map of crash-alpha.png (System.map-2.6.23-hardened-r9, 822.02 KB, text/plain), 2008-04-07 03:40 UTC, Wolfram Schlich (RETIRED)
- .config of linux-2.6.24.4-grsec-2.1.11-200803262003 (config.linux-2.6.24.4-grsec-2.1.11-200803262003, 40.62 KB, text/plain), 2008-04-14 08:54 UTC, Wolfram Schlich (RETIRED)
- r11 crash (hardened-r11.jpg, 81.74 KB, image/jpeg), 2008-05-10 12:02 UTC, Daniel Schröder
- 2.6.24-r1 crash (system-1.jpeg, 95.72 KB, image/jpeg), 2008-05-11 08:30 UTC, Daniel Schröder
- 2.6.24-r1 crash (system-2.jpeg, 124.81 KB, image/jpeg), 2008-05-11 08:31 UTC, Daniel Schröder
- crash probably triggered by slocate run (system2.netconsole, 3.38 KB, text/plain), 2008-05-12 12:21 UTC, Daniel Schröder
- netconsole log from slocate testruns (system1.netconsole, 44.54 KB, text/plain), 2008-05-12 14:19 UTC, Daniel Schröder

Description Wolfram Schlich (RETIRED) gentoo-dev 2007-10-30 15:05:09 UTC
"PAX: suspicious general protection fault: 0000 [...]"

See those URLs for screen pictures:
http://frupic.frubar.net/4641
http://frupic.frubar.net/4643

It only happens to machines running version 2.6.23 of hardened-sources,
all intel x86 (no amd64), but all completely different (different dell
models and even "homegrown" machines).

So far I can't trigger it on demand; sometimes it happens right on boot (e.g.
in process "udevd"), sometimes during regular operation after boot (several
different processes seen in the "Process:" output).
Comment 1 Wolfram Schlich (RETIRED) gentoo-dev 2007-10-30 15:36:32 UTC
One of those .configs can be found here:

http://paste.frubar.net/6546
Comment 2 PaX Team 2007-10-30 21:24:32 UTC
it's a NULL deref caught by UDEREF, i'll need System.map to properly resolve the addresses and see what code triggered it. also next time don't hesitate to add me to CC, i'm a regular reader of (and contributor to) gentoo's bugzilla ;-).
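
Resolving an oops address against System.map amounts to finding the last symbol whose start address is not above the faulting address. A minimal sketch of that lookup, assuming the usual three-column System.map layout (address, type, name), sorted by address, with fixed-width hex addresses; `resolve_addr` is a hypothetical helper name, not an existing tool:

```shell
# resolve_addr ADDRESS SYSTEM_MAP
# Prints the name of the last symbol whose address is <= ADDRESS.
# Assumes the map is sorted and all addresses have the same width
# (e.g. 8 hex digits on 32-bit x86), so string comparison orders them.
resolve_addr() {
    awk -v addr="$(printf '%s' "$1" | tr 'A-F' 'a-f')" '
        # Appending "" forces a string comparison; awk could otherwise
        # treat digit-only addresses as decimal numbers.
        (tolower($1) "") <= (addr "") { sym = $3 }
        END { if (sym != "") print sym }
    ' "$2"
}
```

It could be called as `resolve_addr c06c6b31 System.map-2.6.23-hardened-r1` once the attached map has been decompressed. It only names the enclosing symbol; finding the exact instruction still needs the object file or a disassembly.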
Comment 3 Wolfram Schlich (RETIRED) gentoo-dev 2007-10-31 00:16:28 UTC
(In reply to comment #2)
> it's a NULL deref caught by UDEREF, i'll need System.map to properly resolve
> the addresses and see what code triggered it.

Ok. Here's another one: http://frupic.frubar.net/4647
Will attach the corresponding System.map right after this comment.

> also next time don't hesitate to
> add me to CC, i'm a regular reader of (and contributor to) gentoo's bugzilla
> ;-).

Ok, thanks :o)
Comment 4 Wolfram Schlich (RETIRED) gentoo-dev 2007-10-31 00:20:14 UTC
Created attachment 134756 [details]
bzip2 compressed System.map of crashing kernel #1

System.map of kernel seen crashing at http://frupic.frubar.net/4647
Unfortunately it was too big for attaching it uncompressed :(
Comment 5 Wolfram Schlich (RETIRED) gentoo-dev 2007-10-31 00:26:45 UTC
I hope I'm not mixing something up, as I cannot see anything
PAX related on the last crash screen, but that may well be
due to space constraints :)
Comment 6 PaX Team 2007-10-31 01:03:34 UTC
(In reply to comment #3)
> Ok. Here's another one: http://frupic.frubar.net/4647
> Will attach the corresponding System.map right after this comment.

hmm, i can't tell much from this one except the machine was going to shut down (?) when something bad triggered. can you also get the System.map for the other crashes?
Comment 7 Wolfram Schlich (RETIRED) gentoo-dev 2007-10-31 10:23:41 UTC
Created attachment 134781 [details]
bzip2 compressed System.map of crashing kernel #2

This is the System.map from http://frupic.frubar.net/4641
Comment 8 Wolfram Schlich (RETIRED) gentoo-dev 2007-10-31 11:03:27 UTC
Created attachment 134785 [details]
bzip2 compressed System.map of crashing kernel #3

This is the System.map from http://frupic.frubar.net/4643
Comment 9 Wolfram Schlich (RETIRED) gentoo-dev 2007-10-31 13:00:02 UTC
Ok, phreak created a new hardened-sources revision
based on a newer grsecurity patch:

- =sys-kernel/hardened-sources-2.6.23 is based on
  grsecurity-2.1.11-2.6.23-200710121810.patch

- =sys-kernel/hardened-sources-2.6.23-r1 is based on
  grsecurity-2.1.11-2.6.23.1-200710301850.patch

phreak created an interdiff:

http://rafb.net/p/pJJbhK12.html

Dear pax guy, does the new snapshot contain a fix for
the crash problem, possibly?
Comment 10 Wolfram Schlich (RETIRED) gentoo-dev 2007-10-31 14:31:30 UTC
Interestingly, I have had that kernel running on _one_ non-Intel machine
(an AMD Athlon 64, that is, but still running x86, not amd64) for
14 days now, and it hasn't crashed once...
Comment 11 PaX Team 2007-10-31 17:25:39 UTC
(In reply to comment #9)
> Dear pax guy, does the new snapshot contain a fix for
> the crash problem, possibly?

no, it won't, i'm still trying to figure out your bugs...
Comment 12 Georg Weiss 2007-11-01 18:22:17 UTC
Created attachment 134932 [details]
System.map-2.6.23-hardened from io

my olde netfinity 5000 crashed right now...

see:
http://frupic.frubar.net/4661

i will now go and try 2.6.23-hardened-r1
Comment 13 PaX Team 2007-11-01 19:02:41 UTC
(In reply to comment #12)
can you guys upload your kernel/sched.o please (Wolfram, for you it'd be the one of the 4643 kernel)?
Comment 14 Georg Weiss 2007-11-01 20:14:51 UTC
Created attachment 134951 [details]
sched.o of io

here you go. this is sched.o of "io"...
Comment 15 Georg Weiss 2007-11-01 20:21:52 UTC
Created attachment 134955 [details]
sched.o of 2.6.23-hardened-gw12

this is the sched.o of 2.6.23-hardened-gw12 (from http://frupic.frubar.net/4643)
Comment 16 Wolfram Schlich (RETIRED) gentoo-dev 2007-11-01 20:40:31 UTC
(In reply to comment #13)
> (In reply to comment #12)
> can you guys upload your kernel/sched.o please (Wolfram, for you it'd be the
> one of the 4643 kernel)?

Right now I can only provide sched.o of
http://frupic.frubar.net/4647 -- here it is:
http://dev.gentoo.org/~wschlich/tmp/sched.o.4647
Comment 17 Wolfram Schlich (RETIRED) gentoo-dev 2007-11-01 20:48:50 UTC
(In reply to comment #13)
> (In reply to comment #12)
> can you guys upload your kernel/sched.o please (Wolfram, for you it'd be the
> one of the 4643 kernel)?
> 

Ok, here you go :)
http://frupic.frubar.net/4643 -- here it is:
http://dev.gentoo.org/~wschlich/tmp/sched.o.4643
Comment 18 Wolfram Schlich (RETIRED) gentoo-dev 2007-11-01 20:56:04 UTC
(In reply to comment #17)
> (In reply to comment #13)
> > (In reply to comment #12)
> > can you guys upload your kernel/sched.o please (Wolfram, for you it'd be the
> > one of the 4643 kernel)?
> > 
> 
> Ok, here you go :)
> http://frupic.frubar.net/4643 -- here it is:
> http://dev.gentoo.org/~wschlich/tmp/sched.o.4643

Whoops, that one was for http://frupic.frubar.net/4641 :-)
Moved http://dev.gentoo.org/~wschlich/tmp/sched.o.4643
to http://dev.gentoo.org/~wschlich/tmp/sched.o.4641
Sorry for the confusion.
Comment 19 PaX Team 2007-11-02 20:38:17 UTC
so, after some investigation i'm none the wiser. here's the breakdown:

4641,4643,4661: what happens here is that during a context switch in the scheduler an important structure field (task_struct.prev_mm) appears to be NULL instead of a valid pointer. how on earth that can happen, i have no idea except i know we don't touch it (and it didn't look like some random memory corruption either, in all cases the pointer was clearly NULL).

4647: here we have the shutdown mechanism issue an IPI to all-but-self only to get it delivered to self - tough luck because the delivery mechanism uses a stack based structure that by this time gets trashed and the kernel ends up calling some bogus function pointer. this is again an impossible situation and of course we don't touch anything near APIC programming... so i have no idea how this could have occurred.

guys, can you try out vanilla 2.6.23 and see what that brings to the table? also booting with vga=ext will preserve more of the crash info (or try netconsole logging). maybe the ones you reported weren't the first ones, in which case this was a wild goose chase ;).
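
For netconsole logging, the kernel expects a parameter of the form `netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]`. A small sketch that assembles such a string; `netconsole_param` is a hypothetical helper, and the addresses, ports and MAC in the usage comment are placeholders, not values from this report:

```shell
# netconsole_param SRC_PORT SRC_IP DEV TGT_PORT TGT_IP TGT_MAC
# Prints a netconsole= parameter suitable for the kernel command line
# or for "modprobe netconsole".
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Usage with placeholder values:
#   netconsole_param 6665 192.168.0.2 eth0 6666 192.168.0.1 00:11:22:33:44:55
```

The resulting string can be appended to the kernel command line next to vga=ext. The receiving host needs a UDP listener on the target port, for example netcat (the exact listener flags differ between netcat variants).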
Comment 20 Krzysztof Pawlik (RETIRED) gentoo-dev 2007-11-13 08:10:33 UTC
Created attachment 135856 [details]
2.6.23-hardened-1-oops.jpeg

I've got this oops - sorry for terrible quality, but my phone sucks wrt photos.

Unfortunately I can't provide the System.map currently; I'll rebuild the kernel and save its System.map.
Comment 21 Krzysztof Pawlik (RETIRED) gentoo-dev 2007-11-13 09:00:07 UTC
Created attachment 135858 [details]
dsc00011.jpg - oops in schedule()

This shows more details, System.map follows.
Comment 22 Krzysztof Pawlik (RETIRED) gentoo-dev 2007-11-13 09:02:44 UTC
Created attachment 135859 [details]
System.map-2.6.23-hardened-r1.gz

System.map for attachment #135858 [details] - c06c6b31 is in schedule().

If you need anything else (like the sched.o) just tell me.
Comment 23 PaX Team 2007-11-13 12:54:26 UTC
(In reply to comment #22)
> Created an attachment (id=135859) [edit]
> System.map-2.6.23-hardened-r1.gz
> 
> System.map for attachment #135858 [details] [edit] - c06c6b31 is in schedule().
> 
> If you need anything else (like the sched.o) just tell me.

i'd like to have sched.o and mm/memory.o as well.
Comment 24 PaX Team 2007-11-13 12:55:42 UTC
i also uploaded a new test patch, would be nice if you could try it out. i only fixed the 'bad page state' bug in it, but would be still good to know that your case isn't related to that.
Comment 25 Krzysztof Pawlik (RETIRED) gentoo-dev 2007-11-13 13:00:29 UTC
Created attachment 135877 [details]
sched.o.bz2

(In reply to comment #23)
> i'd like to have sched.o and mm/memory.o as well.

sched.o, memory.o follows.
Comment 26 Krzysztof Pawlik (RETIRED) gentoo-dev 2007-11-13 13:00:41 UTC
Created attachment 135879 [details]
memory.o.bz2
Comment 27 Krzysztof Pawlik (RETIRED) gentoo-dev 2007-11-13 13:02:31 UTC
(In reply to comment #24)
> i also uploaded a new test patch, would be nice if you could try it out. i only
> fixed the 'bad page state' bug in it, but would be still good to know that your
> case isn't related to that.

Could you provide URL?
Comment 28 Wolfram Schlich (RETIRED) gentoo-dev 2007-11-13 15:05:25 UTC
(In reply to comment #27)
> (In reply to comment #24)
> > i also uploaded a new test patch, would be nice if you could try it out. i only
> > fixed the 'bad page state' bug in it, but would be still good to know that your
> > case isn't related to that.
> 
> Could you provide URL?

...as usual: http://www.grsecurity.net/~paxguy1/pax-linux-2.6.23-test12.patch
Comment 29 Krzysztof Pawlik (RETIRED) gentoo-dev 2007-11-14 08:13:19 UTC
I get a complete, hard lockup with grsec reversed and the test patch applied.
Comment 30 PaX Team 2007-11-14 10:57:56 UTC
(In reply to comment #29)
> I get a complete, hard lockup with grsec reversed and the test patch applied.

please start from a clean vanilla tree. if the hang still occurs, can you get a screenshot or at least describe the last message on screen (you might need to add earlyprintk=vga to the kernel commandline to see very early boot messages as they occur)? also attach your .config and even better, your bzImage (so i can try it in qemu).
Comment 31 Thomas Sachau gentoo-dev 2007-11-25 13:53:04 UTC
I don't know if my problem is the same as the one in this bug; if not, tell me and I will open another bug.

My problem is that I cannot give much information, and I am also not able to reproduce the crashes. But for me hardened-sources-2.6.23 and hardened-sources-2.6.23-r1 do somehow crash. After that, I can't do any keyboard input, but I can still use my mouse and do some things on the desktop. But if I try to kill my desktop, the screen goes black and I can't do anything any more. I can log in over ssh and do some things, but e.g. one time I tried to compile something and got a <defunct> emerge process. This happens with running X+e17(cvs) and also without running X.
I do not get any kernel oops or any other message about the crash; only the cursor does not blink any more (if I am on the console while the crash happens).

This is on a hardened/amd64/multilib profile with ~amd64 keyword.
Comment 32 PaX Team 2007-11-25 18:34:04 UTC
(In reply to comment #31)
> I don't know if my problem is the same as the one in this bug; if not, tell me
> and i will open another bug.

yours doesn't look like the kernel problem reported here, after all you could still ssh to the box, not to mention that you have a 64bit kernel, we're discussing problems with 32 bit ones here. so you should open a new report and provide more information as well if you can.
Comment 33 PaX Team 2007-11-29 22:07:29 UTC
(In reply to comment #19)
> so, after some investigation i'm none the wiser. here's the breakdown:
> 
> 4641,4643,4661: what happens here is that during a context switch in the
> scheduler an important structure field (task_struct.prev_mm) appears to be NULL
> instead of a valid pointer. how on earth that can happen, i have no idea except
> i know we don't touch it (and it didn't look like some random memory corruption
> either, in all cases the pointer was clearly NULL).
> 
> 4647: here we have the shutdown mechanism issue an IPI to all-but-self only to
> get it delivered to self - tough luck because the delivery mechanism uses a
> stack based structure that by this time gets trashed and the kernel ends up
> calling some bogus function pointer. this is again an impossible situation and
> of course we don't touch anything near APIC programming... so i have no idea
> how this could have occurred.
> 
> guys, can you try out vanilla 2.6.23 and see what that brings to the table?
> also booting with vga=ext will preserve more of the crash info (or try
> netconsole logging), maybe the ones you reported weren't the first ones, in
> which case this was a wild goose chase ;).

someone just told me that he had the same issue and fixed it by switching to gcc 4.x. can you guys give that a try?
Comment 34 Brian Kroth 2007-12-05 15:38:13 UTC
> someone just told me that he had the same issue and fixed it by switching to
> gcc 4.x. can you guys give that a try?
> 

I think that was me.  Actually the reason I turned on the framebuffer was because of this bug, but I didn't think they were the same problem.  Here's a link to some screen captures.  Let me know if you need more info.  Also, still having the same problem/solution with hardened-sources-2.6.23-r3.

https://mywebspace.wisc.edu/bpkroth/web/kernel-bug-test/kernel-bug-test_2.6.23-hardened-r2_panic_2007-11-28.tar.bz2

Comment 35 PaX Team 2007-12-05 16:06:41 UTC
(In reply to comment #34)
> I think that was me.  Actually the reason I turned on the framebuffer was
> because of this bug, but I didn't think they were the same problem.  Here's a
> link to some screen captures.  Let me know if you need more info.  Also, still
> having the same problem/solution with hardened-sources-2.6.23-r3.
> 
> https://mywebspace.wisc.edu/bpkroth/web/kernel-bug-test/kernel-bug-test_2.6.23-hardened-r2_panic_2007-11-28.tar.bz2

uhm, i thought these were caused when you used the 3.4.x compiler and got 'fixed' by using 4.x, didn't they?
Comment 36 Brian Kroth 2007-12-05 16:17:16 UTC
> uhm, i thought these were caused when you used the 3.4.x compiler and got
> 'fixed' by using 4.x, didn't they?
> 

That's correct.  As in:

# cd /usr/src/linux
# gcc-config i686-pc-linux-gnu-4.2.2 && env-update && source /etc/profile && make clean bzImage modules modules_install install && gcc-config i686-pc-linux-gnu-3.4.6 && env-update && source /etc/profile && reboot

I know it's big and nasty, but I don't want to forget to switch back to the hardened gcc for the rest of the system stuff.  I haven't tried this on a hardened gcc yet (like kevquinn's pieworld).  I just unmasked the one that's in portage.
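
The same switch-build-restore sequence can be sketched as a function that restores the system compiler even when the build fails. This is only an illustration of the one-liner above, not a tested replacement: `build_kernel_with` and the `RUN` dry-run hook are hypothetical names, and the final `reboot` is deliberately left out.

```shell
# build_kernel_with KERNEL_GCC SYSTEM_GCC
# Builds /usr/src/linux with KERNEL_GCC, then switches back to SYSTEM_GCC.
# RUN is a hypothetical dry-run hook: RUN=echo prints the commands instead
# of executing them.
build_kernel_with() {
    local kernel_gcc=$1 system_gcc=$2
    (
        # The EXIT trap runs whether or not the build succeeded, so the
        # hardened compiler is always selected again afterwards.
        trap '$RUN gcc-config "$system_gcc"; $RUN env-update; $RUN . /etc/profile' EXIT
        $RUN gcc-config "$kernel_gcc" &&
        $RUN env-update &&
        $RUN . /etc/profile &&
        $RUN make -C /usr/src/linux clean bzImage modules modules_install install
    )
}
```

With the profiles from the command above, a dry run would be `RUN=echo build_kernel_with i686-pc-linux-gnu-4.2.2 i686-pc-linux-gnu-3.4.6`. Running the steps in a subshell keeps the temporary environment away from the calling shell.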

BTW, the way that I've been able to "reproduce" this is by issuing a reboot command.  Probably 90% of the time it will cause an error message of some sort.  Every once in a while it will reboot without complaint.
Comment 37 PaX Team 2008-02-16 00:07:55 UTC
guys, can you test the latest grsec or pax patches? i think i fixed this bug, which could very well have been a side effect of a bug that occurred in a completely different place than what the oopses reported. also, can you tell me which of you had HIGHPTE in your .config?
Comment 38 Georg Weiss 2008-02-16 17:45:24 UTC
HIGHPTE was unset in the kernel config of "io".
I've been running 2.6.23-hardened-r4 for 20 days without any problems.

However, @work I tried 2.6.23-hardened-r6 (with HIGHPTE) and those machines oopsed again.

You may find it interesting that another guy @work has been running 2.6.23-hardened-rx stably for _days_ with a slightly different kernel config (with HIGHPTE) on almost the same hardware (mostly dell machines). I'm currently checking the differences and will try it on some of my "test" machines.
Comment 39 Georg Weiss 2008-02-19 15:25:44 UTC
Hi

Another host (with fresh 2.6.23-hardened-r7) panicked (second time in 6 hours).
<http://frupic.frubar.net/5363>

I'm going to attach config.
Comment 40 Georg Weiss 2008-02-19 15:29:28 UTC
Created attachment 143980 [details]
config-2.6.23-hardened-r7-gw15 of "adminsrv"
Comment 41 Georg Weiss 2008-02-19 15:33:46 UTC
Created attachment 143982 [details]
System.map-2.6.23-hardened-r7-gw15 of "adminsrv"
Comment 42 Daniel Schröder 2008-02-22 08:30:15 UTC
same here, but it first started with 2.6.23-hardened-r7. this is the .config diff between r7 and r4:
3,4c3,4
< # Linux kernel version: 2.6.23-hardened-r7
< # Thu Feb 21 03:05:02 2008
---
> # Linux kernel version: 2.6.23-hardened-r4
> # Sat Jan  5 01:32:31 2008
897a898
> CONFIG_PROC_KCORE=y
1043,1045c1044
< CONFIG_GRKERNSEC_PROC=y
< CONFIG_GRKERNSEC_PROC_USER=y
< CONFIG_GRKERNSEC_PROC_ADD=y
---
> # CONFIG_GRKERNSEC_PROC is not set
1069c1068
< CONFIG_GRKERNSEC_CHROOT_EXECLOG=y
---
> # CONFIG_GRKERNSEC_CHROOT_EXECLOG is not set
1072c1071
< CONFIG_GRKERNSEC_AUDIT_IPC=y
---
> # CONFIG_GRKERNSEC_AUDIT_IPC is not set
1076,1077c1075
< CONFIG_GRKERNSEC_PROC_IPADDR=y
< # CONFIG_GRKERNSEC_AUDIT_TEXTREL is not set
---
> # CONFIG_GRKERNSEC_PROC_IPADDR is not set
1085,1088c1083
< CONFIG_GRKERNSEC_TPE=y
< CONFIG_GRKERNSEC_TPE_ALL=y
< CONFIG_GRKERNSEC_TPE_INVERT=y
< CONFIG_GRKERNSEC_TPE_GID=0
---
> # CONFIG_GRKERNSEC_TPE is not set
1126,1131c1121,1123
< CONFIG_PAX_PAGEEXEC=y
< CONFIG_PAX_SEGMEXEC=y
< CONFIG_PAX_EMUTRAMP=y
< CONFIG_PAX_MPROTECT=y
< CONFIG_PAX_NOELFRELOCS=y
< CONFIG_PAX_KERNEXEC=y
---
> # CONFIG_PAX_PAGEEXEC is not set
> # CONFIG_PAX_SEGMEXEC is not set
> # CONFIG_PAX_KERNEXEC is not set
1145c1137
< CONFIG_PAX_MEMORY_UDEREF=y
---
> # CONFIG_PAX_MEMORY_UDEREF is not set
------------
i will use the old .config with r7 and hopefully this fixes the problem for me
Comment 43 kfm 2008-02-22 12:37:04 UTC
(In reply to comment #42)
> i will use the old .config with r7 and hopefully this fixes the problem for me

Are you sure you'd want to use the prior config? Without enabling either CONFIG_PAX_PAGEEXEC or CONFIG_PAX_SEGMEXEC, PaX's principal form of memory protection won't actually be functional. This is a mistake that's commonly made due to the options not being visibly exposed to the user as a result of COMPAT_VDSO being enabled (and has been known to cause other problems - see bug 210138).
Comment 44 Wolfram Schlich (RETIRED) gentoo-dev 2008-02-22 12:42:21 UTC
Pipacs, is bug #210022 in any way related to this issue here?!
Thanks!
Comment 45 Daniel Schröder 2008-02-22 12:59:27 UTC
(In reply to comment #43)
> (In reply to comment #42)
> > i will use the old .config with r7 and hopefully this fixes the problem for me
> Are you sure you'd want to use the prior config? 

thanks for the info, but i have no choice...these are perimeter fws and randomly Oops-crashing is definitely not the intended behavior...
Comment 46 PaX Team 2008-02-23 18:56:19 UTC
(In reply to comment #44)
> Pipacs, is bug #210022 in any way related to this issue here?!

no, it isn't.
Comment 47 PaX Team 2008-02-23 18:59:07 UTC
(In reply to comment #45)
> thanks for the info, but i have no choice...these are perimeter fws and
> randomly Oops-crashing is definitely not the intended behavior...

however this config diff suggests that it is a specific PaX feature that when enabled triggers this. it'd help me if we could find out which one it is. would it be possible to test just a few kernels with more and more PaX options enabled and determine at which point the problem pops up again?
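
To plan such a series of test kernels, it helps to reduce two .config files to the bare list of options that differ, so they can be enabled one group at a time. A rough sketch, assuming standard .config syntax (`CONFIG_FOO=y` and `# CONFIG_FOO is not set`); `normalize_config` and `changed_options` are hypothetical helper names:

```shell
# normalize_config FILE
# Rewrites "# CONFIG_FOO is not set" as "CONFIG_FOO=n" and prints all
# option lines sorted, so two configs can be compared line by line.
normalize_config() {
    sed -n -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1=n/' \
           -e '/^CONFIG_[A-Za-z0-9_]*=/p' "$1" | sort
}

# changed_options OLD_CONFIG NEW_CONFIG
# Prints the names of options that are set differently, or that exist
# in only one of the two files.
changed_options() {
    a=$(mktemp) b=$(mktemp)
    normalize_config "$1" > "$a"
    normalize_config "$2" > "$b"
    # comm -3 keeps lines unique to either file; strip indent and value.
    comm -3 "$a" "$b" | sed -e 's/^[[:space:]]*//' -e 's/=.*//' | sort -u
    rm -f "$a" "$b"
}
```

Filtering the output with `grep PAX` would give the candidate list for the step-by-step test: start from the working config, enable one option (or one half of the list) per build, and note at which point the crashes return.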
Comment 48 Daniel Schröder 2008-02-26 00:24:20 UTC
Created attachment 144652 [details]
crash.config
Comment 49 Daniel Schröder 2008-02-26 00:28:52 UTC
(In reply to comment #47)
> (In reply to comment #45)
> > randomly Oops-crashing is definitely not the intended behavior...
> 
> it be possible to test just a few kernels with more and more PaX options
> enabled and determine at which point the problem pops up again?
i have no time to do this...but i have posted my crash .config...so somebody could take this .config, trigger a crash and remove some options from the diff post...

Comment 50 Brian Kroth 2008-03-12 19:55:22 UTC
(In reply to comment #49)
> > it be possible to test just a few kernels with more and more PaX options
> > enabled and determine at which point the problem pops up again?
> i have no time to do this...but i have posted my crash .config...so somebody
> could take this .config, trigger a crash and remove some options from the diff
> post...
> 

I have some time for testing ...

Building with hardened i686-pc-linux-gnu-3.4.6 on a vmware vm.  Testing just involves rebooting over and over again.

I haven't been able to reproduce it with 2.6.23-hardened-r8.  Someone let me know if that's good enough otherwise I can post some debug info and keep testing with -r7 to narrow it down to a particular feature.
Comment 51 Nick P 2008-03-19 16:26:47 UTC
I've been randomly getting hardlocks with .23-r7 and .23-r8.  I was previously using .22-r8 with the same config which worked fine, save the bad page state error bug in .22-rx.  My GCC is 3.4.6.  The box is remote, so I don't have a means to see the panic message.  Attaching my .config, System.map, and sched.o / memory.o as the others have done (for my -r8 crasher).
Comment 52 Nick P 2008-03-19 16:27:42 UTC
Created attachment 146591 [details]
System.map, .config, memory.o and sched.o
Comment 53 Brian Kroth 2008-03-19 17:00:02 UTC
(In reply to comment #51)
> I've been randomly getting hardlocks with .23-r7 and .23-r8.  I was previously
> using .22-r8 with the same config which worked fine, save the bad page state
> error bug in .22-rx.  My GCC is 3.4.6.  The box is remote, so I don't have a
> means to see the panic message.  Attaching my .config, System.map, and sched.o
> / memory.o as the others have done (for my -r8 crasher).
> 

Do you know of any way to reproduce the panics?
Comment 54 Nick P 2008-03-19 17:02:59 UTC
(In reply to comment #53)
Unfortunately not, they happen at random intervals, in the past 24h, I've had one that ran for about 8 hours before crashing on me while I was asleep, and another that crashed within an hour of running.  It was the same with -r7, the crashes were unreproducible, but did always happen eventually.
Comment 55 Wolfram Schlich (RETIRED) gentoo-dev 2008-04-05 19:23:45 UTC
Just happened to me on a system that was previously running 2.6.21-hardened.
Upgraded to 2.6.23-hardened-r9 and BAM! after 4 hours.
Will attach a screenshot from a remote console.

I wish this would be fixed anytime soon... :(
Comment 56 Wolfram Schlich (RETIRED) gentoo-dev 2008-04-05 19:25:58 UTC
Created attachment 148780 [details]
Crash screenshot
Comment 57 Wolfram Schlich (RETIRED) gentoo-dev 2008-04-05 22:47:30 UTC
Hmm, I tried 2.6.24.4 + grsecurity-2.1.11-2.6.24.4-200803262003.patch and
disabled MEMORY_SANITIZE + UDEREF.
This time the crash was without *any* message at all, the machine just
froze hard :((
Back to 2.6.22-hardened-r8 for now. Sick of experiments, at least with
that machine :)
Comment 58 kfm 2008-04-06 20:51:02 UTC
Re Comment 57:

Wolfram, could you attach the System.map that was generated by that kernel? That way, some sense can be made of the call trace.
Comment 59 Wolfram Schlich (RETIRED) gentoo-dev 2008-04-06 22:02:53 UTC
Created attachment 148926 [details]
System.map of crash-alpha.png

System.map of kernel from comment 57
(2.6.24.4 + grsecurity-2.1.11-2.6.24.4-200803262003.patch)
Comment 60 Gordon Malm (RETIRED) gentoo-dev 2008-04-07 00:43:18 UTC
Sorry to bother, but it is the System.map from the crashing 2.6.23-r9 kernel in comment #56 that is required.
Comment 61 Wolfram Schlich (RETIRED) gentoo-dev 2008-04-07 03:40:33 UTC
Created attachment 148939 [details]
System.map of crash-alpha.png

(In reply to comment #60)
> Sorry to bother, but it is the System.map from the crashing 2.6.23-r9 kernel in
> comment #56 that is required.

Yeah, sorry, it was already late ;)
Comment 62 PaX Team 2008-04-08 01:58:54 UTC
(In reply to comment #57)
> Hmm, I tried 2.6.24.4 + grsecurity-2.1.11-2.6.24.4-200803262003.patch and
> disabled MEMORY_SANITIZE + UDEREF.

can you post your .config?
Comment 63 Wolfram Schlich (RETIRED) gentoo-dev 2008-04-14 08:54:55 UTC
Created attachment 149650 [details]
.config of linux-2.6.24.4-grsec-2.1.11-200803262003

(In reply to comment #62)
> (In reply to comment #57)
> > Hmm, I tried 2.6.24.4 + grsecurity-2.1.11-2.6.24.4-200803262003.patch and
> > disabled MEMORY_SANITIZE + UDEREF.
> 
> can you post your .config?

Attached.
Comment 64 Wolfram Schlich (RETIRED) gentoo-dev 2008-04-22 07:42:20 UTC
Dear Pipacs,
have you had any time to look at the .config I posted?
Thanks!
Comment 65 Wolfram Schlich (RETIRED) gentoo-dev 2008-04-22 08:40:09 UTC
(In reply to comment #38)
> HIGHPTE was unset in the kernel config of "io".
> I've been running 2.6.23-hardened-r4 for 20 days without any problems.
> 
> However, @work I tried 2.6.23-hardened-r6 (with HIGHPTE) and those machines
> oopsed again.
> 
> You may find it interesting that another guy @work has been running
> 2.6.23-hardened-rx stably for _days_ with a slightly different kernel config
> (with HIGHPTE) on almost the same hardware (mostly dell machines). I'm
> currently checking the differences and will try it on some of my "test" machines.

Georg, any news on this one?
Comment 66 Wolfram Schlich (RETIRED) gentoo-dev 2008-04-27 15:46:50 UTC
Ok, today I did have this error with 2.6.22-hardened-r8:

2008-04-27 14:12:00 +02:00; alpha; kern.err; kernel: BUG: scheduling while atomic: mysqld/0x00000001/24249

I've now upgraded to linux-2.6.24.5-grsec-2.1.11-200804211829
with SMP, HT, HIGHPTE and PAGEEXEC disabled and SEGMEXEC enabled.

Let's see what'll happen...
Comment 67 Wolfram Schlich (RETIRED) gentoo-dev 2008-05-05 12:12:23 UTC
(In reply to comment #66)
> I've now upgraded to linux-2.6.24.5-grsec-2.1.11-200804211829
> with SMP, HT, HIGHPTE and PAGEEXEC disabled and SEGMEXEC enabled.

That kernel is now running stable for almost 8 days...
Comment 68 Daniel Schröder 2008-05-10 11:12:27 UTC
(In reply to comment #42)
i am preparing four systems with this crash.config and -r11. I hope they do not crash :) 
Comment 69 Daniel Schröder 2008-05-10 12:02:56 UTC
Created attachment 152735 [details]
r11 crash
Comment 70 Daniel Schröder 2008-05-10 12:04:54 UTC
(In reply to comment #68)
> Created an attachment (id=152735) [edit]
> r11 crash
> 
in a couple of minutes one of two systems crashed...going to 2.6.24-hardened-r1 with same config...

Comment 71 Gordon Malm (RETIRED) gentoo-dev 2008-05-11 08:18:39 UTC
Could you try with 2.6.24-r2?  Please turn off CONFIG_GRKERNSEC_HIDESYM and use vga=ext boot parameter.  This helps produce useful information.  Thanks!
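
Before waiting for the next crash, both settings can be checked on the test box. A minimal sketch, assuming the running kernel's config is available as a plain text file and the command line is readable from /proc/cmdline; `config_enabled` and `has_boot_param` are hypothetical helper names:

```shell
# config_enabled OPTION CONFIG_FILE
# Succeeds if OPTION is built in (=y) or modular (=m) in CONFIG_FILE.
config_enabled() {
    grep -Eq "^$1=(y|m)\$" "$2"
}

# has_boot_param PARAM [CMDLINE_FILE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
has_boot_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -Fxq "$1"
}
```

For example, `config_enabled CONFIG_GRKERNSEC_HIDESYM /usr/src/linux/.config` should fail and `has_boot_param vga=ext` should succeed on a machine set up as requested.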
Comment 72 Daniel Schröder 2008-05-11 08:30:05 UTC
Created attachment 152821 [details]
2.6.24-r1 crash
Comment 73 Daniel Schröder 2008-05-11 08:31:01 UTC
Created attachment 152823 [details]
2.6.24-r1 crash

crash with "pax suspicious general protection fault"
Comment 74 Daniel Schröder 2008-05-11 08:33:38 UTC
(In reply to comment #70)
> in a couple of minutes one of two systems crashed...going to 2.6.24-hardened-r1
> with same config...
during the night, two of four machines crashed with r2 

Comment 75 Daniel Schröder 2008-05-11 09:38:09 UTC
(In reply to comment #71)
> Could you try with 2.6.24-r2?  Please turn off CONFIG_GRKERNSEC_HIDESYM and use
> vga=ext boot parameter.  This helps produce useful information.  Thanks!
done. four systems  are running r2, vga=ext and no hidesym...

Comment 76 Daniel Schröder 2008-05-11 19:31:37 UTC
(In reply to comment #74)

> during the night, two of four machines crashed with r2 

arghh...big mistake...the machines crashed with 2.6.24-hardened-r1...
r2s have been running for 10 hours without a crash...hopefully this bug is gone...

Comment 77 Daniel Schröder 2008-05-12 08:17:01 UTC
(In reply to comment #75)
> (In reply to comment #71)
> > Could you try with 2.6.24-r2?  Please turn off CONFIG_GRKERNSEC_HIDESYM and use
> > vga=ext boot parameter.  This helps produce useful information.  Thanks!
two of four systems crashed. here are the pictures: http://dschroeder.info/system1.jpeg http://dschroeder.info/system2.jpeg

Comment 78 Daniel Schröder 2008-05-12 12:21:06 UTC
Created attachment 152939
crash probably triggered by slocate run
Comment 79 Daniel Schröder 2008-05-12 12:24:05 UTC
(In reply to comment #78)
> Created an attachment (id=152939)
> crash probably triggered by slocate run
> 
Because the pictures are so small, I configured netconsole and tried to trigger a crash with the applications that run during the night. The cron.daily slocate job crashed the system :)
Comment 80 Daniel Schröder 2008-05-12 14:18:41 UTC
OK, with CONFIG_PAX_KERNEXEC disabled, slocate finishes. If I enable this option, the system crashes within 100 ICMP requests. I am now running two systems with r2 and CONFIG_PAX_KERNEXEC disabled...
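For anyone repeating this test, it may help to confirm which state CONFIG_PAX_KERNEXEC is actually in on the running kernel before and after the rebuild. The following is only a minimal Python sketch under stated assumptions: the helper name `option_state` is hypothetical, and the sample text stands in for whatever config text you have at hand (for example the contents of the kernel's .config).

```python
import re


def option_state(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...),
    'not set' if it is explicitly disabled, or None if it is absent.

    `option_state` is a hypothetical helper written for this example.
    """
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_re = re.compile(r"^# %s is not set$" % re.escape(option))
    for raw in config_text.splitlines():
        line = raw.strip()  # tolerate stray leading/trailing whitespace
        match = set_re.match(line)
        if match:
            return match.group(1)
        if unset_re.match(line):
            return "not set"
    return None


# Illustrative sample text, not taken from any of the attached logs.
sample = """\
CONFIG_PAX_NOELFRELOCS=y
# CONFIG_PAX_KERNEXEC is not set
CONFIG_PAX_ASLR=y
"""

print(option_state(sample, "CONFIG_PAX_KERNEXEC"))
```

The distinction between "not set" and None matters here: a missing line may simply mean the fragment was truncated, whereas "is not set" means the option was deliberately disabled.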
Comment 81 Daniel Schröder 2008-05-12 14:19:21 UTC
Created attachment 152947
netconsole log from slocate testruns
Comment 82 Georg Weiss 2008-05-14 15:28:52 UTC
We are running 2.6.24-hardened-r1 on some machines now (> 6 days).

kernel config comes with(out)...
--8<--
CONFIG_SMP=y
# CONFIG_SCHED_SMT is not set
# CONFIG_HIGHPTE is not set

#
# Non-executable pages
#                                                                               
CONFIG_PAX_NOEXEC=y
# CONFIG_PAX_PAGEEXEC is not set
CONFIG_PAX_SEGMEXEC=y
CONFIG_PAX_EMUTRAMP=y
CONFIG_PAX_MPROTECT=y
CONFIG_PAX_NOELFRELOCS=y
CONFIG_PAX_KERNEXEC=y 

#
# Address Space Layout Randomization
#
CONFIG_PAX_ASLR=y
CONFIG_PAX_RANDKSTACK=y
CONFIG_PAX_RANDUSTACK=y
CONFIG_PAX_RANDMMAP=y                                                           
--8<--

I'm going to deploy this (r2) on some more important machines now...
I will do another test with HIGHPTE and SMT enabled.
Comment 83 PaX Team 2008-05-16 12:41:59 UTC
(In reply to comment #81)
> Created an attachment (id=152947)
> netconsole log from slocate testruns

thanks for these logs, but now i'm even more confused ;). these crashes don't look at all like the ones before: they happen in a function that seemingly has a potential null-deref bug, except i don't know how on earth no one hit it before.
Comment 84 Daniel Schröder 2008-05-16 15:49:25 UTC
> netconsole log from slocate testruns
> 
> thanks for these logs, but now i'm even more confused ;). these crashes don't
> look at all like the ones before: they happen in a function that seemingly has a
> potential null-deref bug, except i don't know how on earth no one hit it before.
The systems are running r2 with PAX_KERNEXEC disabled just fine. No problems for nearly five days.

Comment 85 Krzysztof Pawlik (RETIRED) gentoo-dev 2008-06-22 01:37:45 UTC
Seems to work fine so far:

edge ~ # zgrep PAX /proc/config.gz
CONFIG_PAX=y
# CONFIG_PAX_SOFTMODE is not set
CONFIG_PAX_EI_PAX=y
CONFIG_PAX_PT_PAX_FLAGS=y
CONFIG_PAX_NO_ACL_FLAGS=y
# CONFIG_PAX_HAVE_ACL_FLAGS is not set
# CONFIG_PAX_HOOK_ACL_FLAGS is not set
CONFIG_PAX_NOEXEC=y
CONFIG_PAX_PAGEEXEC=y
CONFIG_PAX_SEGMEXEC=y
CONFIG_PAX_EMUTRAMP=y
CONFIG_PAX_MPROTECT=y
# CONFIG_PAX_NOELFRELOCS is not set
# CONFIG_PAX_KERNEXEC is not set
CONFIG_PAX_ASLR=y
CONFIG_PAX_RANDKSTACK=y
CONFIG_PAX_RANDUSTACK=y
CONFIG_PAX_RANDMMAP=y
# CONFIG_PAX_MEMORY_SANITIZE is not set
CONFIG_PAX_MEMORY_UDEREF=y
edge ~ # uname -a
Linux edge 2.6.25-hardened #1 SMP Sun Jun 22 04:00:00 CEST 2008 i686 Pentium III (Coppermine) GenuineIntel GNU/Linux
edge ~ #
Comment 86 Georg Weiss 2008-07-09 15:54:45 UTC
I just wanted to report that we are running 2.6.24-hardened-r2 on the systems now. Pretty stable so far (> 20 days).

extracts from the config...
--8<--
CONFIG_PAX=y
# CONFIG_PAX_SOFTMODE is not set
# CONFIG_PAX_EI_PAX is not set
CONFIG_PAX_PT_PAX_FLAGS=y
# CONFIG_PAX_NO_ACL_FLAGS is not set
CONFIG_PAX_HAVE_ACL_FLAGS=y
# CONFIG_PAX_HOOK_ACL_FLAGS is not set
CONFIG_PAX_NOEXEC=y
# CONFIG_PAX_PAGEEXEC is not set
CONFIG_PAX_SEGMEXEC=y
CONFIG_PAX_EMUTRAMP=y
CONFIG_PAX_MPROTECT=y
CONFIG_PAX_NOELFRELOCS=y
CONFIG_PAX_KERNEXEC=y
CONFIG_PAX_ASLR=y
CONFIG_PAX_RANDKSTACK=y
CONFIG_PAX_RANDUSTACK=y
CONFIG_PAX_RANDMMAP=y
# CONFIG_PAX_MEMORY_SANITIZE is not set
CONFIG_PAX_MEMORY_UDEREF=y

CONFIG_HIGHPTE=y

CONFIG_SCHED_SMT=y
--8<--
Comment 87 Nick P 2008-07-17 21:13:55 UTC
2.6.25-r2 has been stable for 8+ days on my system.

CONFIG_PAX=y
# CONFIG_PAX_SOFTMODE is not set
# CONFIG_PAX_EI_PAX is not set
CONFIG_PAX_PT_PAX_FLAGS=y
# CONFIG_PAX_NO_ACL_FLAGS is not set
CONFIG_PAX_HAVE_ACL_FLAGS=y
# CONFIG_PAX_HOOK_ACL_FLAGS is not set
CONFIG_PAX_NOEXEC=y
# CONFIG_PAX_PAGEEXEC is not set
CONFIG_PAX_SEGMEXEC=y
# CONFIG_PAX_EMUTRAMP is not set
CONFIG_PAX_MPROTECT=y
CONFIG_PAX_NOELFRELOCS=y
# CONFIG_PAX_KERNEXEC is not set
CONFIG_PAX_ASLR=y
CONFIG_PAX_RANDKSTACK=y
CONFIG_PAX_RANDUSTACK=y
CONFIG_PAX_RANDMMAP=y
# CONFIG_PAX_MEMORY_SANITIZE is not set
CONFIG_PAX_MEMORY_UDEREF=y
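
Since several PAX fragments have now been posted, comparing them option by option can show where the configurations differ. Below is a rough Python sketch of such a comparison, using excerpts from the fragments in comments 86 and 87. The function names `parse_config` and `diff_configs` are hypothetical, and the excerpts are shortened, so treat the result as an illustration and not as a complete diff.

```python
def parse_config(text):
    """Map each CONFIG_* option in a kernel config fragment to its value."""
    options = {}
    suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" marks an explicitly disabled option.
            options[line[2:-len(suffix)]] = "not set"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    differences = {}
    for name in sorted(set(a) | set(b)):
        left, right = a.get(name, "absent"), b.get(name, "absent")
        if left != right:
            differences[name] = (left, right)
    return differences


# Shortened excerpts of the fragments posted in comments 86 and 87.
comment_86 = """\
CONFIG_PAX_SEGMEXEC=y
CONFIG_PAX_EMUTRAMP=y
CONFIG_PAX_MPROTECT=y
CONFIG_PAX_NOELFRELOCS=y
CONFIG_PAX_KERNEXEC=y
"""
comment_87 = """\
CONFIG_PAX_SEGMEXEC=y
# CONFIG_PAX_EMUTRAMP is not set
CONFIG_PAX_MPROTECT=y
CONFIG_PAX_NOELFRELOCS=y
# CONFIG_PAX_KERNEXEC is not set
"""

result = diff_configs(parse_config(comment_86), parse_config(comment_87))
for name, (left, right) in result.items():
    print(f"{name}: {left} -> {right}")
```

Options missing from one fragment are reported as "absent", which keeps a truncated paste from being mistaken for a disabled option.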
Comment 88 solar (RETIRED) gentoo-dev 2008-07-17 21:28:43 UTC
2.6.23 is no longer current. Closing bug. File new ones for new kernels.
Comment 89 kfm 2009-07-20 18:32:51 UTC
Setting status to CANTFIX as it references a rather old version of hardened-sources which has since been retired. If the problem arises again with a current version of hardened-sources, please re-open the bug and update the summary accordingly. Thanks.