While investigating this bug:
http://bugs.gentoo.org/show_bug.cgi?id=197191
I happened to test the example given there under gdb. After the SIGBUS I went up a couple of frames into the main function, then did:
print ((char *) file)[8192]
This locked up my machine. More precisely, it looks like gdb just sits spinning in a *really* tight loop. Sending kill -9 to the test program being debugged, to gdb, or to both together does not stop it. SysRq-T doesn't show any stack trace for gdb, although all other tasks are shown fine, including the traced test program. Both 2.6.23.1 and 2.6.24-rc1-g82798a17 show the same behaviour.
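For readers who cannot reach the linked bug, here is a minimal sketch of the kind of access involved. It assumes the test program maps a file and then reads a page that lies beyond the end of the file's data; the original example lives in the Gentoo bug and is not reproduced here. The helper name read_mapped_byte, the temporary file path and the sizes are all hypothetical and chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_jmp;

/* Jump back to the caller instead of dying on SIGBUS. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(bus_jmp, 1);
}

/*
 * Hypothetical helper: create a temporary file of file_len bytes, map
 * map_len bytes of it and read the byte at offset.
 * Returns 1 if the read raised SIGBUS, 0 if it succeeded, -1 on setup error.
 */
int read_mapped_byte(size_t file_len, size_t map_len, size_t offset)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";   /* assumed scratch location */
    struct sigaction sa, old;
    char *file;
    int fd, result;

    assert(offset < map_len);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)file_len) != 0) {
        close(fd);
        return -1;
    }

    /* The mapping may be longer than the file; pages past EOF fault. */
    file = mmap(NULL, map_len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (file == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0) {
        munmap(file, map_len);
        return -1;
    }

    if (sigsetjmp(bus_jmp, 1) == 0) {
        volatile char c = ((volatile char *)file)[offset];
        (void)c;
        result = 0;   /* read completed normally */
    } else {
        result = 1;   /* arrived here from the SIGBUS handler */
    }

    sigaction(SIGBUS, &old, NULL);
    munmap(file, map_len);
    return result;
}
```

With a one-page file mapped over three pages, reading at an offset of two pages is expected to return 1, while the same read against a three-page file returns 0. To mirror the session above, a caller could drop the handler, let gdb stop on the SIGBUS, and then evaluate the same print expression from a frame where file is in scope.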
Created attachment 134669: dmesg
dmesg from the latest nightly git snapshot showing the bug. Note the three SysRq-T traces. The first was taken before the invalid memory access, the second after the access but before the tasks were killed, and the last after sending kill -9 to both gdb and the test program.
Created attachment 134671: .config
.config (for 2.6.23.1)
Just tested on 2.6.22.1 where this does NOT occur (gdb happily prints '\0'). So it looks like a regression. I'll start bisecting tomorrow.
I've bisected it down to this commit: 54cb8821de07f2ffcd28c380ce9b93d5784b40d7 "mm: merge populate and nopage into fault (fixes nonlinear)" I've reported it upstream to LKML and Nick Piggin, the author of the commit.
A fix has been committed to Linus' tree as commit 5307cc1aa53850f017c8053db034cf950b670ac9. The same patch has been queued for the 2.6.23-stable tree.
Thanks a lot for digging into this and the other bug report. I will include that patch with the next revision. And yes, that bug report was very well worded; congratulations on the best-bug-report-of-all-time award :)
Thanks! I shall treasure it always ;)
This was fixed in gentoo-sources-2.6.23-r2.