Bug 328889 - sys-kernel/gentoo-sources-2.6.34-r1: random system crashes
Summary: sys-kernel/gentoo-sources-2.6.34-r1: random system crashes
Status: RESOLVED TEST-REQUEST
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High normal
Assignee: Gentoo Xen Devs
URL:
Whiteboard:
Keywords:
Duplicates: 335574
Depends on: 317231
Blocks:
Reported: 2010-07-19 01:58 UTC by OnlyTux
Modified: 2013-07-26 19:39 UTC
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Attachments
dmesg (dmesg.txt,57.66 KB, text/plain)
2010-07-19 02:01 UTC, OnlyTux
emerge --info (emerge-info.txt,3.95 KB, text/plain)
2010-07-19 02:01 UTC, OnlyTux
lspci -vvnn (lspci.txt,25.49 KB, text/plain)
2010-07-19 02:01 UTC, OnlyTux
/var/log/Xorg.0.log (xorg.log,15.68 KB, text/plain)
2010-07-19 02:04 UTC, OnlyTux
/var/log/messages (var-log-messages5.txt,1.37 MB, text/plain)
2010-07-19 15:20 UTC, OnlyTux
.config (.config,68.46 KB, text/plain)
2010-07-19 18:06 UTC, OnlyTux
/tmp/xinitrc_dhp.2010-09-14_13-16-23.log (xinitrc_dhp.2010-09-14_13-16-23.log,71.73 KB, text/plain)
2010-09-14 18:47 UTC, DEMAINE Benoît-Pierre, aka DoubleHP
/tmp/xinitrc_dhp.2010-09-14_12-37-24.log (xinitrc_dhp.2010-09-14_12-37-24.log,92.93 KB, text/plain)
2010-09-14 18:59 UTC, DEMAINE Benoît-Pierre, aka DoubleHP
/tmp/messages (messages,13.07 KB, text/plain)
2010-09-14 19:01 UTC, DEMAINE Benoît-Pierre, aka DoubleHP
/tmp/messages (messages,12.79 KB, text/plain)
2010-09-14 19:03 UTC, DEMAINE Benoît-Pierre, aka DoubleHP
some dmesg (dmesg,68.57 KB, text/plain)
2010-09-22 12:33 UTC, DEMAINE Benoît-Pierre, aka DoubleHP
/tmp/messages (T,7.58 KB, text/plain)
2010-10-10 00:54 UTC, DEMAINE Benoît-Pierre, aka DoubleHP
cat /var/log/messages | grep BUG | uniq >list (list,8.32 KB, text/plain)
2010-10-10 01:00 UTC, DEMAINE Benoît-Pierre, aka DoubleHP
/tmp/messages (messages,26.55 KB, text/plain)
2010-10-10 01:34 UTC, DEMAINE Benoît-Pierre, aka DoubleHP
dmesg output on crash (dmesg_-w,24.00 KB, text/plain)
2013-07-26 18:39 UTC, Drake Wyrm

Description OnlyTux 2010-07-19 01:58:09 UTC
Stable x86 desktop profile running Xfce. After upgrading gentoo-sources to version 2.6.34-r1 I am experiencing random freezes. This is the 3rd crash in 6 days.

At first I thought that this bug could be related to bug #327831, but I never experienced this problem with kernel versions <=2.6.32-r7.
Then I saw bug #323787, but the reporter is using an unstable profile, so I am not sure that it is exactly the same problem. My apologies in case this is a duplicate.

I am going to attach any log files I can, but since this is only my 2nd bug report ever and I am not at all skilled with such problems, please tell me what else you need and I will be glad to provide any other information.

Reproducible: Always





Comment 1 OnlyTux 2010-07-19 02:01:11 UTC
Created attachment 239323 [details]
dmesg

dmesg caught from ssh when the system was hung
Comment 2 OnlyTux 2010-07-19 02:01:28 UTC
Created attachment 239325 [details]
emerge --info
Comment 3 OnlyTux 2010-07-19 02:01:41 UTC
Created attachment 239327 [details]
lspci -vvnn
Comment 4 OnlyTux 2010-07-19 02:04:31 UTC
Created attachment 239329 [details]
/var/log/Xorg.0.log
Comment 5 OnlyTux 2010-07-19 02:11:16 UTC
My /var/log/messages: http://pastebin.com/6STyEcLb

An error is shown each time X has crashed (three times up to now).
Comment 6 Jeroen Roovers (RETIRED) gentoo-dev 2010-07-19 14:58:47 UTC
(In reply to comment #5)
> My /var/log/messages: http://pastebin.com/6STyEcLb
> 
> An error is shown each time X has crashed (three times up to now).

Please attach that file.
Comment 7 OnlyTux 2010-07-19 15:20:31 UTC
Created attachment 239391 [details]
/var/log/messages

Sorry for not having attached it before. I did try, but Bugzilla complained because the file was larger than 2 MB. In this file the records for the days between 14 and 16 Jul have been removed, since no errors happened during that period.
Comment 8 Jeroen Roovers (RETIRED) gentoo-dev 2010-07-19 17:17:56 UTC
(In reply to comment #7)
> Created an attachment (id=239391) [details]
> /var/log/messages
> 
> Sorry for not having attached it before, actually I tried but bugzilla
> complained since it was more than 2Mb of size.

It's OK to attach compressed files. 8-)
Comment 9 OnlyTux 2010-07-19 18:06:12 UTC
Created attachment 239417 [details]
.config
Comment 10 Mike Pagano gentoo-dev 2010-08-05 13:22:28 UTC
Are you using xf86-video-intel? What version?  Is this the same issue as bug #301282 ?
Comment 11 OnlyTux 2010-08-06 14:11:34 UTC
(In reply to comment #10)

Yes, I am using the unmasked version of xf86-video-intel:
$ emerge -pv xf86-video-intel
[ebuild   R   ] x11-drivers/xf86-video-intel-2.9.1  USE="dri -debug" 0 kB

As for bug #301282, there are evident similarities, but I am not sure that it is exactly the same issue:
- I never experienced this problem with kernel <=2.6.32-r7;
- there are no visible errors in my Xorg.0.log, if I am not mistaken. The only lines containing errors I have seen are in /var/log/messages;
- unfortunately I never enabled the Hangcheck Timer in the kernel, sorry for my ignorance, I am doing it now.

I have read documentation about Kdump, so if I manage to get one, could a core dump be of any help?

By the way, I have been using the 2.6.35 vanilla-sources since August 2nd. I have not experienced any problems since then, so *maybe* it has been resolved upstream.

So, what is the best way I can help, in your opinion? Should I emerge the 2.6.35 gentoo-sources, go back to 2.6.34 and get a proper core dump, stick to 2.6.35 vanilla for the moment and see what happens, and/or try different combinations of kernel and intel driver, or anything else?
Comment 12 Mike Pagano gentoo-dev 2010-08-25 01:16:34 UTC
How has 2.6.35 been treating you?
Comment 13 OnlyTux 2010-08-25 11:15:15 UTC
In three weeks with 2.6.35 vanilla kernel I had no crashes at all, so whatever caused this issue has been fixed upstream.
I am compiling gentoo-sources-2.6.35-r4 right now.
I will give the results in a few days to make sure that Gentoo's official version is error-free, too.
Comment 14 OnlyTux 2010-09-02 11:22:18 UTC
After a week with gentoo-sources 2.6.35-r4 I can safely say that this issue has been fixed upstream.

Can I set the state of this bug to RESOLVED UPSTREAM?
Comment 15 DEMAINE Benoît-Pierre, aka DoubleHP 2010-09-13 07:51:29 UTC
Bugs 334143 and 335574 should be marked as duplicates of this one. This bug must be marked as blocking 317231.
Comment 16 DEMAINE Benoît-Pierre, aka DoubleHP 2010-09-13 08:01:28 UTC
I am using sys-kernel/xen-sources-2.6.34-r3 (from 2.6.34), and I have both bug 331245 and bug 335574. Note that, as a Xen user, I cannot update to 2.6.35.

When I am using my desktop, I get random crashes and freezes:
- standard freeze (instant and total freeze: no more answer to ping)
- I/O freeze (sound keeps going, network keeps going, but no keyboard, no X, no mouse, numlock no longer responds, no effect from WM shortcuts)
- X crash, back to console (working console)
- kernel freeze after some specific commands, such as the well-known "sync"
- random app crashes: thunderbird, firefox, pidgin, gkrellm ... randomly die. Usually, when any of them has died, I can restart it, except pidgin. As if pidgin were more fragile, or sensitive.

When I read the logs, I find various messages (all listed in the forum, which is down ATM: http://forums.gentoo.org/viewtopic-p-6348730.html ). As for other people, it all started for me when I updated to 2.6.34. Sometimes there is no particular message in the logs, neither the X logs nor the system logs. Just an app crash.
Comment 17 DEMAINE Benoît-Pierre, aka DoubleHP 2010-09-13 15:53:40 UTC
Question to everyone, including the reporters of bug 334143, bug 335574 and bug 331245:
- are we all using ATI Radeon ?
- are we all using X 1.7 or above ? 1.8 for me
- are we all using X at the time of the crash ?
- did we all activate KMS, VGA-ARB, FrameBuffer, *and* DRM/DRI ?

For me, the kernel lock-up sometimes occurs when I type "sync" in the console after X has crashed ... so the KP can happen when X is not running, after a dead X has damaged my system.

OnlyTux, you seem to have an Intel VGA, but another reporter and I have Radeon. I have not seen Nvidia around this bug yet.

Are all of us having the crash while X is running? Maybe the new X 1.7 is responsible for sending a bad request to the kernel? Has anyone had this crash while in console, with X *not* running?

Maybe it's module-dependent? I needed 2.6.34 to have KMS+VGAarb+FB+DRI/DRM ... to be able to use X with all 3 of my video cards; none of the previous kernels could make them work TOGETHER. Maybe the faulty bit of code is in the NEW part I need and use?

Here is an example of the messages I may get when Firefox gets killed but X remains "usable" (black lines and garbage around, all text shown in an unreadable font, but the mouse keeps moving and I can finish typing an email, save the draft, and reboot as nicely as possible). Sometimes I get different messages:

Jul 12 04:13:47 uranus kernel: [TTM] Error restricting pfn 32def: -12
Jul 12 04:13:47 uranus kernel: [TTM] Error restricting pfn 32df0: -12
Jul 12 04:13:47 uranus kernel: PCI-DMA: Out of SW-IOMMU space for 4096 bytes at device 0000:01:05.0
Jul 12 04:13:47 uranus kernel: [drm:radeon_ttm_backend_bind] *ERROR* failed to bind 1280 pages at 0x10642000
Jul 12 04:13:47 uranus kernel: [TTM] Couldn't bind backend.
Jul 12 04:13:47 uranus kernel: [TTM] Buffer eviction failed
Jul 12 04:13:47 uranus kernel: radeon 0000:01:05.0: object_init failed for (4001792, 0x00000004)
Jul 12 04:13:47 uranus kernel: [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (4001792, 4, 4096, -12)
Jul 12 04:13:47 uranus kernel: [TTM] Error restricting pfn 33083: -12
Jul 12 04:13:47 uranus kernel: [TTM] Error restricting pfn 33084: -12 
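
The `-12` that recurs in these messages follows the kernel convention of reporting errors as negated errno values. On Linux, errno 12 is `ENOMEM` (out of memory), which fits the "Out of SW-IOMMU space" line logged alongside it. A minimal Python sketch, run on two lines copied from the log above, that extracts the trailing codes and names them (the helper name `decode_kernel_errors` is made up for illustration):

```python
import errno
import os
import re

# Two lines copied from the log in this comment.
LOG = """\
Jul 12 04:13:47 uranus kernel: [TTM] Error restricting pfn 32def: -12
Jul 12 04:13:47 uranus kernel: [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (4001792, 4, 4096, -12)
"""

# A negative integer at the very end of a line, optionally followed by ")".
TRAILING_ERRNO = re.compile(r"(-\d+)\)?\s*$")

def decode_kernel_errors(text):
    """Return (code, symbolic name) for each line that ends in a negative number."""
    found = []
    for line in text.splitlines():
        match = TRAILING_ERRNO.search(line)
        if match:
            code = -int(match.group(1))
            found.append((code, errno.errorcode.get(code, "unknown")))
    return found

for code, name in decode_kernel_errors(LOG):
    print(code, name, os.strerror(code))
```

This only names the error. It does not say which allocation failed or why, so it narrows the search rather than settling it.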
Comment 18 Mikayel Grigoryan 2010-09-13 19:51:19 UTC
*** Bug 335574 has been marked as a duplicate of this bug. ***
Comment 19 DEMAINE Benoît-Pierre, aka DoubleHP 2010-09-14 18:41:45 UTC
Logs of the crash I just had:
/var/log/messages:

Sep 14 20:25:01 uranus cron[18262]: (munin) CMD ([ -x /usr/bin/munin-cron ] && /usr/bin/munin-cron)
Sep 14 20:25:50 uranus kernel: pulseaudio[26260] general protection ip:7f55c8f9d34c sp:7fffd3658940 error:0 in libpulse.so.0.12.2[7f55c8f7f000+40000]

.xinitrc_dhp.2010-09-14_18-21-38.log which is the output of xinitrc 2>&1

Backtrace:
0: /usr/bin/X (xorg_backtrace+0x28) [0x466898]
1: /usr/bin/X (0x400000+0x608e9) [0x4608e9]
2: /lib/libpthread.so.0 (0x7fdb72c51000+0xf010) [0x7fdb72c60010]
3: /usr/lib/libdrm_radeon.so.1 (radeon_bo_unmap+0x0) [0x7fdb6f936230]
4: /usr/lib/xorg/modules/drivers/radeon_drv.so (0x7fdb6fb3a000+0xa4645) [0x7fdb6fbde645]
5: /usr/lib/xorg/modules/libexa.so (0x7fdb6f4fd000+0x115ad) [0x7fdb6f50e5ad]
6: /usr/lib/xorg/modules/libexa.so (0x7fdb6f4fd000+0x7c02) [0x7fdb6f504c02]
7: /usr/bin/X (0x400000+0xbda1a) [0x4bda1a]
8: /usr/bin/X (0x400000+0x3d679) [0x43d679]
9: /usr/bin/X (0x400000+0xa1e47) [0x4a1e47]
10: /usr/bin/X (0x400000+0x3f954) [0x43f954]
11: /usr/bin/X (0x400000+0x25e0b) [0x425e0b]
12: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7fdb71be8bbd]
13: /usr/bin/X (0x400000+0x259b9) [0x4259b9]
Segmentation fault at address (nil)
Fatal server error:
Caught signal 11 (Segmentation fault). Server aborting
Please consult the The X.Org Foundation support
         at http://wiki.x.org
 for help.
Please also check the log file at "/var/log/Xorg.0.log" for additional information.
<<<< Enlightenment Error >>>>

(my WM is E17).

The crash happened just after I changed virtual desktop, from an empty desktop back to the desktop where Firefox was expanded, showing a single picture (not a web page, just a picture). I saw the FF decoration and the picture inside, then boom. Despite the message, I was not playing sound; maybe the message is about Firefox closing, thus releasing all the audio resources required when I was playing a movie a few minutes earlier. After the crash, X terminated and all monitors turned off, as in DPMS mode. I did not get back to the console: no prompt, and numlock stopped working. Still, I could press the power button, and the computer shut down gracefully.

...

The crash before: /var/log/messages
Sep 14 18:19:01 uranus cron[19204]: (root) CMD (. /etc/conf_local ; ups_refresh)
Sep 14 18:21:07 uranus syslog-ng[3612]: syslog-ng starting up; version='3.1.2'

So, nothing much to say: all of X froze; the mouse stopped moving, the picture was still shown on the monitors, but nothing was moving, Gkrellm in particular. I will attach the X log soon: /tmp/xinitrc_dhp.2010-09-14_13-16-23.log
Comment 20 DEMAINE Benoît-Pierre, aka DoubleHP 2010-09-14 18:47:48 UTC
Created attachment 247322 [details]
/tmp/xinitrc_dhp.2010-09-14_13-16-23.log

Here you can see the X logs. The first lines are normal activity (the UPS monitor generates 1 line per minute). EEEK lines are Enlightenment warnings. The Xlib warning is also a normal message (since I use Xinerama, Randr is disabled and unsupported, and various libs keep reminding me of this fact very often).

A few seconds before the system freeze (see previous comment), Firefox died. The "Power seems good" lines will give you an idea of the timing. The box crashed less than 2 minutes after the Firefox crash. The attached file is the end of the log (lines were removed before this excerpt, not after).

I also get crashes when Firefox is NOT running, while just playing a movie with mplayer: in the middle of the movie, while not touching the keyboard or mouse, sometimes ... boom.
Comment 21 DEMAINE Benoît-Pierre, aka DoubleHP 2010-09-14 18:59:40 UTC
Created attachment 247325 [details]
/tmp/xinitrc_dhp.2010-09-14_12-37-24.log

A bit more interesting: the crash at 13:14; we start with /tmp/xinitrc_dhp.2010-09-14_12-37-24.log ... then /var/log/messages with more interesting lines : /tmp/messages
Comment 22 DEMAINE Benoît-Pierre, aka DoubleHP 2010-09-14 19:01:02 UTC
Created attachment 247327 [details]
/tmp/messages
Comment 23 DEMAINE Benoît-Pierre, aka DoubleHP 2010-09-14 19:03:01 UTC
Created attachment 247329 [details]
/tmp/messages
Comment 24 DEMAINE Benoît-Pierre, aka DoubleHP 2010-09-14 19:13:05 UTC
mpagano: how is this Xen related? 3 people out of 4 are *NOT* using Xen, and still reproduce various forms of all the bugs I am trying to log.
Comment 25 Mikayel Grigoryan 2010-09-15 08:25:16 UTC
- are we all using ATI Radeon ?
- are we all using X 1.7 or above ? 1.8 for me
- are we all using X at the time of the crash ?
- did we all activate KMS, VGA-ARB, FrameBuffer, *and* DRM/DRI ?

Yes to all questions. I have a feeling that it has something to do with network activity: all crashes relate to network traffic (Thunderbird freezing at startup, rich web pages freezing in Chromium, etc.).
Comment 26 DEMAINE Benoît-Pierre, aka DoubleHP 2010-09-15 11:34:10 UTC
I thought one of us had an Intel video card ...

I don't think it's related to network activity: I had crashes when switching desktops and going to a Firefox that had already loaded a page or a picture. Firefox did not need to refresh the page, just send rendering output to X. Same with Thunderbird. I also had crashes when playing a movie from a local file.

BUT, the network card is on the PCI bus; so, *if* the issue is not in the VGA or DRM part of the kernel, but in the memory allocator, or KMS, or the arbiter, then that could explain why any PCI card can trigger the bug; the video card, being the one that uses the bus most, is simply the one most likely to trigger it. I also had crashes when doing heavy downloads, which did not need to refresh any web page but did need many network transactions.

So, my conclusion for now is that it's something new in 2.6.34 that could affect both network and video cards. To me, KMS or the VGA or PCI arbiter. I really don't know how to interpret the error in comment #17. It makes me think it could be a memory allocator issue (or even deeper: co-cpu manager, TLB manager).
Comment 27 DEMAINE Benoît-Pierre, aka DoubleHP 2010-09-15 11:36:53 UTC
Do other distros have issues with 2.6.34? Does it happen with the vanilla kernel, or only with the Gentoo ones? We need to determine whether it's Gentoo-specific or whether we need to send reports upstream.
Comment 28 DEMAINE Benoît-Pierre, aka DoubleHP 2010-09-15 12:36:16 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=28402
It has been said that this could be our problem, and that we should use the patch by airlies. It has also been said that this patch works for the r300 series, and this series only, and that the LKML has been informed of the issue.

Maybe Gentoo can add this patch for r300 users.

As my HD4350 cards are r700, this patch should not fix it for me.
Comment 29 Mikayel Grigoryan 2010-09-15 19:53:31 UTC
True. It happens not only during network activity. I also had crashes when switching windows between mplayer and gnome-terminal or whatever X app. I also had a crash while downloading files via wget.
I also had crashes with vanilla kernel 2.6.35.4, so it does not look like the problem is fixed in that version, though I had the impression that crashes were less frequent.
Comment 30 DEMAINE Benoît-Pierre, aka DoubleHP 2010-09-22 12:33:01 UTC
Created attachment 248351 [details]
some dmesg

When I see how many different messages can be produced, I am more and more inclined to think there is a stray pointer in the core kernel. I have had so many kinds of crashes, you can't imagine:
- closed apps
- killed X
- freezes
- hard reboot (once)
- failing on sync
- corrupted FS (the owner group of several files in /etc changed from root=0 to lp=17)
- refusing to sync (sync command either never returning, or producing a freeze, or producing a hard reboot)
- garbage on screen

The common point is NOT X. The common point is the kernel. And very few things in modern kernels can affect so many different "drivers". And it's not my Xen, or the Xen hypervisor, because my Xen has been in /boot since January, while the problems started in June. Very few lines of kernel code have the ability, and the rights, to alter so many different drivers. So I suspect a stray pointer randomly destroying the RAM.
Comment 31 DEMAINE Benoît-Pierre, aka DoubleHP 2010-10-10 00:54:01 UTC
Created attachment 250061 [details]
/tmp/messages

Those are the two most frequent things I see in my logs ... and ...
Comment 32 DEMAINE Benoît-Pierre, aka DoubleHP 2010-10-10 01:00:59 UTC
Created attachment 250063 [details]
cat /var/log/messages | grep BUG | uniq >list

... it seems to happen, most of the time, just a few seconds after Munin execution (a daemon that is executed every 5 minutes by cron).

At least, all the
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
lines match this exactly!
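
The `grep BUG | uniq` pipeline keeps only the BUG lines, so the cron entries needed to check the Munin timing are lost. A rough Python sketch of how the correlation could be measured follows. The sample lines use the syslog format seen in this report, but their timestamps and PIDs are invented for illustration, and the 30-second window and the assumed year are arbitrary choices:

```python
from datetime import datetime, timedelta

# Hypothetical sample in the syslog format seen in this report; the
# timestamps and PIDs are invented, not taken from the attachment.
SAMPLE = """\
Oct 10 03:05:01 uranus cron[4242]: (munin) CMD ([ -x /usr/bin/munin-cron ] && /usr/bin/munin-cron)
Oct 10 03:05:09 uranus kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
Oct 10 03:07:30 uranus kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
Oct 10 03:10:01 uranus cron[4250]: (munin) CMD ([ -x /usr/bin/munin-cron ] && /usr/bin/munin-cron)
Oct 10 03:10:16 uranus kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
"""

def parse_time(line, year=2010):
    # Syslog timestamps carry no year, so one is assumed here.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def bugs_after_munin(text, window=timedelta(seconds=30)):
    """Split BUG lines into those logged within `window` after a munin-cron start, and the rest."""
    last_munin = None
    near, far = [], []
    for line in text.splitlines():
        if "munin-cron" in line:
            last_munin = parse_time(line)
        elif "BUG:" in line:
            when = parse_time(line)
            if last_munin is not None and when - last_munin <= window:
                near.append(line)
            else:
                far.append(line)
    return near, far

near, far = bugs_after_munin(SAMPLE)
print(f"{len(near)} BUG line(s) within 30 s of a Munin run, {len(far)} elsewhere")
```

Run against the real /var/log/messages, a high share of "near" lines would support the Munin observation. It would still only show correlation, since Munin may simply be the heaviest periodic load on the box.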
Comment 33 DEMAINE Benoît-Pierre, aka DoubleHP 2010-10-10 01:34:30 UTC
Created attachment 250065 [details]
/tmp/messages

A few minutes later ... after 3:10:16, all of X was frozen, but music was still playing. I pressed the power button at about 3:10:25. Then the music stopped, and the HDD started working for a few seconds. Then the HDD LED blinked once every 5s, but the box never shut down. Around 3:11:30 I pressed reset.

Again, it happened at 3:10 : just after Munin went around ...
Comment 34 Mikayel Grigoryan 2010-10-15 19:35:44 UTC
Had "modprobe -v drm debug=1" turned on during last 2 crashes and in both cases last debug line was:

timestamp hostname kernel: [drm:drm_ioctl], pid=4170, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1
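
The `cmd` value in that debug line can be unpacked by hand. The sketch below assumes the generic Linux ioctl encoding used on x86 (bits 0-7 number, bits 8-15 type, bits 16-29 argument size, bits 30-31 direction); some other architectures use a different layout. The function name `decode_ioctl` is made up for illustration:

```python
# Field layout of an ioctl command number, assuming the generic encoding
# used on x86: bits 0-7 number, 8-15 type, 16-29 argument size, 30-31 direction.
DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}

def decode_ioctl(cmd):
    """Split a 32-bit ioctl command number into its four fields."""
    return {
        "nr": cmd & 0xFF,
        "type": chr((cmd >> 8) & 0xFF),
        "size": (cmd >> 16) & 0x3FFF,
        "dir": DIRECTIONS[(cmd >> 30) & 0x3],
    }

fields = decode_ioctl(0xC01C64A3)
print(fields)
```

The decoded number agrees with the `nr=0xa3` already printed, the type is `'d'` (the DRM ioctl base), and the argument is 28 bytes passed in both directions. In the kernel's drm.h that combination appears to correspond to the mode-setting cursor ioctl (DRM_IOCTL_MODE_CURSOR), but this should be checked against the headers of the exact kernel in use.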
Comment 35 DEMAINE Benoît-Pierre, aka DoubleHP 2010-10-16 14:10:31 UTC
I have moved from vmlinuz-2.6.34-xen-Gentoo-uranus-1-37 to vmlinuz-2.6.35-gentoo-r10-Gentoo-uranus-1-39. I did not change anything in the config, I just imported the old one, so, in short, the diff between the two configs is almost entirely the removal of Xen. I no longer get crashes. But my system is getting slow: window content refresh is much slower than before.

And I have this thing in syslog, every 10s:

Oct 16 16:05:11 uranus kernel: [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 80
Oct 16 16:05:11 uranus kernel: [drm:drm_edid_block_valid] *ERROR* Raw EDID:
Oct 16 16:05:11 uranus kernel: <3>50 ff ff ff ff ff ff 00 24 4d 15 25 01 01 01 01  P.......$M.%....
Oct 16 16:05:11 uranus kernel: <3>2b 0a 01 02 6f 1f 17 96 ea 4e ec a1 57 4c 99 23  +...o....N..WL.#
Oct 16 16:05:11 uranus kernel: <3>19 51 57 bf ee 01 31 59 45 59 61 59 01 01 01 01  .QW...1YEYaY....
Oct 16 16:05:11 uranus kernel: <3>01 01 01 01 01 01 64 19 00 40 41 00 26 30 18 88  ......d..@A.&0..
Oct 16 16:05:11 uranus kernel: <3>36 00 33 e6 10 00 00 18 00 00 00 fc 00 49 42 4d  6.3..........IBM
Oct 16 16:05:11 uranus kernel: <3>20 54 35 36 41 0a 20 20 20 20 00 00 00 fe 00 54   T56A.    .....T
Oct 16 16:05:11 uranus kernel: <3>46 54 20 4d 6f 6e 69 74 6f 72 0a 20 00 00 00 ff  FT Monitor. ....
Oct 16 16:05:11 uranus kernel: <3>00 36 36 2d 33 37 32 36 31 0a 20 20 20 20 00 39  .66-37261.    .9
Oct 16 16:05:11 uranus kernel:
Oct 16 16:05:11 uranus kernel: [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 80
Oct 16 16:05:11 uranus kernel: [drm:drm_edid_block_valid] *ERROR* Raw EDID:
Oct 16 16:05:11 uranus kernel: <3>50 ff ff ff ff ff ff 00 24 4d 15 25 01 01 01 01  P.......$M.%....
Oct 16 16:05:11 uranus kernel: <3>2b 0a 01 02 6f 1f 17 96 ea 4e ec a1 57 4c 99 23  +...o....N..WL.#
Oct 16 16:05:11 uranus kernel: <3>19 51 57 bf ee 01 31 59 45 59 61 59 01 01 01 01  .QW...1YEYaY....
Oct 16 16:05:11 uranus kernel: <3>01 01 01 01 01 01 64 19 00 40 41 00 26 30 18 88  ......d..@A.&0..
Oct 16 16:05:11 uranus kernel: <3>36 00 33 e6 10 00 00 18 00 00 00 fc 00 49 42 4d  6.3..........IBM
Oct 16 16:05:11 uranus kernel: <3>20 54 35 36 41 0a 20 20 20 20 00 00 00 fe 00 54   T56A.    .....T
Oct 16 16:05:11 uranus kernel: <3>46 54 20 4d 6f 6e 69 74 6f 72 0a 20 00 00 00 ff  FT Monitor. ....
Oct 16 16:05:11 uranus kernel: <3>00 36 36 2d 33 37 32 36 31 0a 20 20 20 20 00 39  .66-37261.    .9
Oct 16 16:05:11 uranus kernel:
Oct 16 16:05:11 uranus kernel: [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 80
Oct 16 16:05:11 uranus kernel: [drm:drm_edid_block_valid] *ERROR* Raw EDID:
Oct 16 16:05:11 uranus kernel: <3>50 ff ff ff ff ff ff 00 24 4d 15 25 01 01 01 01  P.......$M.%....
Oct 16 16:05:11 uranus kernel: <3>2b 0a 01 02 6f 1f 17 96 ea 4e ec a1 57 4c 99 23  +...o....N..WL.#
Oct 16 16:05:11 uranus kernel: <3>19 51 57 bf ee 01 31 59 45 59 61 59 01 01 01 01  .QW...1YEYaY....
Oct 16 16:05:11 uranus kernel: <3>01 01 01 01 01 01 64 19 00 40 41 00 26 30 18 88  ......d..@A.&0..
Oct 16 16:05:11 uranus kernel: <3>36 00 33 e6 10 00 00 18 00 00 00 fc 00 49 42 4d  6.3..........IBM
Oct 16 16:05:11 uranus kernel: <3>20 54 35 36 41 0a 20 20 20 20 00 00 00 fe 00 54   T56A.    .....T
Oct 16 16:05:11 uranus kernel: <3>46 54 20 4d 6f 6e 69 74 6f 72 0a 20 00 00 00 ff  FT Monitor. ....
Oct 16 16:05:11 uranus kernel: <3>00 36 36 2d 33 37 32 36 31 0a 20 20 20 20 00 39  .66-37261.    .9
Oct 16 16:05:11 uranus kernel:
Oct 16 16:05:11 uranus kernel: [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 80
Oct 16 16:05:11 uranus kernel: [drm:drm_edid_block_valid] *ERROR* Raw EDID:
Oct 16 16:05:11 uranus kernel: <3>50 ff ff ff ff ff ff 00 24 4d 15 25 01 01 01 01  P.......$M.%....
Oct 16 16:05:11 uranus kernel: <3>2b 0a 01 02 6f 1f 17 96 ea 4e ec a1 57 4c 99 23  +...o....N..WL.#
Oct 16 16:05:11 uranus kernel: <3>19 51 57 bf ee 01 31 59 45 59 61 59 01 01 01 01  .QW...1YEYaY....
Oct 16 16:05:11 uranus kernel: <3>01 01 01 01 01 01 64 19 00 40 41 00 26 30 18 88  ......d..@A.&0..
Oct 16 16:05:11 uranus kernel: <3>36 00 33 e6 10 00 00 18 00 00 00 fc 00 49 42 4d  6.3..........IBM
Oct 16 16:05:11 uranus kernel: <3>20 54 35 36 41 0a 20 20 20 20 00 00 00 fe 00 54   T56A.    .....T
Oct 16 16:05:11 uranus kernel: <3>46 54 20 4d 6f 6e 69 74 6f 72 0a 20 00 00 00 ff  FT Monitor. ....
Oct 16 16:05:11 uranus kernel: <3>00 36 36 2d 33 37 32 36 31 0a 20 20 20 20 00 39  .66-37261.    .9
Oct 16 16:05:11 uranus kernel:
Oct 16 16:05:11 uranus kernel: radeon 0000:03:00.0: DVI-I-2: EDID block 0 invalid.
Oct 16 16:05:11 uranus kernel: [drm:radeon_dvi_detect] *ERROR* DVI-I-2: probed a monitor but no|invalid EDID

I did not have this with my previous kernel.

Since it is about EDID, it may look harmless. It is not. It complains about only one DVI connector, whereas I have 3, all using the same kind of hardware. The IBM monitor is switched off, so it could hardly send EDID frames.
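
The reported remainder can be reproduced from the dump itself. A 128-byte EDID block is valid when its bytes sum to 0 modulo 256, and it must start with the fixed header 00 ff ff ff ff ff ff 00. A small Python sketch using the bytes copied from the log above (with the "<3>" prefixes and the ASCII column dropped):

```python
# Raw EDID block copied from the log in this comment.
RAW_EDID = """
50 ff ff ff ff ff ff 00 24 4d 15 25 01 01 01 01
2b 0a 01 02 6f 1f 17 96 ea 4e ec a1 57 4c 99 23
19 51 57 bf ee 01 31 59 45 59 61 59 01 01 01 01
01 01 01 01 01 01 64 19 00 40 41 00 26 30 18 88
36 00 33 e6 10 00 00 18 00 00 00 fc 00 49 42 4d
20 54 35 36 41 0a 20 20 20 20 00 00 00 fe 00 54
46 54 20 4d 6f 6e 69 74 6f 72 0a 20 00 00 00 ff
00 36 36 2d 33 37 32 36 31 0a 20 20 20 20 00 39
"""

EDID_HEADER = bytes.fromhex("00ffffffffffff00")

block = bytes.fromhex("".join(RAW_EDID.split()))
remainder = sum(block) % 256          # a valid block gives 0
header_ok = block[:8] == EDID_HEADER

# Remainder if byte 0 held the 0x00 that the fixed header requires.
repaired = bytes([0x00]) + block[1:]
repaired_remainder = sum(repaired) % 256

print(len(block), remainder, header_ok, repaired_remainder)
```

The sum leaves a remainder of 80, which is 0x50, exactly the value of the first byte where the header requires 0x00. With that single byte set to zero the block checks out. This is consistent with one corrupted byte at the start of the block, although it does not show whether the corruption happens on the wire, in the monitor, or in memory.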

And as time passes, after a few hours, some problems happen like before: garbage on screen, rectangles that are not refreshed (grey, black, or green content instead of the window's content).

Comment 36 DEMAINE Benoît-Pierre, aka DoubleHP 2010-10-16 17:38:57 UTC
(In reply to comment #34)
> Had "modprobe -v drm debug=1" turned on during last 2 crashes and in both cases
> last debug line was:
> 
> timestamp hostname kernel: [drm:drm_ioctl], pid=4170, cmd=0xc01c64a3, nr=0xa3,
> dev 0xe200, auth=1
> 

Even if your / was mounted in sync mode, this would not be relevant. I am now 99% convinced that the issue is in the Virtual Machine, and that it can corrupt anything anywhere ... including filesystems. Among other things, the issue can occur, and freeze the kernel, before:
- the message is sent by the kernel to syslog
- the message is recorded by syslog in the log file
- the log file is sent to disk for synchronisation

It is obvious to me that the crash can happen within a single scheduler cycle, so that once the issue has happened, the disk driver itself is no longer called. I am convinced that even sync mode would not help to get the full traces of the problem. Or not always.
Comment 37 Ben Kohler gentoo-dev 2013-06-14 15:07:54 UTC
Almost 3 years and 15 kernel releases later, is this still a problem, or can this bug be closed?
Comment 38 Drake Wyrm 2013-07-26 18:09:34 UTC
I have experienced this bug, or one very much like it, with every stable kernel version since sys-kernel/gentoo-sources-3.6.11-r1. This disturbs me, as my kernel version is no longer in the portage tree. I am running an amd64 system, Xorg version 1.13, Nvidia video with nouveau driver, and my window manager is Xfce4.

On every stable kernel I have tried after 3.6.11-r1, I can boot successfully. X and Xfce load successfully. I can start at least one graphical application (Xterm, Firefox, whatever: it does not seem to matter which). Usually, upon attempting to open a second application, the system freezes. The mouse does not move, the display does not update, the system no longer responds to pings or attempts to connect via ssh, and anything that I left running in a terminal ceases to run.
Comment 39 Drake Wyrm 2013-07-26 18:39:32 UTC
Created attachment 354262 [details]
dmesg output on crash

I noted the open bracket on a line by itself at the end. The kernel must have died while carving it.
Comment 40 Ben Kohler gentoo-dev 2013-07-26 19:01:27 UTC
3.7 brought some major nouveau rewrites, quite a few bugs popped up after then.  Bug #472200 is one example.  I don't think that's really related to this bug, other than that they are both kernel-DRM related crashes.
Comment 41 DEMAINE Benoît-Pierre, aka DoubleHP 2013-07-26 19:31:54 UTC
Yeah. This has been the free software policy for a few years now: stop bothering to fix existing code when you can flood users and admins with new versions full of new, buggy features. It just distracts us from the root issue: they are adding new features instead of fixing known issues.

Just wait to have enough new vers to let apk go outdated.

That's why I stopped updating my Gentoo in 2010.

That's why I stopped reporting bugs.

I was tired of spending my life updating and having to work around new bugs. Now I stick to what I have, and use it as is.

I am not interested in having 10000 new features per month if 90% of them will be buggy for me.
Comment 42 Ben Kohler gentoo-dev 2013-07-26 19:39:55 UTC
If you have information relating to the original bug, please comment here so someone can re-open it.  If you have a new bug, please open a new report.  For anything else, bugs.gentoo.org is not the right venue.