Bug 267080 - x11-base/xorg-server-1.6.5-r1 crashes without error message, and makes the kernel freeze without any warning.
Summary: x11-base/xorg-server-1.6.5-r1 crashes without error message, and makes the ke...
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High normal
Assignee: Gentoo X packagers
URL: https://bugs.freedesktop.org/show_bug...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-04-22 11:06 UTC by DEMAINE Benoît-Pierre, aka DoubleHP
Modified: 2010-01-19 00:57 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Attachments
/tmp/emerge--info (emerge--info,13.17 KB, text/plain)
2009-04-22 11:06 UTC, DEMAINE Benoît-Pierre, aka DoubleHP
/etc/X11/xorg.conf (xorg.conf,29.50 KB, text/plain)
2009-04-22 11:06 UTC, DEMAINE Benoît-Pierre, aka DoubleHP

Description DEMAINE Benoît-Pierre, aka DoubleHP 2009-04-22 11:06:16 UTC
updated to x11-base/xorg-server-1.5.3-r5
did all recommended steps:
- check I have =x11-libs/libpciaccess-0.10.5
- qlist -I -C x11-drivers/ | xargs emerge -v1

and a few more things

Result: when I type "startx", a few lines are printed in the console, then the console switches to a black screen (usually because X is switching from tty1 to tty7, and tty7 is empty), and then the computer is frozen.

After reboot, no recent log file has been created. The last log file is from the last X 1.3 run.

Workaround: fill /etc/portage/package.mask with
>=x11-base/xorg-server-1.4
=x11-libs/libpciaccess-0.10.5
=x11-drivers/xf86-input-evdev-2.1.3
=x11-drivers/xf86-video-mga-1.4.9
=x11-drivers/xf86-input-keyboard-1.3.2
=x11-libs/libXrender-0.9.4
=x11-proto/renderproto-0.9.3

and re-update system (emerge -aNuv world)
and pray (and remerge x11-drivers/*, of course).

This bug reminds me of http://bugs.gentoo.org/show_bug.cgi?id=194515 except ... now it's way worse: no log file at all.

I wonder how 1.5 could come out stable when critical bugs are still open against 1.4.

Once again, I have to ask for immediate and total masking of X-1.5 (at least for stable; unstable people assume their choice).

Providing xorg.conf is useless; I don't even know if X reads it.
Comment 1 DEMAINE Benoît-Pierre, aka DoubleHP 2009-04-22 11:06:37 UTC
Created attachment 189144 [details]
/tmp/emerge--info
Comment 2 DEMAINE Benoît-Pierre, aka DoubleHP 2009-04-22 11:06:54 UTC
Created attachment 189145 [details]
/etc/X11/xorg.conf
Comment 3 Rafał Mużyło 2009-04-22 15:15:43 UTC
Probably invalid.
kernel panic is just your assumption.
standard set of questions:
- is xf86-input-evdev emerged ?
- is hal running ?
- did you reemerge drivers after xorg-server update ?
Comment 4 DEMAINE Benoît-Pierre, aka DoubleHP 2009-04-22 17:53:21 UTC
Yes, evdev is emerged as a dep required by the evdev word declared in INPUT_DEVICES (did you read the attached emerge --info?).

Yes, hal was running, as a dep of the system due to the hal USE flag (did you read the attached emerge --info?).

Yes, the drivers have been re-emerged as said in #0 (did you read the initial report?), as asked in the enotices.

When the box stops responding to pings just when X starts, and pings don't come back after 25 min ... if the kernel did not panic, I wonder what you call this situation (no HDD activity, there was free RAM, the CPU was cold ...). It froze >12 times per day for 3 days ... I tried 5 different xorg.conf files ...

If you prefer, we can change the title to: "xorg-server-1.5.3-r5 freezes the system and doesn't create a log file". It would make no difference to me.
Comment 5 Rafał Mużyło 2009-04-22 18:19:31 UTC
I missed that line.
Have you tried starting from scratch,
that is 'Xorg -configure' ?
Comment 6 DEMAINE Benoît-Pierre, aka DoubleHP 2009-04-22 18:52:57 UTC
No, because 3 days without X on my main workstation is already too much: I've got work to do.

I thought about it just before downgrading, then I thought: if the problem were really the conf itself, and regenerating the conf could help, it would only have an impact from the point where X uses the conf to make decisions. And this happens after hardware detection. So, at the point in the process where it could start making a decision, the log file should have been created long before, and already be filled with nearly 100 lines. => not worth spending time on => downgraded.

I will script the upgrade and downgrade if you have serious tests to ask me to do. But just -configure will not be a sufficient reason. Maybe later, when you have more tests to suggest. But as the log file is not created, I just think it means that the ext driver is killed at once (unless X doesn't even try to create the log).

The only useful thing to do is to use a serial console and start X from there: X prints a dozen lines before switching console; logging the serial line would be the *only* way to get logs. But it will take time to set up.
Comment 7 Rafał Mużyło 2009-04-22 20:10:28 UTC
'Xorg -configure' is simply an attempt to see,
if the problem lies somewhere in your xorg.conf.
Comment 8 Lars Wendler (Polynomial-C) gentoo-dev 2009-04-23 21:21:00 UTC
You wrote your computer would be frozen after invoking startx. Did you verify this by trying to log into the computer via ssh from another machine?
Comment 9 DEMAINE Benoît-Pierre, aka DoubleHP 2009-04-23 21:31:04 UTC
No, not from SSH; but I said that the second after I type startx, the box stops answering ping. I have never seen a machine that stops answering ping and still keeps sshd alive ...

or only in cases that go very far outside the scope of this bug :) (attacks against BSD or Solaris where you saturate swap remotely, severe memory saturation, load >20) ...

I run startx in "normal" conditions of use of my computer :) so if there is no more ping, I just assume that ssh would also stop working. And once, I waited >25 min before resetting (yes, some X bugs unfreeze after 8 min, I know that can happen).
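
Whether the machine is really dead, rather than just stuck on a blank VT, can be checked from a second box without relying on ICMP alone. The following is a minimal Python sketch and not something that was used in this report: it simply attempts a TCP connection to the sshd port. The function name `tcp_port_open`, the host name in the usage comment and the 3-second timeout are illustrative assumptions.

```python
import socket


def tcp_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


# Hypothetical usage from another machine while X is starting
# ("testbox.example" is a placeholder for the frozen box's name):
#   tcp_port_open("testbox.example")
```

A box that drops both ping and this check at the same moment is most likely hung at kernel level; one that still accepts the connection can usually be rescued by killing X remotely.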
Comment 10 Lars Wendler (Polynomial-C) gentoo-dev 2009-04-25 14:10:11 UTC
(In reply to comment #9)
> No, not from SSH; but, I said that the second just after typing startx, the box
> stops answering ping. I never saw a machine that stops answering ping, and
> would keep sshd alive ...

Sorry, I missed that completely.
Comment 11 David Abbott gentoo-dev 2009-04-27 14:20:07 UTC
I have had X lock up on me before after running startx; when I opened an ssh session from another box before running startx, I was able to kill X from there, and that worked. No logs were produced either, but it did save a hard restart. It may have something to do with libpciaccess, the mga driver, Xinerama and the new xorg-server. This may help:
http://caramboo.com/2009/04/09/36-hours-of-pain/
Comment 12 Rémi Cardona gentoo-dev 2009-05-06 11:50:16 UTC
(In reply to comment #0)
> I wonder how 1.5 could come out stable when critical bugs are still open
> against 1.4 .
> 
> Once again, I have to ask for immediate and total masking of X-1.5 (at least
> for stable; unstable people assume their choice) .

That's not going to happen. 1.5 works _much_ better for a vast _majority_ of users. I'm very sorry to see your setup being this broken and I wish things would work better.

If you're still interested in _helping_ getting this fixed, I strongly suggest you get in touch with upstream X developers (the fd.o bug report in the URL field is a good start).

Upstream needs testers for lesser used hardware such as Matrox cards. If you need any help testing patches, come talk to us on #gentoo-desktop on FreeNode, we can try to help you out.

Please understand that we just cannot make everyone happy but that all of us are happy to help whenever we can.

Thanks
Comment 13 Rémi Cardona gentoo-dev 2009-05-06 12:52:40 UTC
*** Bug 265100 has been marked as a duplicate of this bug. ***
Comment 14 DEMAINE Benoît-Pierre, aka DoubleHP 2009-05-06 22:10:08 UTC
I made a test with only one head in the conf; but if X did not read the conf, or activated all heads despite my conf ...

How is this bug MGA specific? Why did you change the description?

I am not asking for anything that difficult; I am not asking for advanced stuff to work wonderfully with magic and stars. I want a conf that works fine with 1.3 to ... *not make 1.5 just freeze at start without any message*.

This bug should in fact block bug 210710 (after renaming it).

Do you just assume this happens to me because I previously had problems with 1.4 and the MGA driver? This bug is a completely new problem; assuming it is MGA related is just ... "probably wrong".

But you make me think of a new test: remove the MGA cards, and leave only the NV one.
Comment 15 DEMAINE Benoît-Pierre, aka DoubleHP 2009-05-06 22:13:52 UTC
Anyway, in my conf, I sometimes ask X to use the VGA or VESA driver for some MGA card. The problem would then become:
- a non-MGA driver problem
- does X really care about my conf file, or does it just ignore it?

I don't have time to perform full tests for the next 2 weeks. But I now have a long list of tests to do, and will do them when I can (aka: take time to recompile X 1.5 in the near future).
Comment 16 Rémi Cardona gentoo-dev 2009-05-06 22:56:47 UTC
There are a few things you need to keep in mind :
 - back when 1.3 was hot and new, having a monster xorg.conf with dozens of options was mandatory if you wanted any non-trivial configuration (ie, anything with more than 1 screen/gfx card/mouse/keyboard). This is no longer the case, things have changed. Today, keeping all those options can do more harm than not having anything.
 - multi-card X is known to be broken in some cases. It's a shame. Thing is, almost no-one has 2 gfx cards in their boxes anymore. 1 card/chip is able to drive 2 or more DVI/VGA outputs + a TV output. The reality is that the multi-card support in X has some serious bit-rot.
 - the "mga" driver is barely maintained. Again, the reality is that Matrox cards are now a rarity.
 - the "nv" driver is almost just as bad as the "mga" driver. I would say it's even more crap because it's 100% obfuscated thanks to nVidia. No one changes the code except Aaron Plattner who happens to work for nVidia. The "vesa" driver is probably in a much better state.

So in conclusion, what *you* happen to think is not much is in fact a *lot* of work and cleanups that no-one has done for *years*.

I'm not blaming you here, you're angry because of all the trouble. But please, let's try to move forward from here to actually fix bugs one step at a time. Here's a small list of things I suggest you try :
 - try removing your xorg.conf (rename it to xorg.conf.old). X now can actually run without any conf file and for simple setups it works just fine. It's a good first step to see if Xorg is able to detect your hardware and make it work.
 - try removing your nvidia card. One driver with multiple cards is hard enough to support as it is, 2 drivers is almost guaranteed to fail.
 - if you have something that works more or less, try adding new "Screens" one by one, with as few options as possible. "Less is More (tm)."

Now, on the plus side, I talked to upstream folks who recently touched the mga driver and it seems that they might be able to spend some time fixing the driver. They seemed bothered enough. This may take a few days/weeks(/months?) but there's a chance you could test new code as they write it. Please add yourself as a CC on the linked bug.

Thanks
Comment 17 DEMAINE Benoît-Pierre, aka DoubleHP 2009-05-07 06:41:33 UTC
Stupid simple question: how can X know which screen is upper or lower or right or left without a conf file? (Yes, I could write an auto-probe script using a webcam, but no, this is not a viable answer.) You say to do it as a try, and I will try it. But I hardly see how it could work: X does not create a log file at all.

In short: things used to work fine; after the update, they just don't. This is what we (all here) technically call a "regression bug".

The problem is not about screens (neither the X conf file section, nor physical monitors): it crashes when I declare only one screen in the ServerLayout; and anyway, as there is no log file, I don't have any guarantee that X even tries to read the conf.

What *I* think is that either the program makes bad syscalls, or that syscalls make the cards crash. But, again, it used to work for over 2 years.

MGA cards are still sold for over 200€. It's a shame Windows makes them work better than Linux (I hardly get 2 screens working out of a G45x4).

My problem has nothing to do with the linked bug: I don't have a log file at all.

Bug 265100 is *NOT* a dup of this.

This bug should be renamed to: machine freezes, and X does not create a log file (not even an empty file after reboot). (I am using ext3, which is an important point: file creation and file size updates in the FS metadata are sync operations; only the data transfer to the platters is async. An absent file means the application had not yet tried to create it. This would be different on XFS or ReiserFS.)

This is an aggravation of bug 194515.
Comment 18 Rémi Cardona gentoo-dev 2009-05-07 08:30:35 UTC
(In reply to comment #17)
> Stupid simple question: how can X know which screen is upper or lower or right
> or left without conf file ?

the xrandr utility can do this, but that's a different topic.

> (yes I could write an auto probe script using
> webcam, but, no this is not a viable answer). You say to do it for a try, and,
> I will try it. But I hardly see how it could work: X does not create log file
> at all.

I understood that perfectly.

> In short: things used to work fine; after update, they just don't. This is what
> we (all here) technically call a "regression bug".

And I know this as well.

> The problem is not about screens (neither X conf file section, nor physical
> monitors): it crashes when I declare only one screen in the Serverlayout; and
> anyway, as there is no log file, I don't have any guaranty X even tries to read
> the conf.
> 
> What *I* think is that, either the program makes bad syscalls, or that syscalls
> make cards crash. But, again, it used to work for over 2 years.

Alright since you insist, I will get into details : up until very recently, Xorg was not a "standard" application. It had among others :
 - a threading system and a scheduler
 - its own dynamic library format and loader
 - its own PCI bus driver

Xorg was not an application, it was an operating system. Ever since Xorg forked from XFree86, all those bits have been replaced one by one with what the kernel provides.

Between 1.3 and 1.5, it's the PCI handling that got ripped out and, unfortunately, the code that initializes secondary PCI cards didn't get as much testing and has now bit-rotted.

So yes, it *used* to work, but *now* it doesn't. You know why, stop reminding me every other sentence that it used to work.

> MGA cards are still sold over 200€. It's a shame Windows make them work
> better than Linux (I hardly get 2 screens working out of a G45x4).

Let's stop comparing with windows, this has _nothing_ to do with the bug at hand.

> My problem has nothing to do with linked bug: I don't have log file at all.
> 
> Bug 265100 is *NOT* dup of this.
> 
> this bug should be renamed to: machine freezes, and X does not create log file
> (not even empty file after reboot). (I am using ext3, this is an important
> point: file creation and file size in FS headers are sync operations; only data
> transfer to plate are async: absent file means: application did not try yet to
> create it; this would be different on XFS or Reisor).
> 
> This is an aggravation of bug 194515 .

Then you just won a free trip to FreeDesktop's bugzilla to file a new bug. Please add "remi@gentoo.org" on the bug so I can track it and backport whatever patches upstream cooks up.

Thanks
Comment 19 Gilles Dartiguelongue gentoo-dev 2009-05-07 12:05:10 UTC
(In reply to comment #17)
> Stupid simple question: how can X know which screen is upper or lower or right
> or left without conf file ? 

it can't, but that's not the point of the test we ask you to do.

> In short: things used to work fine; after update, they just don't. This is what
> we (all here) technically call a "regression bug".

sure, but bugs happening for corner-case use or hardware can't always be fixed in time, and sometimes forcing the way forward has a greater benefit than keeping things as they are. xorg-server-1.5 is a *giant* step forward for X with respect to the 1.3 or 1.4 releases and nobody wants to go back. Not us gentoo devs, nor other distros, most of which have already moved away from even 1.4, nor upstream. They won't even take 1.3 bugs anymore anyway.

> The problem is not about screens (neither X conf file section, nor physical
> monitors): it crashes when I declare only one screen in the Serverlayout; and
> anyway, as there is no log file, I don't have any guaranty X even tries to read
> the conf.

I see you have problems copying the output of the X server when it starts because it changes tty. Did you try starting it from ssh? It's easier to set up than a serial console.

Also, if X crashes really early in the start process, you might get lucky by stracing it and/or executing instructions step by step with a debugger. Then even if it freezes you'll have the list of the last calls involved, and that will help us go forward (probably not us directly but upstream devs at this point).
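
One way to keep the last lines of such output even if the box locks up hard is to force every line to disk as soon as it appears. A rough Python sketch under that assumption follows; the helper name `run_with_synced_log` and the paths in the usage comment are made up for illustration, and it has not been tried against a real X freeze. It relies only on `os.fsync`, so whether the data truly survives still depends on the filesystem and the drive.

```python
import os
import subprocess


def run_with_synced_log(argv, log_path):
    """Run argv, appending each output line to log_path and fsync-ing at once.

    stdout and stderr of the child are merged, so messages that X or strace
    write to stderr end up in the same file, in order of arrival.
    """
    with open(log_path, "ab", buffering=0) as log:
        proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
        for line in proc.stdout:
            log.write(line)
            os.fsync(log.fileno())  # push this line to disk before reading on
            # Echo to the terminal too (e.g. the ssh session on another box).
            print(line.decode(errors="replace"), end="", flush=True)
        return proc.wait()


# Hypothetical usage, run as root from an ssh session:
#   run_with_synced_log(["strace", "-f", "X"], "/root/x-strace.log")
```

Syncing after every line is slow, but in this situation only the last few lines before the freeze matter, so the cost is probably acceptable.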

> What *I* think is that, either the program makes bad syscalls, or that syscalls
> make cards crash. But, again, it used to work for over 2 years.

as said earlier, this is not relevant: the previous code is deprecated. It might still work, but as soon as you get a bug with it, no-one will want to hear from you. X clients are moving to new X technologies as they go, and there will be a point in the (not so far) future where you will just not be able to keep xorg-server-1.3.

> MGA cards are still sold over 200€. It's a shame Windows make them work
> better than Linux (I hardly get 2 screens working out of a G45x4).

if Matrox were hiring devs to work on Linux support, maybe that would work better, but this is not a topic for bugzilla, thanks for keeping this out of here.

finally, for any kind of debugging to be useful, you will need to set up your system as in [1], and you will probably want to create binpkgs to ease the pain of going back and forth between X versions.

[1] http://www.gentoo.org/proj/en/qa/backtraces.xml
Comment 20 Rémi Cardona gentoo-dev 2009-08-17 21:09:24 UTC
Please get back to us with a bug report url like I advised in comment #18.

Thanks
Comment 21 DEMAINE Benoît-Pierre, aka DoubleHP 2010-01-15 23:07:42 UTC
So, as I complained several times, the problem is *NOT* Matrox. I originally reported this against Xorg, and I am still complaining about Xorg:

Brand new hardware, new video cards (3 dual-head Radeons), brand new Gentoo:
uranus ~ # X -configure

X.Org X Server 1.6.5
Release Date: 2009-10-11
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.31-xen-r10-Gentoo-uranus-1-08 x86_64
Current Operating System: Linux uranus 2.6.31-xen-r10-Gentoo-uranus-1-09 #2 SMP Fri Jan 15 20:48:54 CET 2010 x86_64
Build Date: 15 January 2010  09:38:29PM

        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Fri Jan 15 23:38:11 2010
List of video drivers:
        radeon
        ati
        radeonhd
        vboxvideo
        vmware
        apm
        v4l
        fbdev
        vesa
(++) Using config file: "/root/xorg.conf.new"

(I got this through SSH.)


And the same result:
- complete freeze
- X crash
- black local console (for the screen that is plugged into the machine)
- no error message in the console where I started X (through SSH)
- no log was created

New element to add:
- no conf is created (despite the message!)

X alone works fine. I get the freeze only with "X -configure" for now. (I am reporting step by step, otherwise I get confused.)

But there is something new I can add:
/var/log/Xorg.0 has been renamed to /var/log/Xorg.0.log. So the old log is updated, but the new log file is NOT created.

And the promised config /root/xorg.conf does not exist:

uranus log # ls -l /root/xorg.conf*
ls: cannot access /root/xorg.conf*: No such file or directory
uranus log #

In short: brand new everything (hardware and software), different manufacturer, different CPU, different MB, same bug.

YES, I am angry at the guy who said that the problem is due to Matrox; I always denied this point. The problem has never been the Matrox board or driver. The problem is Xorg.

x11-base/xorg-server-1.6.5-r1

So, after nearly one year, I am putting back the old bug name. And I apologise to the guy who saw his bug marked as a dup of this one, when this should not have been done. Sorry Ronny. But as my bug has nothing to do with any kind of Matrox issue, your bug is *NOT* a dup of this one.

My new box doesn't have any bit of Matrox hardware, and all mga flags are now turned off. So, NO, THIS IS NOT A MATROX ISSUE. Matrox is a nice manufacturer, and their products are fine.

I will get you an xorg URL if you want. But I am angry about your "short conclusion" against Matrox. Or maybe you will tell me that ATI is just no better than Matrox?

Note: the hardware works under Debian Lenny, all 6 outputs, using the ATI driver (called fglrx_drv.so).
Comment 22 DEMAINE Benoît-Pierre, aka DoubleHP 2010-01-16 00:40:35 UTC
Note: in this bug, I am not complaining about the fact that X freezes or that the kernel panics, but about the fact that X makes the kernel panic, and/or X crashes without an error message in the console (with or without a KP), and the log file is not created.

In short, about the fact that when X crashes, there is no way to debug it.
Comment 23 DEMAINE Benoît-Pierre, aka DoubleHP 2010-01-16 13:35:06 UTC
According to http://wiki.debian.org/XStrikeForce/HowToRandR12 , I think the problem may be in the RandR layer. A solution could be to:
- as I have been asking for over a year, downgrade to X 1.4 (thus re-introduce it in the tree, or even 1.3) (this would work, I did it already)
- disable RandR in X 1.6
- upgrade to RandR 1.3 and X 1.7

Not sure yet.

Has anyone ever got a combo (two or more) of video cards working with X 1.5 or 1.6? How?
Comment 24 DEMAINE Benoît-Pierre, aka DoubleHP 2010-01-16 23:58:40 UTC
I have migrated to X 1.7 and reinstalled Mesa 1.3. The mere fact that I have two cards is a problem:

X.log:
(WW) VGA arbiter: cannot open kernel arbiter, no multi-card support

From IRC: http://airlied.livejournal.com/67628.html
> we are hoping to upstream the kernel code for 2.6.32 and push the
> libpciaccess and X.org server patches to their master repos

So the software failure is real, really not due to Matrox, and spread across several pieces of software. There have been feature breakage and regression bugs everywhere:
- X
- Mesa
- Randr
- Xinerama
- Linux kernel

So for over a year now, "having two video cards" has not been possible any more (I reported the bug in early 2009, but I gave X 1.5 or 1.4 a try in December 2008, and it ended in failure). My desktop is thus still using 1.3, the latest version that works with two cards.

So there is no need to report or complain upstream. In fact, they have been aware of the problem for a very long time.

The solution is: wait a bit more. Upgrade to
- mesa 1.3
- X 1.7
- linux 2.6.32 (hopefully, that will include at least 3 patches I need, among which I can quote the one behind "VGA arbiter: cannot open kernel arbiter, no multi-card support", which is likely to be called VGA_ARB)
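
Whether a given kernel actually provides the arbiter can be checked before starting X. The sketch below is only an illustration: it assumes the arbiter is exposed as `/dev/vga_arbiter` and enabled by the `CONFIG_VGA_ARB` kernel option (the names used, as far as I know, once the code was merged into mainline), and that `/proc/config.gz` is available, which requires a kernel built with in-kernel config support. The function names are hypothetical.

```python
import gzip
import os


def config_enables(config_text, option):
    """Return True if a kernel config dump sets option to y or m."""
    wanted = (option + "=y", option + "=m")
    # Exact line match, so e.g. CONFIG_VGA_ARB_MAX_GPUS=16 does not count.
    return any(line.strip() in wanted for line in config_text.splitlines())


def vga_arbiter_status(config_path="/proc/config.gz",
                       dev_path="/dev/vga_arbiter"):
    """Report whether the arbiter device node and kernel option are present.

    "config" is None when the kernel config is not readable at config_path.
    """
    status = {"device_node": os.path.exists(dev_path), "config": None}
    if os.path.exists(config_path):
        with gzip.open(config_path, "rt") as fh:
            status["config"] = config_enables(fh.read(), "CONFIG_VGA_ARB")
    return status
```

If `device_node` is False, the "cannot open kernel arbiter" warning quoted above is to be expected regardless of the X version.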
Comment 25 Rémi Cardona gentoo-dev 2010-01-19 00:41:55 UTC
Like I've been telling you for over 6 months, there's nothing we can fix here.

None of us have mga cards, and very few of us have multi-gpu setups. The only ones that are known to work (more or less) are pure-nvidia setups.

Like I've told you again and again, please get in touch with _upstream_ to get your issues looked at and fixed. The fact that Xorg doesn't save a log file _is_ a graphics hardware issue (gpu goes down and takes the whole system with it).

As for us bringing back old versions, I'm sorry, but that's just not going to happen. Debian can do whatever they want, but since we're a rolling distro, we've long decided to stop supporting those versions. Nothing is preventing you from bringing them back in a local overlay, portage is designed for this. We barely have enough manpower to keep 2 versions of Xorg in portage...

So please, file a bug upstream, paste the url here so we can track the issue and backport patches if need be, but stop complaining here, we _cannot_ help you more than what we've done so far.

Thanks
Comment 26 DEMAINE Benoît-Pierre, aka DoubleHP 2010-01-19 00:57:24 UTC
This is not a GPU issue. X just doesn't know anymore how to handle all this; the X stack (as you said) has been rewritten from scratch, and all the old working code has been "deleted". As I said, the problem is already known upstream; the problem will be solved when all of this is in stable Gentoo:
- X 1.7
- Mesa 1.4
- a kernel including ARB and KMS; hopefully 2.6.33.

There is not just one URL to give you, not just one tracker to report to. (Full) multi-card support requires "several" dependencies.

In short, we could make this bug depend on the stable-req of the applications and ebuilds listed above. But, for example, 2.6.33 is not yet in ~ ... and we are not even 100% certain yet that it will include ARB.

But I need to log this here, so that if any other user also has the issue, he can at least learn the strict minimum required to fix it. Some people report that multi-card systems work, but they don't know how it works, and cannot explain why it works (for them) without ARB.

And for the last time: the problem is not in the GPU, or in any closed-source driver, or in any card from any manufacturer you dislike. The problem is in X and the kernel.

In short, I easily get a crash, or a KP, as soon as I put in xorg.conf, in a Device section, a BusID line with an address different from the address of the first (master) VGA card. I can get all monitors (2 or more, if any) to work on the first VGA controller, but there is a bug in X that renders unusable (at the moment) any GPU that is not the first one (see the one that gets a star in the X logs). I cannot even ask X to use "only the second one".

My problem has been verified and confirmed by several people on IRC and forums. We are all waiting for ... what I listed above. That said, it's right ... we cannot help or do anything but wait.
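
For anyone reproducing this, a common stumbling block is that lspci prints PCI addresses in hexadecimal while the BusID line in xorg.conf takes decimal numbers in the form "PCI:bus:device:function". A small Python sketch of the conversion follows; the function name `lspci_to_busid` is made up for illustration, and the optional PCI domain prefix is simply dropped, which is an assumption that only holds on single-domain machines.

```python
import re

# lspci address: optional 4-digit domain, then bus:device.function in hex.
_LSPCI_ADDR = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def lspci_to_busid(addr):
    """Convert an lspci address such as '0a:00.0' to an xorg.conf BusID."""
    match = _LSPCI_ADDR.match(addr.strip())
    if match is None:
        raise ValueError("not an lspci-style PCI address: %r" % addr)
    _domain, bus, device, function = match.groups()
    # xorg.conf expects decimal values, lspci shows hexadecimal ones.
    return "PCI:%d:%d:%d" % (int(bus, 16), int(device, 16), int(function, 16))
```

With this, a second card that lspci lists at 0a:00.0 would be referenced as BusID "PCI:10:0:0", which is the kind of non-primary address that triggers the crash described here.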