Using the nv, vesa, or nvidia drivers, /usr/bin/X crashes with:
Backtrace:
0: X(xf86SigHandler+0x84) [0x80c4294]
1: [0xb7f3d420]
Fatal server error:
Caught signal 4. Server aborting
Aborted
I have tried xorg-server 1.1.1-r1, 1.1.1-r4, and 1.2.0-r1.
Reproducible: Always
Steps to Reproduce:
1. Install Gentoo 2006.1 via the handbook.
2. Reboot and emerge gnome.
3. emerge the proper NVIDIA drivers (although vesa and nv fail too).
4. Run /usr/bin/X.
Actual Results: X failed to launch. When using the binary NVIDIA drivers I did see the NVIDIA splash.
Expected Results: X should work.
Not baselayout.
Don't waste people's time by referring to forums.g.o. for bug descriptions. Reopen w/ nvidia-drivers version, Xorg.0.log, xorg.conf and emerge --info output.
Created attachment 111761
output of cat on requested files...
Requested info attached. Not to be a prick, but if you had WASTED 2 seconds looking at my FORUM POST... you would see that this stuff is already posted there! (I figured it would be more readable on the forum.) And keep in mind it fails the same way with nv, vesa, and nvidia.
(In reply to comment #4)
> Not to be a prick but if you had WASTED 2 seconds looking at my FORUM POST.....
> you would see that this stuff is already posted there! (I figured it would be
> more readable on the forum)
Yeah, and it is completely useless there when someone's searching bugzilla.
(In reply to comment #5)
> Yeah, and it is completely useless there when someone's searching bugzilla.
Good point. Well, any ideas on what is up here?
How come you have a Pentium processor but the 3dnow USE flag is set? This is an AMD-specific optimization.
(In reply to comment #7)
> how come you have a pentium processor, but the 3dnow USE flag is set? This is
> an AMD-specific optimization.
Good eye! This must have slipped in with the make.conf I copied over from another box. Hmm, I'd bet my other P4 box has this too. Whoa, sse2 doesn't belong here either. I'm rebuilding the world file as I type this. Thanks.
Could this be the cause? Does xorg-server use these flags? Either way I'm rebuilding; I'll see in a day or two, I guess. Just when I thought I had a grip on things... the noob monster strikes!
(In reply to comment #8)
> Could this be the cause?? Does xorg-server use these flags? Either way I'm
> rebuilding.... I'll see in a day or two I guess.
It could be the cause, and xorg-server doesn't have to use these flags to be affected by them.
> It could be the cause, and xorg-server doesn't have to use these flags to be
> affected by them.
I figured that might be the case. Unfortunately, I recompiled the system and world file and still get the same error.
Signal 4 is SIGILL, illegal instruction. This is typically seen in two cases:
(1) A wild jump caused the process to try to execute content that is not actually code.
(2) The process is using a function built for a newer processor. For instance, the "conditional move" instruction introduced in the i686 family will cause this type of crash when executed on an i586 or below.
Based on the observation in comment #7, I am leaning toward the latter. Please attach the output of "cat /proc/cpuinfo". I note from your emerge --info that you are using a Celeron processor, but there is insufficient information to tell which Celeron. The notes on http://gentoo-wiki.com/Safe_Cflags indicate that some Celerons need a -march of pentium2, whereas others can take higher values. You are using pentium3.
Also, if you have sys-devel/gdb installed, please use it to disassemble the faulting instruction. Invoke gdb as: "gdb /usr/bin/X <path-to-core-file>". When gdb prompts, enter "disassemble 0x80c4294 0x80c42b4". Then post the resulting output.
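
As a quick cross-check of the signal number in the backtrace, the sketch below maps a number to its symbolic name with Python's standard signal module. The helper name signal_name is hypothetical and only for illustration; it is not part of X or gdb.

```python
import signal


def signal_name(number):
    """Return the symbolic name for a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number does not correspond to a signal known on this platform.
        return "unknown signal %d" % number


# On Linux, signal 4 is expected to resolve to SIGILL (illegal instruction).
print(signal_name(4))
```

The same lookup is available from a shell with "kill -l 4" on most systems.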
kidsbox13 ~ # cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Celeron(TM) CPU 1300MHz
stepping : 1
cpu MHz : 1292.674
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 2586.07
I have always assumed that it was Coppermine (P3) based. I think the fastest pre-Coppermine Celerons were in the 500 MHz range (though I would be happy if I was wrong!).
I'm headed out to eat, but as soon as I get back I'll try to run gdb as described: "gdb /usr/bin/X <path-to-core-file>". I am assuming that <path-to-core-file> is the path to a file that dumps the debug info(?). Please elaborate on that.
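
The flags line in the /proc/cpuinfo output above can be compared mechanically against the CPU-specific USE flags discussed earlier in this bug. What follows is a minimal sketch, assuming the USE flag names match the kernel's flag names one-to-one (true for mmx, sse, sse2, and 3dnow, but not guaranteed in general). The function names cpu_flags and unsupported are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature names from the 'flags' line of /proc/cpuinfo output."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def unsupported(requested, cpuinfo_text):
    """Return the requested feature names that the CPU does not advertise."""
    available = cpu_flags(cpuinfo_text)
    return [name for name in requested if name not in available]


# Two lines copied from the cpuinfo output posted in this comment.
SAMPLE = """\
model name : Intel(R) Celeron(TM) CPU 1300MHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
"""

print(unsupported(["mmx", "sse", "sse2", "3dnow", "cmov"], SAMPLE))
# prints ['sse2', '3dnow']
```

On the affected box itself, the text could be read from /proc/cpuinfo directly instead of the pasted sample.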
Created attachment 113034
Disassembler dump
OK, I'm not sure about the <path-to-core-file> part (excuse my ignorance), but I am seeing some new output when using gdb. Just running X under gdb, I get:
Program received signal SIGILL, Illegal instruction.
[Switching to Thread -1211009360 (LWP 21743)]
0xb7f32160 in FontParseXLFDName () from /usr/lib/libXfont.so.1
I have attached a disassembler dump of the address reported above (well, not the same address, but a dump from the same error).
Your disassembler dump does not match up with any of the addresses you have posted as faulting. Are you sure you disassembled the right area?
When a program crashes on Unix, the kernel may (depending on various settings) save a memory dump of the program as it was at the time of the crash. This is referred to as "dumping core" and the resulting file is usually called a "core file." The X server may not be dumping core, depending on how it dies and whether its signal handlers are executing correctly. For this type of failure, it is possible that the server is taking a secondary fault in its signal handler, which would most likely kill it completely.
Running gdb on a live X server is just as good for our purposes, though it can be dangerous if the debugger breaks in while your console is not in a usable state. Your particular crash seems to be at a safe time. You could reduce the risk by connecting to the system over ssh and running gdb in the resulting session.
So far, I have not found any definitive information on what Pentium family your Celeron belongs in. The closest hit is on the Talk page of the Gentoo Safe_Cflags document referenced in comment #11. Under the banner <http://gentoo-wiki.com/Talk:Safe_Cflags#Suggested_CFLAGS_not_safe_for_my_Intel.28R.29_Celeron.28R.29_processor>, someone writes that he had to back down to -mcpu=i686 to get a working build. This is probably an overly cautious setting, since -mcpu affects tuning, but does not grant gcc the liberty to use newer instructions. However, lacking any clear answer as to what instructions your Celeron really handles, I would suggest changing the -march to something very safe, like -march=pentium. Then rebuild the package that provides the library gdb identifies as failing. If that works, you should probably re-emerge everything else to clean out any other uses of unsafe instructions.
(In reply to comment #15)
> Your disassembler dump does not match up with any of the addresses you have
> posted as faulting. Are you sure you disassembled the right area?
The dump I posted came from the line that reads:
0xb7f32160 in FontParseXLFDName () from /usr/lib/libXfont.so.1
But the address changes each time it runs (allocating a new spot in memory, right?). Also, running /usr/bin/X directly gives the same output but two different memory addresses, and these stay the same across consecutive runs:
Could not init font path element /usr/share/fonts/misc/, removing from list!
Backtrace:
0: X(xf86SigHandler+0x84) [0x80c4264]
1: [0xb7fde420]
Fatal server error:
Caught signal 4. Server aborting
I guess you are looking for disassembler dumps from these two? I'll post them.
> When a program crashes on Unix, the kernel may (depending on various settings)
> save a memory dump of the program as it was at the time of the crash. This is
> referred to as "dumping core" and the resulting file is usually called a "core
> file." The X server may not be dumping core, depending on how it dies and
> whether its signal handlers are executing correctly. For this type of failure,
> it is possible that the server is taking a secondary fault in its signal
> handler, which would most likely kill it completely.
Great stuff, thanks! I'm at a loss as to where the core is being dumped (if it is). Is it dumping to a plain-text log file? Sorry again for the noob questions.
> Running gdb on a live X server is just as good for our purposes, though it can
> be dangerous if the debugger breaks in while your console is not in a usable
> state. Your particular crash seems to be at a safe time. You could reduce the
> risk by connecting to the system over ssh and running gdb in the resulting
> session.
That works for me. The box is in Louisiana and I'm in North Carolina (it's for some kids to play on, so it does need X). SSH is the only way I have access to it!
> So far, I have not found any definitive information on what Pentium family your
> Celeron belongs in. The closest hit is on the Talk page of the Gentoo
> Safe_Cflags document referenced in comment #11. Under the banner
> <http://gentoo-wiki.com/Talk:Safe_Cflags#Suggested_CFLAGS_not_safe_for_my_Intel.28R.29_Celeron.28R.29_processor>,
> someone writes that he had to back down to -mcpu=i686 to get a working build.
> This is probably an overly cautious setting, since -mcpu affects tuning, but
> does not grant gcc the liberty to use newer instructions. However, lacking any
> clear answer as to what instructions your Celeron really handles, I would
> suggest to change the -march to something very safe, like -march=pentium. Then
> rebuild the package which provides the library which gdb identifies as failing.
> If that works, you should probably re-emerge everything else to clean out any
> other uses of unsafe instructions.
With that, I'm going to recompile with -march=i686 and be done with that mess. (I'm fairly certain it's a P3-based Celeron, though; either way, i686 is good for now.)
Let me know if I'm misinterpreting anything you say, or if there is anything I can post that would help you help me.
Thanks,
-Ian
Created attachment 113113
Disassemble on 0x80c4264
I have also done a:
disassemble 0x80c4264 0xb7fde420
but it generated a 16 MB file. I think I'm getting the syntax wrong on that one. Am I really supposed to be dumping everything from 0x80c4264 to 0xb7fde420? That is what is being dumped.
Thanks again,
-Ian
In order of posting:
The address may change each time the kernel maps it into a new process. To reliably locate the same region, you will need to refer to it symbolically, which will have gdb find where the function is this time around. The output from /usr/bin/X is probably consistent because /usr/bin/X is not a position-independent executable, so it is mapped at the same address with every run. The security people frown on non-PIEs, but that does not affect your current problem.
Core dumping depends on the X server configuration. I don't recall if the default is to dump core or not. For programs which dump core, it is traditionally named "core" and placed in the current working directory. It is not a text file, but rather a binary file representing the program's CPU and memory state. You need a dedicated tool, such as gdb, to analyze it.
You're following along quite well. The only information you haven't provided is the disassembly of the faulting instruction, but that's not that important for dealing with this. When you run disassemble with two arguments, it treats the arguments as a [start,stop] pair and disassembles all memory between them. I specified the two-operand form to get around gdb trying to disassemble the entire containing function. I intended that the second operand be ~32 bytes higher than the first, rather than covering a huge range of the address space. I requested a disassembly primarily to try to identify which instruction was causing the SIGILL, in hopes that would hint which gcc setting had caused the unusable binary. If the problem goes away after you rebuild with less aggressive settings, then don't worry about getting the disassembly.
Also, please resolve this bug as invalid if the problem goes away with the -march changes.
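
The two-operand form described above can be sketched as a small helper that builds the gdb command for a short window starting at the reported address. The function name, the 32-byte default (taken from the "~32 bytes" mentioned above), and the modern_syntax switch are assumptions for illustration. As far as I know, newer gdb releases expect a comma between the two addresses, so check "help disassemble" on the installed version.

```python
def disassemble_command(fault_address, window=32, modern_syntax=False):
    """Build a gdb 'disassemble' command covering `window` bytes from an address."""
    # Older gdb accepted "start stop"; newer releases are believed to want "start,stop".
    separator = "," if modern_syntax else " "
    start = format(fault_address, "#x")
    stop = format(fault_address + window, "#x")
    return "disassemble " + start + separator + stop


# Reproduces the command suggested earlier in this bug.
print(disassemble_command(0x80c4294))
# prints: disassemble 0x80c4294 0x80c42b4

# The address from the later backtrace, in comma form.
print(disassemble_command(0x80c4264, modern_syntax=True))
# prints: disassemble 0x80c4264,0x80c4284
```

Passing the two backtrace addresses as start and stop, as in the 16 MB attempt, asks gdb to walk nearly the whole address space between the executable and its libraries, which is why the window is kept small here.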
Please instruct for further debugging. I've rebuilt the system twice and the world file twice! X is still a no-go.
kidsbox13 rebuildlogs # emerge --info
Portage 2.1.2.2 (default-linux/x86/2006.1, gcc-4.1.1, glibc-2.5-r0, 2.6.18-gentoo-r6 i686)
=================================================================
System uname: 2.6.18-gentoo-r6 i686 Intel(R) Celeron(TM) CPU 1300MHz
Gentoo Base System release 1.12.9
Timestamp of tree: Fri, 09 Mar 2007 11:50:01 +0000
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
dev-java/java-config: 1.3.7, 2.0.31
dev-lang/python: 2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.61
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils: 2.16.1-r3
sys-devel/gcc-config: 1.3.14
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.17-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=i686 -O2 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-march=i686 -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer nostrip sandbox sfperms strict"
GENTOO_MIRRORS="http://gentoo.osuosl.org/ http://distro.ibiblio.org/pub/linux/distributions/gentoo/"
LINGUAS="en en_GB"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://joshlap/gentoo-portage"
USE="X aac aalib alsa asf asm berkdb bitmap-fonts cairo cdr cli cracklib crypt cups dedicated dri esd ffmpeg firefox flac fortran gdbm gnome gpm gtk gtk2 hal howl iconv ipv6 isdnlog java jpeg libg++ lm_sensors mad midi mmx mpeg ncurses nfs nls no-nptl nptl nptlonly nsplugin nvidia ogg opelgl pam pcre pdf perl png ppds pppd python readline reflection rouge samba sdl session spl sse ssl tcpd teamarena theora tiff truetype-fonts type1-fonts unicode usb v4l vorbis win32codecs x86 xatrix xine xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en en_GB" USERLAND="GNU" VIDEO_CARDS="vesa nv nvidia"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY
To make X dump core, add this (pasted from the xorg.conf man page) to a ServerFlags section of xorg.conf:
Option "NoTrapSignals" "boolean"
    This prevents the Xorg server from trapping a range of unexpected fatal signals and exiting cleanly. Instead, the Xorg server will die and drop core where the fault occurred. The default behaviour is for the Xorg server to exit cleanly, but still drop a core file. In general you never want to use this option unless you are debugging an Xorg server problem and know how to deal with the consequences.
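
A minimal sketch of such a section, assuming xorg.conf does not already contain a ServerFlags section (if it does, only the Option line needs to be added to it). The value "true" stands in for the man page's "boolean" placeholder. Whether a core file actually appears may also depend on the core size limit of the shell that starts X (ulimit -c).

```
Section "ServerFlags"
    Option "NoTrapSignals" "true"
EndSection
```

Removing the option again once debugging is done restores the default clean-exit behaviour.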
(In reply to comment #20)
> To make X dump, add this (pasted from xorg.conf man page) to a ServerFlags
> section of xorg.conf:
>
> Option "NoTrapSignals" "boolean"
> This prevents the Xorg server from trapping a range of unexpected
> fatal signals and exiting cleanly. Instead, the Xorg
> server will die and drop core where the fault occurred. The
> default behaviour is for the Xorg server to exit cleanly, but
> still drop a core file. In general you never want to use this
> option unless you are debugging an Xorg server problem and know
> how to deal with the consequences.
Oh no... apparently the box is now gone! It was for my fiancée's little sister, and her mother (in the middle of moving) might have gotten rid of it. If it is not gone I would love to continue finding a solution to this bug, but I'll have to ask. It is not responding to my ssh attempts. I guess I'll leave it marked as NEW, but if anyone feels the need to change the status to reflect the situation more appropriately, please do so. I'll call there sometime and see if they still have it; chances are slim, though.
Oh, and thanks for all of the help! You guys/gals are great. This wonderful distro would be nothing without you all.
Thanks for your time,
-Ian
(In reply to comment #22)
> Oh and thanks for all of the help! You guys/gals are great. This wonderful
> distro would be nothing without you all.
Thanks :) We'll close the bug as NEEDINFO for now.