Bug 166860 - System freeze OR various apps segfault after a (short) while.
Summary: System freeze OR various apps segfault after a (short) while.
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High critical
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-02-14 16:54 UTC by Charles de Noyelle
Modified: 2007-03-08 00:27 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Description Charles de Noyelle 2007-02-14 16:54:14 UTC
My system crashes several times a day while doing "nothing". I cannot reproduce the exact path to a crash, but:
Without launching X:
# updatedb
(50 seconds)
updatedb : Segmentation fault

When X is running, it happens faster: sometimes just launching Firefox 2.0 brings it down. With X running, the system freezes. Sometimes the Caps Lock light on the keyboard blinks; sometimes it does not. When X is not running, this is what I see (from another computer):
other $ ssh amd64
Password:
amd64 # updatedb
amd64 # emerge (...)
Segfault
other $ ssh amd64
Password: 
[[[[ Nothing appears: immediate login/logout! ]]]]
other $ ssh amd64
Connection refused.

Sometimes it crashes just after the kernel is loaded (before the [ OK ] messages, which then all show [ Failed ]).
Sometimes it lasts longer.

I enabled the Magic SysRq keys, and the system still responds to those commands when it "freezes".

What can I do to help you discover what is behind this?

Reproducible: Sometimes

Steps to Reproduce:
1. Wait for the X server to be up (xdm).
2. Launch Firefox (or k3b, or xdtv; anything that needs disk access).
3. Move the mouse: the system is frozen.



Expected Results:  
The mouse moves :-)

I ran memtest86 from the Gentoo CD: 3 passes OK (about 3 hours).
e2fsck on my disks is OK too.
With swapoff on my swap device, the same thing happens.

Nothing relevant appears in /var/log/messages.
Sometimes it even freezes while I build the kernel!
Comment 1 Charles de Noyelle 2007-02-14 16:56:41 UTC
Portage 2.1.2-r9 (default-linux/amd64/2006.1, gcc-4.1.1, glibc-2.5-r0, 2.6.20 x86_64)
=================================================================
System uname: 2.6.20 x86_64 AMD Athlon(tm) 64 Processor 3000+
Gentoo Base System release 1.12.8
Timestamp of tree: Mon, 12 Feb 2007 16:59:01 +0000
dev-java/java-config: 1.3.7, 2.0.31
dev-lang/python:     2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.14
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r1
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=athlon64 -O2 -pipe -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-march=athlon64 -O2 -pipe -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://mir.zyrianes.net/gentoo/ ftp://linux.rz.ruhr-uni-bochum.de/gentoo-mirror/"
LINGUAS="fr"
MAKEOPTS=""
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X Xaw3d a52 aac aalib acpi alsa amd64 apm audiofile avi bzip2 cddb cdparanoia cdr cracklib crypt cups curl dga directfb dri dts dv dvb dvd dvdr dvdread encode esd exif extras fame fbcon ffmpeg flac gd ggi gif gphoto2 gpm gtk gtk2 i8x0 iconv icq imagemagick imap imlib insecure-patches jabber jack java jpeg jpeg2k lirc lzo mad matroska mikmod mime mjpeg motif mozcalendar mp3 mp4 mpeg mplayer msn musepack nas ncurses new-login nls nptl nptlonly nsplugin offensive ogg openal pam pcre pdf perl png python qt quicktime rar readline sdl sftplogging spell sqlite3 ssl subtitles svg tcltk tetex theora tiff tk truetype truetype-fonts type1-fonts unicode usb v4l v4l2 vcd vorbis wifi wma wmf x264 xanim xchatdccserver xine xinerama xml2 xorg xpm xscreensaver xv xvid xvmc yahoo zlib" ALSA_CARDS="emu11k1" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="fr" LIRC_DEVICES="pctv" USERLAND="GNU" VIDEO_CARDS="apm ark ati chips cirrus cyrix dummy fbdev glint i128 i810 mga neomagic nv rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga via vmware voodoo"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY
Comment 2 Daniel Drake (RETIRED) gentoo-dev 2007-02-14 19:02:38 UTC
Sounds like a hardware issue. Try running memtest for 24 hours.
Comment 3 Charles de Noyelle 2007-02-15 11:14:13 UTC
(In reply to comment #2)
> Sounds like a hardware issue. Try running memtest for 24 hours.
> 

13 hours (still running): 29 passes, 0 errors.
It reports "ECC Off" and "Cache On".

The first hangs used to happen when I used "kino" (a video editor), which hung the system 100% of the time when a movie was loaded (it converts the movie, and then the system hangs).

When I was on the console (no X at all), I saw SATA { ReadError ...} and Waiting... and retrying, but it never came back (the HDD LED stayed on).

I then bought a new disk, and even with the "old" one unmounted, the system still hangs, so maybe it is not hardware related but related to the SATA driver in the kernel?

How can I tell (with my Magic SysRq keys) whether it is kernel related or something else? I do not think it is a hard drive problem, since Alt+PrtScr+P actually writes to /var/log/messages...

Any ideas?

Comment 4 Charles de Noyelle 2007-02-20 23:25:13 UTC
It seems NOT to be a hardware problem (after 24 hours, still no errors in memtest86 from the Gentoo LiveCD). It MIGHT be a problem in the kernel or a kernel module (imho).

How can I figure out whether it is or not?

I'm having the "freeze" and it seems that it just froze the "screen". I managed to kill X (I had to kill it, remove /tmp/X...lock and start a new one before the screen "stopped freezing", using an SSH session that still worked). This problem is so weird. Maybe it is my video card (Matrox P750) using unofficial Matrox drivers (the official ones just don't work).

I also had an "emerge" that crashed, reporting:
"Bug is not reproducible, might be an hardware problem". (It was wine. After a reboot, it just compiled with no error.)

Can kernel problems cause such a message?

Comment 5 Daniel Drake (RETIRED) gentoo-dev 2007-02-21 03:32:31 UTC
Yes, but it is very unusual given how random your crashes are. While memtest does find a large proportion of hardware issues, there are always a handful that it doesn't find.
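
One userspace check that can complement memtest is to hash the same file many times and compare the results; on healthy hardware every pass should agree. The sketch below is only an illustration: the function name `checksum_mismatches` and the default of 10 passes are assumptions, not something used in this bug. A file that fits in RAM will likely be served from the page cache after the first read, so this probably exercises memory and CPU more than the disk.

```shell
#!/bin/sh
# Hypothetical helper (not from this bug report): hash the same file
# repeatedly and count passes whose result differs from the first pass.
# On healthy hardware the printed count should be 0.
checksum_mismatches() {
    file=$1
    passes=${2:-10}          # total number of hashing passes (assumed default)
    reference=$(md5sum < "$file")
    mismatches=0
    i=1
    while [ "$i" -lt "$passes" ]; do
        current=$(md5sum < "$file")
        if [ "$current" != "$reference" ]; then
            mismatches=$((mismatches + 1))
        fi
        i=$((i + 1))
    done
    echo "$mismatches"
}
```

For example, `checksum_mismatches /path/to/some/large/file 100` (the path is a placeholder) prints the number of passes that disagreed with the first one. Any non-zero result on an unchanged file would point away from software.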
Comment 6 Charles de Noyelle 2007-03-06 23:05:43 UTC
I removed everything non-essential from my computer:

The PCTV bt848 card, which used to make my PC crash quite often when using xdtv or xawtv (the mouse hangs, Ctrl+Alt+F1 does not respond, and often the picture still moves).

The rt2500 wireless device, which used to cause crashes (system freezes) even when used in "console" mode.

The only device left is my Matrox "mtx" card with "unofficial driver".

To make the system crash the last time I rebooted (1 hour ago), I just did:
boot
login (as user)
startx
launch firefox

I use Enlightenment DR17, which is under heavy development, but I think it is not related, since it would just kill X and not make the whole system freeze!

Sometimes, after using the Magic SysRq keys (Alt+PrtScr+"seiub"), on the very next reboot, just after the kernel and udev have started, "segmentation faults" and "file not found" errors occur.
Sometimes it finishes with:
(none) Login :
and nothing works (Ctrl+Alt+Del says "Going to runlevel 6" and "no process left in runlevel").

Sometimes it ends up with:
"Spurious ACK on isa0060/serio0 some program might be trying to access hardware directly" repeated indefinitely.

Sometimes it says "Enter root password or Ctrl-D", but since in that case /dev/sda1 (/) is not mounted, it just can't know my root password :)

I tried many things in the kernel, even mcelog. But the "mce" cron job does not print anything in /var/log/mcelog.

Any ideas?
Comment 7 Daniel Drake (RETIRED) gentoo-dev 2007-03-07 00:52:59 UTC
This still sounds incredibly random, and that almost always points to hardware. Have you examined any temperatures?
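
As a rough way to look at temperatures from a shell, the kernel's hwmon interface exposes sensor readings under sysfs. The sketch below assumes a sensor driver is loaded and that readings appear as `temp*_input` files (in millidegrees Celsius) directly under `/sys/class/hwmon/hwmon*/`; on some kernels they may sit in a `device/` subdirectory instead. The function name `print_temps` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print every readable hwmon temperature in whole
# degrees Celsius. The base directory can be overridden as the first
# argument; it defaults to /sys/class/hwmon (an assumption about layout).
print_temps() {
    base=${1:-/sys/class/hwmon}
    for f in "$base"/hwmon*/temp*_input; do
        # Skip the unexpanded glob pattern and unreadable files.
        [ -r "$f" ] || continue
        milli=$(cat "$f")
        echo "$f: $((milli / 1000)) C"
    done
}
```

Running `print_temps` with no argument while the machine is under load (a kernel build, for instance) and again when idle gives a first impression of whether something is overheating. If it prints nothing, no sensor files were found at the assumed location.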
Comment 8 Charles de Noyelle 2007-03-07 01:36:55 UTC
You've got a point!

I was testing another video card when I discovered that the video card's fan was not properly connected.

It *might* be the cause. I'll look further in that direction ASAP. (The problem still occurs every time I launch "kino", which definitely is my test to see whether it is fixed or not!)

(I switched to "needinfo" because it definitely is the case!)
Comment 9 Charles de Noyelle 2007-03-07 10:36:14 UTC
OK, my first AGP card (Matrox) must have burned out.
My old NVIDIA card still crashes (a different kind of crash: black screen).

But my *very* old PCI card just works fine.

Thanks, everyone!
(FIXED: not a bug.)
Comment 10 Charles de Noyelle 2007-03-07 12:51:04 UTC
Well... After a few hours of use, it crashed :'(

It takes longer to crash with this video card.

I'm going to look at xsensors to detect hardware failures...

(BTW, I deactivated the "experimental" options in the kernel... It did not help.)
Comment 11 Charles de Noyelle 2007-03-07 14:01:22 UTC
Well, things are getting worse.

I have 4 video cards here, and all four of them freeze in much the same way; that is not good!

So I took my Gentoo LiveCD and booted with:
# gentoo root=/dev/sda1
(...)
and everything works fine.

I remembered the "noinitrd" option, and this time:

# gentoo root=/dev/sda1 noinitrd
Everything works perfectly until udev restarts its events, where it says:
   udevd-event[6231] : run_program: '/sbin/udev_run_hotplug' abnormal exit
about 60 times
(with sometimes udev_run_devd instead of run_hotplug)

and in the middle, I got :
/lib/rcscripts/addons/udev-start.sh : line 50: 6250 Segmentation fault  sleep 0.1


This seems quite bad.
I have udev 104-r11; could that be linked to http://bugs.gentoo.org/show_bug.cgi?id=158861 ?

Comment 12 Daniel Drake (RETIRED) gentoo-dev 2007-03-07 15:12:56 UTC
That's still very random. The message indicates that the /bin/sleep program crashed when a udev script ran "sleep 0.1". It's not a udev issue, if anything it indicates a coreutils issue, but sleep is such a simple and fundamental program I'd be amazed if it had a bug causing it to segfault.
Comment 13 Charles de Noyelle 2007-03-07 16:13:46 UTC
I was quite amazed too (as I supposed that it was sleep that crashed). It must indicate a hardware issue!

BUT

As I was on my mounted /mnt/gentoo (I used my LiveCD because a normal boot did not work anymore), I compiled my kernel and ran emerge -e system.

After a while, it crashed (emerge crashed randomly, as if there were RAM errors).

I rebooted without the LiveCD, and now it seems to work (emerge --resume, to finish the emerge -e system, still has not crashed).

I'm trying to get sensors working...
Comment 14 Charles de Noyelle 2007-03-07 16:23:10 UTC
This crash is amazing (could it be libc/glib related?):
It finally crashed during the emerge of gcc (in emerge -e world).

And I got something like:
lib".so.6 : file not found

(yes, with the " !)

The first error message appeared when launching dhcpcd (so quite late in the boot sequence), and eventually came with a "kernel oops" (not a kernel panic), and my
(spontex64) Login:

I then shut the system down, waited a few minutes (M$ habit), and booted again. A clean boot this time: I'm building a new kernel with sensor drivers!
Comment 15 Charles de Noyelle 2007-03-07 16:51:48 UTC
Could that be a CFLAGS -march problem?
Comment 16 Daniel Drake (RETIRED) gentoo-dev 2007-03-07 18:41:02 UTC
The more you write about your problems, the more random things seem to get. Take a quick read back up this bug and look at the vast number of different failures. There isn't a single software component that stands out which you can pinpoint for these issues.

This DOES *really* sound like a hardware issue, and until you are able to rule that out I would stop looking at the software side of things. Based on the information you have written here, the only component you have ruled out is the video card.
Comment 17 Charles de Noyelle 2007-03-07 20:03:45 UTC
I agree with the hardware "bug", but the video card is the only device still plugged in! (And it seems that the bug is somewhere else, since I tested 4 different video cards.) I *think* it cannot be a hard drive problem: it crashes when I boot from CD-ROM. How could I detect a processor or motherboard failure? Would that be *that* random?
Comment 18 Daniel Drake (RETIRED) gentoo-dev 2007-03-07 21:14:32 UTC
Yes, this behaviour almost always comes from mobo/CPU/RAM and rarely anything else. I'd say RAM is probably the likely culprit here, but obviously that's just a guess and I can't say for sure.
Comment 19 Charles de Noyelle 2007-03-08 00:27:28 UTC
It seems to be a hard drive problem (which is strange, since badblocks with safe read-write tests did not discover anything).

Thanks for your time: Gentoo definitely is not the one at fault here. Sorry for the "not a bug" thread.