I have 3 boxes that use VIA C7 CPUs; all of them ran very well for more than a year. I recently ran emerge --sync and emerge -DNatuv world on 2 of the boxes. The 2 upgraded boxes (now running glibc-2.6.1) now freeze very often, mostly under heavy load (load > 2). There is nothing in the logs: I checked syslog, kern.log and so on; nothing, no clue. The other box, still running glibc-2.5-r4, works as well as before.

I tried many kernels, from 2.6.18 to 2.6.24, with and without the hardened profile; the result is the same.

I'm not the only one: many people in France have had this problem. Debian users who downgraded libc got a stable server back, but with Gentoo, downgrading libc seems pretty dangerous.

All the reported problems are in French, because it seems only Dedibox ( http://dedibox.fr ) provides low-cost servers using the VIA C7 processor. If needed, I can provide many web pages where people describe the problem in French (googling "dedibox freeze libc" gives some), but I found nothing in English. If you contact the dedibox.fr admins they will confirm the problem; perhaps they would even agree to provide a box for testing, who knows.

I have nothing to give you: there is nothing in the logs, the box just stops working as if the power had been switched off.

The problem has happened at least on Debian and Gentoo, which are the most used Linux distros on Dedibox.

The exact processor is:

processor : 0
vendor_id : CentaurHauls
cpu family : 6
model : 10
model name : VIA Esther processor 2000MHz
stepping : 9
cpu MHz : 1995.084
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge cmov pat clflush acpi mmx fxsr sse sse2 tm pni est tm2 rng rng_en ace ace_en ace2 ace2_en phe phe_en pmm pmm_en
bogomips : 3994.49
clflush size : 64

Reproducible: Always

Steps to Reproduce:
1. box using VIA C7 processor with glibc
2. heavy load >2
3. wait 1 or 2 hours

Actual Results:
complete freeze, nothing in the log, hardware reboot necessary (soft reboot with ctrl+alt+del doesn't work)

Expected Results:
stable box, as with glibc-2.5-r4 (or an easy way to downgrade libc to 2.5-r4 ;)
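For anyone who wants to check quickly whether a machine belongs to the same CPU family before upgrading glibc, the /proc/cpuinfo fields quoted in the report can be matched in a few lines. This is only a minimal sketch: the rule "vendor_id CentaurHauls and cpu family 6" is an assumption based on the cpuinfo output in this report, not a confirmed criterion for the bug, and `parse_cpuinfo` / `looks_like_via_c7` are hypothetical helper names.

```python
def parse_cpuinfo(text):
    """Parse the first processor block of /proc/cpuinfo-style text into a dict."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            if info:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def looks_like_via_c7(info):
    # Assumption: CentaurHauls + family 6 covers the C7 parts seen in this
    # report (e.g. model 10, "VIA Esther"); other Centaur CPUs may match too.
    return info.get("vendor_id") == "CentaurHauls" and info.get("cpu family") == "6"


# Excerpt of the cpuinfo output from the report, used as sample input.
SAMPLE = """\
processor : 0
vendor_id : CentaurHauls
cpu family : 6
model : 10
model name : VIA Esther processor 2000MHz
"""

print(looks_like_via_c7(parse_cpuinfo(SAMPLE)))
```

On a real box the same functions can be fed with `open("/proc/cpuinfo").read()` instead of `SAMPLE`.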
Created attachment 157533 [details]
kernel .config for the VIA C7 box

I'm adding the kernel .config for the VIA C7 boxes that have been freezing since glibc-2.6.1, in case it is useful.
Portage 2.1.4.4 (hardened/x86/2.6, gcc-4.1.1, glibc-2.6.1-r0, 2.6.24-gentoo-r8dedibox-r8 i686)
=================================================================
System uname: 2.6.24-gentoo-r8dedibox-r8 i686 VIA Esther processor 2000MHz
Timestamp of tree: Fri, 13 Jun 2008 12:18:01 +0000
app-shells/bash: 3.2_p33
dev-lang/python: 2.4.4-r13
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox: 1.2.18.1-r2
sys-devel/autoconf: 2.13, 2.61-r1
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1
sys-devel/binutils: 2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool: 1.5.26
virtual/os-headers: 2.6.23-r3
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=i686 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-O2 -march=i686 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks fixpackages metadata-transfer parallel-fetch sandbox sfperms strict unmerge-orphans userfetch userpriv usersandbox"
GENTOO_MIRRORS="http://gentoo.modulix.net/gentoo/ http://ftp.club-internet.fr/pub/mirrors/gentoo ftp://ftp.dedibox.fr/gentoo/ ftp://ftp.dedibox.fr/gentoo/ "
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X apache2 bcmath berkdb calendar cddb cgi cli cracklib crypt ctype cups exif flash force-cgi-redirect ftp gd hardened hardenedphp hash iconv imap jpeg ldap maildir mcal memlimit midi mysql nls nptl nptlonly openssh pam pcntl pcre pdf pic png python readline sasl sdl session simplexml soap sockets spamassassin ssl sysvipc tcpd tidy tiff tokenizer truetype unicode urandom vhosts x86 xml xmlreader xmlrpc xmlwriter zip zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic auth_digest authn_anon authn_dbd authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock dbd deflate dir disk_cache env expires ext_filter file_cache filter headers ident imagemap include info log_config logio mem_cache mime mime_magic negotiation proxy proxy_ajp proxy_balancer proxy_connect proxy_http rewrite setenvif so speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="mouse keyboard" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="apm ark chips cirrus cyrix dummy fbdev glint i128 i740 i810 imstt mach64 mga neomagic nsc nv r128 radeon rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga vmware voodoo"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Another detail: I installed lm_sensors to investigate the problem, and I'm sure this is not a temperature problem. The maximum temperature I saw with lm_sensors was 54°C, which seems pretty normal for a VIA C7.

And another link I just found: http://lkml.org/lkml/2007/6/12/116
Could this be related?
kernel freezing -> kernel problem, not toolchain
(In reply to comment #4)
> kernel freezing -> kernel problem, not toolchain

No, the problem is the same with all the kernels I tried, from 2.6.18 to 2.6.24. The same kernels with the older libc on the same boxes don't freeze, and all the Debian users who had the problem saw it disappear after downgrading libc.

So I'm nearly sure this is not a kernel problem. And IIRC the kernel build has its own libc? The kernel doesn't link against libc, does it?
(In reply to comment #5)
> So I'm nearly sure this is not a kernel problem. And IIRC the kernel build
> has its own libc? The kernel doesn't link against libc, does it?

If the kernel completely locks up then that is a kernel bug or a hardware bug. It shouldn't be possible to lock up the kernel, regardless of what userland such as glibc does. Presumably the newer glibc is doing something different that is triggering the bug. Regardless of what that is and whether it should be doing it, it shouldn't completely hang the kernel.

Anyway, on to suggestions. Here are a few things to try.

This forum, and others, suggest disabling power saving:
http://ubuntuforums.org/archive/index.php/t-79395.html
Does that fix things?

Have you tried setting up a serial console (ideally) or netconsole? It would be great if you could. Sometimes you can see messages on them that don't make it to logs on disk.

Does the box respond to Magic Sys-Rq after it hangs? If you haven't tried already please ensure that support is compiled in and try using it to get a stack dump (with Alt-Ctrl-SysRq-t) after it hangs.

Have you tried setting up a hardware watchdog? If your board includes one you may be able to get something useful out of the system with it (set up a serial console or netconsole first). More info at these links:
http://gentoo-wiki.com/HOWTO_Watchdog_Timer
http://www.tkarena.com/forums/linux-arena/37136-howto-hardware-watchdog-via-en12000eg.html

Let us know how you get on with this lot.
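For the netconsole suggestion, the idea is that the freezing box sends its kernel messages over UDP to a second machine, so the last lines before a hang are not lost with the local disk. A rough sketch of both ends follows, under stated assumptions: the parameter layout `src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac` and the default ports 6665/6666 are what I recall from the kernel's netconsole documentation, all addresses below are placeholder values, and the function names are hypothetical.

```python
import socket


def netconsole_param(src_ip, dev, tgt_ip, tgt_mac="", src_port=6665, tgt_port=6666):
    """Build the netconsole= string: src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac."""
    return f"netconsole={src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


def open_listener(bind_ip="0.0.0.0", port=6666):
    """Open the UDP socket on the machine that collects the kernel messages."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_ip, port))
    return sock


def log_messages(sock, out, max_packets=None):
    """Copy received kernel messages to `out`, flushing after every packet."""
    seen = 0
    while max_packets is None or seen < max_packets:
        data, (sender, _port) = sock.recvfrom(4096)
        out.write(f"{sender}: {data.decode('ascii', 'replace')}")
        out.flush()  # keep the last lines before a freeze on disk
        seen += 1


# Placeholder addresses from the documentation range, not real hosts.
PARAM = netconsole_param("192.0.2.10", "eth0", "192.0.2.20", "00:11:22:33:44:55")
```

`PARAM` would go on the kernel command line of the freezing box (or be passed as the option of the netconsole module), while the collecting machine runs something like `log_messages(open_listener(), open("c7-netconsole.log", "a"))`.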
These boxes are dedicated servers.

> This forum, and others, suggest disabling power saving:
> http://ubuntuforums.org/archive/index.php/t-79395.html
> Does that fix things?

There is no ACPI on the boxes.

> Have you tried setting up a serial console (ideally) or netconsole? It would be
> great if you could. Sometimes you can see messages on them that don't make it
> to logs on disk.

I can't; those are dedicated servers.

> Does the box respond to Magic Sys-Rq after it hangs? If you haven't tried
> already please ensure that support is compiled in and try using it to get a
> stack dump (with Alt-Ctrl-SysRq-t) after it hangs.

I can't test this either (dedicated servers, ssh only).

> Have you tried setting up a hardware watchdog? If your board includes one you
> may be able to get something useful out of the system with it (set up a serial
> console or netconsole first). More info at these links:
> http://gentoo-wiki.com/HOWTO_Watchdog_Timer
> http://www.tkarena.com/forums/linux-arena/37136-howto-hardware-watchdog-via-en12000eg.html

Yes, I set up a watchdog (w83697hf_wdt and w83627hf modules) after this problem appeared. Even the watchdog gives no log, but the automatic reboot works.
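Why the automatic reboot works even though nothing is logged: with the standard Linux watchdog interface, a userspace process has to keep writing to the watchdog device, and once the machine hangs the writes stop and the hardware timer resets the box. A minimal sketch of such a feeder, assuming the driver exposes the usual /dev/watchdog device and supports the "magic close" character `V` (both are assumptions about this particular setup; `feed_watchdog` is a hypothetical name, and a real deployment would normally use an existing watchdog daemon instead):

```python
import time


def feed_watchdog(device="/dev/watchdog", interval=10.0, beats=6, sleep=time.sleep):
    """Write a keepalive byte every `interval` seconds, `beats` times in total."""
    # Unbuffered binary mode so each write reaches the driver immediately.
    with open(device, "wb", buffering=0) as wd:
        for _ in range(beats):
            wd.write(b"\0")  # any write counts as a keepalive
            sleep(interval)
        # Magic close: tells the driver this is a deliberate stop, so that
        # closing the device does not lead to a reboot (if the driver allows it).
        wd.write(b"V")
```

If the kernel freezes inside the loop, no further keepalive arrives and the board resets once its timeout expires, which matches the behaviour described above.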
I'm creating an attachment with the logs from the time the box freezes and the watchdog reboots it.
Created attachment 158173 [details]
logs at the freeze / watchdog reboot time

I don't think this will be useful, but the only bizarre thing is the time shift.
I suggest posting to LKML. Include details of the hardware, and the kernel and glibc versions you've tried. Explain you've not got any useful logs and can't access the machines to set up a serial console or try Magic-SysRq, but that the hardware watchdog does reboot the machine. You should probably also include a link to this bug report. Hopefully it will ring a bell for someone there.
mail posted on the LKML
(In reply to comment #11)
> mail posted on the LKML

Follow-up on the LKML: http://lkml.org/lkml/2008/6/23/496
Same trouble on a PIII Coppermine. Symptoms: the CPU is loaded at 100% while certain software (udevd, syslog-ng, etc.) is running, although sometimes the CPU stays idle with the same software running.

It began after I synced on Jun 30:

# genlop glibc
...
Sat Jul 7 16:15:51 2007 >>> sys-libs/glibc-2.5-r4
Thu Oct 18 08:03:23 2007 >>> sys-libs/glibc-2.6.1
Thu Oct 18 14:29:44 2007 >>> sys-libs/glibc-2.6.1
Thu Oct 18 18:09:02 2007 >>> sys-libs/glibc-2.6.1
Sat Nov 17 10:38:35 2007 >>> sys-libs/glibc-2.6.1
Sat Nov 17 13:10:31 2007 >>> sys-libs/glibc-2.6.1
Sat Nov 17 22:47:25 2007 >>> sys-libs/glibc-2.6.1
Fri Nov 23 02:45:28 2007 >>> sys-libs/glibc-2.6.1
Mon Jun 30 21:09:19 2008 >>> sys-libs/glibc-2.6.1

elogv:

# diff sys-libs\:glibc-2.6.1\:20080630-170907.log sys-libs\:glibc-2.6.1\:20071122-234515.log
16c16
< Applying Gentoo Glibc Patchset 2.6.1-1.2 ...
---
> Applying Gentoo Glibc Patchset 2.6.1-1.1 ...
28d27
< 1065_all_glibc-2.7-nscd-paranoia-segv.patch ...

Recompiling glibc without 1065_all_glibc-2.7-nscd-paranoia-segv.patch did not solve the problem.

# strace -p `pidof /sbin/udevd` -cf
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 31.24    0.005427           2      2795      2446 rmdir
 21.96    0.003814           3      1398       699 waitpid
 12.12    0.002106           3       699           clone
 11.58    0.002011           2      1049       700 lstat64
  4.86    0.000845           0      1748           open
  2.70    0.000469           0      5592           close
  2.26    0.000393           0      1398           write
  2.25    0.000390           0      3494      1748 unlink
  1.83    0.000318           0      1398           read
  1.82    0.000316           0      2797       699 stat64
  1.23    0.000213           1       349           socket
  1.14    0.000198           0       699           recv
  0.97    0.000168           0       349           readlink
  0.83    0.000144           0       699           munmap
  0.71    0.000123           0       699           chmod
  0.69    0.000120           0       350           mknod
  0.58    0.000101           0       699           chown32
  0.55    0.000095           0       699           mmap2
  0.46    0.000080           0       699           sigreturn
  0.23    0.000040           0       699           select
  0.00    0.000000           0      2097           time
  0.00    0.000000           0       699           alarm
  0.00    0.000000           0       350           mkdir
  0.00    0.000000           0      1049           symlink
  0.00    0.000000           0       699           setpriority
  0.00    0.000000           0      3495           rt_sigaction
  0.00    0.000000           0       699           fstat64
  0.00    0.000000           0       349       349 sendto
------ ----------- ----------- --------- --------- ----------------
100.00    0.017371                 37746      6641 total

Any idea?

P.S. Excuse my English.
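A summary like the strace -c output above is easier to read when the syscalls are ranked by how often they fail rather than by time. The sketch below parses that summary format and sorts by error ratio; it is an illustration only (the sample rows are an excerpt of the numbers posted above, and `parse_strace_summary` / `by_error_ratio` are hypothetical helper names), and a high error ratio is a hint about what a process is retrying, not a diagnosis.

```python
def parse_strace_summary(text):
    """Turn the rows of an `strace -c` summary into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 5 fields (no errors) or 6 fields (with errors).
        if len(parts) not in (5, 6) or parts[-1] == "total":
            continue
        try:
            seconds = float(parts[1])
            calls = int(parts[3])
            errors = int(parts[4]) if len(parts) == 6 else 0
        except ValueError:
            continue  # separator line made of dashes
        rows.append({"syscall": parts[-1], "seconds": seconds,
                     "calls": calls, "errors": errors})
    return rows


def by_error_ratio(rows):
    """Sort rows so the syscalls that fail most often come first."""
    def ratio(row):
        return row["errors"] / row["calls"] if row["calls"] else 0.0
    return sorted(rows, key=ratio, reverse=True)


SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 31.24    0.005427           2      2795      2446 rmdir
 12.12    0.002106           3       699           clone
  2.25    0.000390           0      3494      1748 unlink
  0.00    0.000000           0       349       349 sendto
------ ----------- ----------- --------- --------- ----------------
100.00    0.017371                 37746      6641 total
"""

for row in by_error_ratio(parse_strace_summary(SAMPLE)):
    print(row["syscall"], row["errors"], "/", row["calls"])
```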
William, were you able to try Alan's suggestions?

(In reply to comment #13)
> Same trouble on a PIII Coppermine. Symptoms: the CPU is loaded at 100% while
> certain software (udevd, syslog-ng, etc.) is running, although sometimes the
> CPU stays idle with the same software running.

I think your problem is unrelated, sorry. This one seems specific to VIA C7 systems.
I have the same problem.

> Does the box respond to Magic Sys-Rq after it hangs? If you haven't tried
> already please ensure that support is compiled in and try using it to get a
> stack dump (with Alt-Ctrl-SysRq-t) after it hangs.

I tried that, and it did nothing visible. I was in X11 when it happened; does that make the stack dump not work? (The description says it dumps stuff to the *console*.)

I use a VIA C7 in my home desktop, so I can try some things to provide information. (I work from home using my computer, so I don't want to try *really* risky things.)
(In reply to comment #15)
> I tried that, and it did nothing visible. I was in X11 when it happened; does
> that make the stack dump not work?

Pretty much, yeah. You should be able to use it via a serial console while in X11 though, if you are able to set one up.

> I use a VIA C7 in my home desktop, so I can try some things to provide
> information. (I work from home using my computer, so I don't want to try
> *really* risky things.)

Excellent, it might help to try Alan Cox's suggestions here:
http://lkml.org/lkml/2008/6/24/424

I don't *think* they should do anything worse than crashing your machine (which is happening anyway). However I'd make sure my backups were up-to-date before trying crashme.
Has anyone managed to resolve this issue? I am seeing a similar issue on my home server, which is based on the EPIA-SP motherboard. I have started getting random freezes while it is running. At present I can only achieve hours of runtime, sometimes only minutes, before it locks up hard and needs a reboot.
I also have a VIA C7 CPU (one of those cheap Everex PCs from Walmart):

wolfgang ~ # cat /proc/cpuinfo
processor : 0
vendor_id : CentaurHauls
cpu family : 6
model : 13
model name : VIA C7-D Processor 1500MHz
stepping : 0
cpu MHz : 1500.044
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge cmov pat clflush acpi mmx fxsr sse sse2 tm nx up pni xtpr rng rng_en ace ace_en ace2 ace2_en phe phe_en pmm pmm_en
bogomips : 3002.36
clflush size : 64

It was running stable with its supplied gOS. Since my server died of old age (P2 450), I simply put the disk in this box, recompiled everything with the safe flags (but with -Os), and saw uptimes drop to the point where it could not stay up for more than 2 days.

At the moment the box runs glibc-2.6.1, but after an emerge -e world it has been up for 12 days. I DID change -Os back to -O2.

Since you are already on -O2, see this post as an archival post to assist others who may have the same issue.
I'm seeing the same on a fresh Gentoo install on a VIA VB7001G motherboard: VIA C7-D processor, 512MB RAM, 320GB disk. I can emerge 30 or 40 packages, then it hangs.

As with the other users, everything was fine before I emerged glibc-2.6.1, and the machine stopped working when I emerged glibc-2.6.1. I was at glibc-2.5-r4 prior to the upgrade.

I'd suggest an x86 mask, at least until this can be confined to the C7-D processor.
Next steps:
1. Try Alan Cox's suggestions
http://lkml.org/lkml/2008/6/24/424
2. (for the adventurous) somehow get a set of changes between glibc-2.5 and glibc-2.6.1 (there probably aren't that many) and look for obvious candidates
I've done the echo statements he recommended, and will let you know if I get any out of memory kills. I don't know how to use crashme, or where to get it. Searching for programs on the web to crash my computer seems like a bad idea when I don't know which one I want. (Searches turn up at least 3 different programs called crashme.) Can anyone point me at the correct crashme?
Here is the crashme to use: http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/crashme/crashme-2.4-shar The README has some usage examples. Try running it for a long time - overnight or something. Let us know if more guidance is needed.
Created attachment 174304 [details, diff]
Fix sa_mask initialization in crashme

In order to get crashme to compile, I needed to make a minor fix to the sigaction initialization; patch attached.
(In reply to comment #23)
> Created an attachment (id=174304) [edit]
> Fix sa_mask initialization in crashme
>
> In order to get crashme to compile, I needed to make a minor fix to the
> sigaction initialization; patch attached.

'stress' is also a great tool for "exciting" bugs like this one :)
I think stress is less relevant here. stress puts load on the system by repeating the same simple operations, whereas the point of crashme is to try and execute all the machine code under the sun to find something that messes up the kernel or hardware.
(In reply to comment #20)
> Next steps:
> 1. Try Alan Cox's suggestions
> http://lkml.org/lkml/2008/6/24/424

In the end I couldn't follow these suggestions; I was short of time and needed to set up another server quickly for production.

> 2. (for the adventurous) somehow get a set of changes between glibc-2.5 and
> glibc-2.6.1 (there probably aren't that many) and look for obvious candidates

Although I'm a C programmer, I don't think I'm able to find it myself, but I wanted to see what this diff looks like and provide it to add data to this bug report.

So I set up a diff between glibc-2.5-20081013.tar.bz2 and glibc-2.6-20081215.tar.bz2 (I just hope I chose the right tarballs, since version numbers differ between distros).

The complete diff, file by file, unified and colored, can be found here:
http://trac.ww7.be/trac.ww7.be/log/trunk/libc?action=stop_on_copy&rev=207&stop_rev=&mode=stop_on_copy&verbose=on
(check "view changes" to see the list, then you can see the diffs file by file, or download the complete diff)

I'll try to have a look, and I hope I won't be alone doing this.
The diff would be a lot more approachable if it had the CVS stuff excluded
(In reply to comment #27)
> The diff would be a lot more approachable if it had the CVS stuff excluded

Yes, I saw this a bit later and removed it. The new diff, excluding the CVS files, is here:
http://trac.ww7.be/trac.ww7.be/changeset?new=trunk%2Flibc%40210&old=trunk%2Flibc%40206
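For anyone who wants to repeat the comparison locally instead of going through the Trac pages, a short script can list which files differ between two unpacked glibc trees while skipping the CVS metadata directories. This is a sketch under assumptions: both tarballs are already unpacked, the directory names in the usage note are placeholders, and `changed_files` is a hypothetical helper, not the method used to produce the diff linked above.

```python
import filecmp
import os


def changed_files(old_root, new_root, skip_dirs=("CVS",)):
    """Return (changed, added, removed) relative paths between two source trees."""
    def files_under(root):
        found = set()
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune metadata directories in place so os.walk never enters them.
            dirnames[:] = [d for d in dirnames if d not in skip_dirs]
            for name in filenames:
                found.add(os.path.relpath(os.path.join(dirpath, name), root))
        return found

    old, new = files_under(old_root), files_under(new_root)
    changed = [
        rel for rel in sorted(old & new)
        # shallow=False compares file contents, not just size and mtime.
        if not filecmp.cmp(os.path.join(old_root, rel),
                           os.path.join(new_root, rel), shallow=False)
    ]
    return changed, sorted(new - old), sorted(old - new)
```

Called as `changed_files("glibc-2.5", "glibc-2.6")`, it returns three sorted lists; restricting attention to the changed files under a directory such as `sysdeps/` is one way to narrow the search for candidates.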
It's bigger than I thought it would be, and nothing jumps out with a quick skim. Any progress testing the items that Alan Cox suggested?
So, do we have any news, William?
This seems to be related. As a user of a Dedibox V1 with a VIA C7 CPU, I reported a bug [1] on Debian 4.0 and resolved it by using libc6 from unstable or by downgrading. I then moved to Ubuntu 9.04 and reported the bug there too [2]; the box freezes a lot without anything in the logs.

In case it helps:

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=507845
[2] https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/375315
I've read that libc6 contains the locales; couldn't this be caused by misconfigured locale variables? I'm a French user, the others seem to be French too, and in my case I changed the locale to "fr_FR.UTF-8" just after the first install. Perhaps that is what messed up the box. I'm testing this theory on a fresh install.
I'm in the States and don't use non-English locales and see the bug.