Bug 231624 - sys-kernel/gentoo-sources-2.6.25-r6: soft lockup on phenom while compiling glibc
Summary: sys-kernel/gentoo-sources-2.6.25-r6: soft lockup on phenom while compiling glibc
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High critical
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-07-12 19:34 UTC by Marcin Deranek
Modified: 2008-10-17 11:43 UTC
CC List: 1 user

See Also:
Package list:
Runtime testing required: ---


Attachments
kernel config (config, 35.71 KB, text/plain)
2008-07-12 19:37 UTC, Marcin Deranek
CPU info (cpuinfo.txt, 3.03 KB, text/plain)
2008-07-12 21:51 UTC, Marcin Deranek
lspci (lspci.txt, 2.36 KB, text/plain)
2008-07-12 21:51 UTC, Marcin Deranek

Description Marcin Deranek 2008-07-12 19:34:22 UTC
After switching from an Athlon X2 to a Phenom, I started experiencing soft lockups. The problem seems to appear under high load. I can always reproduce it while compiling glibc-2.8_p20080602. With maxcpus=2 the problem no longer appears; any value above 2 does not work.
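To confirm how many CPUs actually came online after booting with maxcpus=2, one option is to read the kernel's list of online CPUs, which uses the cpulist format (for example "0-1" or "0,2-3"). The Python helper below is a minimal sketch that expands such a string. The function names are hypothetical, and the sysfs path /sys/devices/system/cpu/online is an assumption: it may not be present on every 2.6.x kernel.

```python
from typing import List


def parse_cpulist(text: str) -> List[int]:
    """Expand a kernel cpulist string such as "0-1" or "0,2-3" into CPU ids."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def online_cpus(path: str = "/sys/devices/system/cpu/online") -> List[int]:
    """Read and expand the online-CPU list (the path is assumed, see above)."""
    with open(path) as f:
        return parse_cpulist(f.read())
```

With maxcpus=2 on this quad-core Phenom, one would expect online_cpus() to return [0, 1].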


Reproducible: Always

Steps to Reproduce:
1. Phenom CPU
2. MAKEOPTS="-j5"
3. emerge -1 glibc

Actual Results:  
At a random point the system becomes unresponsive: all running applications keep operating for a short period after the soft lockup, but no new process can be spawned. In top I see either the events/3 kernel thread consuming 100% CPU, or one of the CPUs showing 0% in every category (system, user, idle, wait, etc.).

Expected Results:  
System should operate normally.

sun ~ # emerge --info
Portage 2.2_rc1 (default/linux/amd64/2008.0/desktop, gcc-4.3.1, glibc-2.8_p20080602-r0, 2.6.25-gentoo-r6 x86_64)
=================================================================
System uname: Linux-2.6.25-gentoo-r6-x86_64-AMD_Phenom-tm-_9850_Quad-Core_Processor-with-glibc2.2.5
Timestamp of tree: Fri, 11 Jul 2008 17:15:01 +0000
ccache version 2.4 [enabled]
app-shells/bash:     3.2_p39
dev-lang/python:     2.5.2-r5
dev-python/pycrypto: 2.0.1-r6
dev-util/ccache:     2.4-r7
sys-apps/baselayout: 2.0.0
sys-apps/openrc:     0.2.5
sys-apps/sandbox:    1.2.18.1-r3
sys-devel/autoconf:  2.13, 2.62-r1
sys-devel/automake:  1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1-r1
sys-devel/binutils:  2.18-r3
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   2.2.4
virtual/os-headers:  2.6.25-r4
ACCEPT_KEYWORDS="amd64 ~amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=amdfam10 -ftree-vectorize -fvect-cost-model -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/splash /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-O2 -march=amdfam10 -ftree-vectorize -fvect-cost-model -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks parallel-fetch preserve-libs sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="ftp://ftp.snt.utwente.nl/pub/os/linux/gentoo"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-z,now"
LINGUAS="en pl"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="3dnow 3dnowext X a52 aac aalib accessibility acpi adns alsa amd64 ao apm audiofile avahi bash-completion bcmath berkdb bidi bluetooth branding bzip2 cairo caps cddb cdparanoia cdr clamav cli cracklib crypt cscope ctype cups curl curlwrappers dbus dga dri dts dv dvb dvd dvdr dvdread encode exif expat fbcon ffmpeg fftw firefox flac flatfile foomaticdb ftp gb gd gdbm ggi gif ginac glut gmp gnome gnome-keyring gnutls gphoto2 gpm graphviz gstreamer gtk gtkhtml guile hal iconv idn imagemagick imlib isdnlog javascript jbig jikes jpeg jpeg2k kdehiddenvisibility lcms ldap lesstif libcaca libedit libgda libnotify libsamplerate libwww lm_sensors m17n-lib mad maildir matroska mbox mcal memlimit mhash midi mikmod mime mmap mmx mmxext mng mp3 mpeg mpi mplayer mudflap multilib musepack ncurses nntp nocd nptl nptlonly nsplugin offensive ogg openal openexr opengl openmp osc oss pam pcntl pcre pda pdf pic plotutils png posix ppds pppd profile qt3support quicktime rdesktop readline recode reflection sdl session sharedmem shorten simplexml slang slp sndfile snmp sockets sox speex spell sse sse2 ssl svg sysvipc szip tcpd test theora threads tidy tiff truetype unicode usb v4l v4l2 vcd videos vim-syntax vorbis wmf wxwindows x264 xcomposite xine xinerama xml xmlrpc xorg xosd xpm xscreensaver xsl xulrunner xv xvid yaz zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info 
log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CAMERAS="canon" ELIBC="glibc" INPUT_DEVICES="keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en pl" USERLAND="GNU" VIDEO_CARDS="fglrx v4l"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY

kernel log:

[ 1323.925946] BUG: soft lockup - CPU#3 stuck for 61s! [events/3:18]
[ 1323.925946] CPU 3:
[ 1323.925946] Modules linked in: netconsole nls_utf8 ntfs nls_base snd_usb_audio snd_usb_lib snd_cmipci snd_pcm snd_page_alloc snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_rawmidi snd_seq_device snd i2c_nforce2 i2c_core forcedeth floppy ehci_hcd rtc ohci_hcd soundcore sg sr_mod cdrom
[ 1323.925946] Pid: 18, comm: events/3 Not tainted 2.6.25-gentoo-r6 #9
[ 1323.925946] RIP: 0010:[<ffffffff8021a46b>]  [<ffffffff8021a46b>] __smp_call_function_mask+0xab/0xcd
[ 1323.925946] RSP: 0018:ffff81011feddd60  EFLAGS: 00000297
[ 1323.925946] RAX: ffffffff8022e5ac RBX: ffff81011fedddc0 RCX: 0000000000000003
[ 1323.925946] RDX: 0000000000000008 RSI: ffff81011fedae78 RDI: ffff81011fedaea0
[ 1323.925946] RBP: 0000000000000003 R08: 0000000000000000 R09: d37a6f4de9bd37a7
[ 1323.925946] R10: ffff81011feddde0 R11: 0000000000000046 R12: 00000000000001c5
[ 1323.925946] R13: ffff810080b28000 R14: ffff81011fedc000 R15: 0000000000000001
[ 1323.925946] FS:  00007fab7235e700(0000) GS:ffff81011fe656d0(0000) knlGS:00000000f7e40a10
[ 1323.925946] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 1323.925946] CR2: 00002ab5d9efe5e4 CR3: 00000001170f5000 CR4: 00000000000006e0
[ 1323.925946] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1323.925946] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1323.925946] 
[ 1323.925946] Call Trace:
[ 1323.925946]  [<ffffffff8021595c>] ? mcheck_check_cpu+0x0/0x3a
[ 1323.925946]  [<ffffffff8021595c>] ? mcheck_check_cpu+0x0/0x3a
[ 1323.925946]  [<ffffffff8021595c>] ? mcheck_check_cpu+0x0/0x3a
[ 1323.925946]  [<ffffffff8021a54a>] ? smp_call_function_mask+0x4a/0x63
[ 1323.925946]  [<ffffffff8021595c>] ? mcheck_check_cpu+0x0/0x3a
[ 1323.925946]  [<ffffffff8021a57c>] ? smp_call_function+0x19/0x1b
[ 1323.925946]  [<ffffffff802364af>] ? on_each_cpu+0x18/0x36
[ 1323.925946]  [<ffffffff80241031>] ? run_workqueue+0xb1/0x203
[ 1323.925946]  [<ffffffff80215360>] ? mcheck_timer+0x21/0x80
[ 1323.925946]  [<ffffffff8024107c>] ? run_workqueue+0xfc/0x203
[ 1323.925946]  [<ffffffff8021533f>] ? mcheck_timer+0x0/0x80
[ 1323.925946]  [<ffffffff80241223>] ? worker_thread+0xa0/0xb1
[ 1323.925946]  [<ffffffff80244ac5>] ? autoremove_wake_function+0x0/0x38
[ 1323.925946]  [<ffffffff80241183>] ? worker_thread+0x0/0xb1
[ 1323.925946]  [<ffffffff802447c1>] ? kthread+0x49/0x76
[ 1323.925946]  [<ffffffff8020c108>] ? child_rip+0xa/0x12
[ 1323.925946]  [<ffffffff8020b81f>] ? restore_args+0x0/0x30
[ 1323.925946]  [<ffffffff80244778>] ? kthread+0x0/0x76
[ 1323.925946]  [<ffffffff8020c0fe>] ? child_rip+0x0/0x12
[ 1323.925946] 
[ 1345.641958] SysRq : Show Locks Held
[ 1345.642035] 
[ 1345.642037] Showing all locks held in the system:
[ 1345.642118] 7 locks held by events/3/18:
[ 1345.642158]  #0:  (events){--..}, at: [<ffffffff80241031>] run_workqueue+0xb1/0x203
[ 1345.642376]  #1:  ((mcheck_work).work){--..}, at: [<ffffffff80241031>] run_workqueue+0xb1/0x203
[ 1345.642596]  #2:  (call_lock){--..}, at: [<ffffffff8021a53a>] smp_call_function_mask+0x3a/0x63
[ 1345.642833]  #3:  (&dev->event_lock){++..}, at: [<ffffffff803e2b90>] input_event+0x3b/0x77
[ 1345.643072]  #4:  (rcu_read_lock){..--}, at: [<ffffffff803e1579>] input_pass_event+0x0/0xe1
[ 1345.643292]  #5:  (sysrq_key_table_lock){+...}, at: [<ffffffff80368a43>] __handle_sysrq+0x26/0x158
[ 1345.643303]  #6:  (tasklist_lock){..--}, at: [<ffffffff8024fc10>] debug_show_all_locks+0x4d/0x17f
[ 1345.643303] 1 lock held by agetty/2963:
[ 1345.643303]  #0:  (&tty->atomic_read_lock){--..}, at: [<ffffffff8035c829>] read_chan+0x2ab/0x760
[ 1345.643303] 1 lock held by agetty/2965:
[ 1345.643303]  #0:  (&tty->atomic_read_lock){--..}, at: [<ffffffff8035c829>] read_chan+0x2ab/0x760
[ 1345.643303] 1 lock held by agetty/2966:
[ 1345.643303]  #0:  (&tty->atomic_read_lock){--..}, at: [<ffffffff8035c829>] read_chan+0x2ab/0x760
[ 1345.643303] 1 lock held by agetty/2967:
[ 1345.643303]  #0:  (&tty->atomic_read_lock){--..}, at: [<ffffffff8035c829>] read_chan+0x2ab/0x760
[ 1345.643303] 
[ 1345.643303] =============================================
Comment 1 Marcin Deranek 2008-07-12 19:37:12 UTC
Created attachment 160212
kernel config

Debugging was done with the same config plus netconsole and debugging options.
Comment 2 Peter Alfredsen (RETIRED) gentoo-dev 2008-07-12 19:49:22 UTC
http://lkml.org/lkml/2008/3/8/128
Short version:
See if nmi_watchdog=1 fixes this. If so, it's probably a BIOS bug, and upgrading the BIOS should solve your problem.
Comment 3 Marcin Deranek 2008-07-12 21:51:05 UTC
Although some of the descriptions look very similar, nmi_watchdog=1 did not help in my case (I own an Asus M2N32 SLI DELUXE): the system got stuck at some point during the glibc compilation, as usual.
Comment 4 Marcin Deranek 2008-07-12 21:51:29 UTC
Created attachment 160217
CPU info
Comment 5 Marcin Deranek 2008-07-12 21:51:54 UTC
Created attachment 160219
lspci
Comment 6 Mike Pagano gentoo-dev 2008-07-14 13:55:26 UTC
Is your bios up to date?

http://support.asus.com/download/download_item.aspx?product=1&model=M2N32-SLI%20Deluxe

Comment 7 Marcin Deranek 2008-07-14 14:05:16 UTC
Yes, I'm running the latest BIOS (version 2001).
Comment 8 Marcin Deranek 2008-07-17 05:28:24 UTC
I have tried the vanilla 2.6.26 kernel with the same result. I have also tried another BIOS revision (the first one that supports Phenoms), with no luck either.
With nmi_watchdog=1 enabled, the NMI counter does not increase, so I tried nmi_watchdog=2; that did not help either, although the NMI counter was increasing.
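One way to check whether the NMI watchdog is actually firing, as described above, is to compare the NMI row of /proc/interrupts at two points in time. The Python sketch below assumes that row starts with "NMI:" followed by one count per CPU and, possibly, a trailing text label; the helper names and the sample values used to exercise it are hypothetical.

```python
from typing import List


def parse_nmi_counts(text: str) -> List[int]:
    """Return the per-CPU counts from the "NMI:" row of /proc/interrupts text."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # stop at the trailing description, if any
                counts.append(int(field))
            return counts
    raise ValueError("no NMI row found")


def nmi_increasing(before: str, after: str) -> List[bool]:
    """For each CPU, report whether its NMI count grew between two snapshots."""
    old, new = parse_nmi_counts(before), parse_nmi_counts(after)
    return [b > a for a, b in zip(old, new)]


def read_interrupts(path: str = "/proc/interrupts") -> str:
    """Take one snapshot of the interrupt counters."""
    with open(path) as f:
        return f.read()
```

Take one snapshot with read_interrupts(), wait a few seconds (for example with time.sleep(10)), take a second one, and pass both to nmi_increasing(). A result that is all False would match the "NMI counter does not increase" symptom seen with nmi_watchdog=1.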
Comment 9 Marcin Deranek 2008-08-15 20:40:27 UTC
For the last couple of weeks I have been trying different setups (different gcc, kernel, glibc, etc.), but none of them worked in the end. The worst part is that the lockups are random: sometimes they happen just a couple of minutes after the computer starts. I have already replaced the PSU and the chassis (for better cooling), but the lockups still persist. Initially I compared this with 32-bit WinXP, which did not have the problem. Recently I installed WinXP x64, which has this problem as well. This would indicate a hardware problem; I suspect it has something to do with the USB 2.0 subsystem (USB 1.1 works fine) and possibly a broken BIOS. In that case I'll close this bug, as the problem most probably lies elsewhere. Sorry for bothering :-)
Comment 10 Matteo Frigo 2008-08-28 00:04:35 UTC
Hi all.

I am using debian/testing, but I have been experiencing the same lockups
described here.  I have been able to avoid the lockups (so far) by running
2.6.27-rc4 with the notsc boot flag.

My current working hypothesis is that the problem is related to AMD erratum
280: ``Time Stamp Counter May Yield an Incorrect Value''.  It seems that
the time stamp counter on Phenom occasionally (once/day?) returns a bogus value.
This problem confuses the softlockup detector, which incorrectly concludes
that a task is stuck.  It may also confuse the scheduler, but I have not
been able to prove this.

Kernel 2.6.27 accepts the notsc flag to ignore the time-stamp counter.
Note that 2.6.26 and 2.6.25 accept this flag in 32-bit mode and ignore it in 64-bit mode. I haven't looked at earlier kernels.

It seems like other people have reported similar lockups, but nobody has a
solution.  If my theory turns out to be correct, the kernel should be
patched to autodetect Phenoms with the erratum and disable the time-stamp
counter automatically.
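The mechanism described in this comment can be illustrated with a toy model: the detector records when the watchdog last saw the CPU make progress, and a periodic check reports a soft lockup if the current clock reading is more than a threshold past that timestamp. If the clock is derived from the TSC and a single bogus TSC value jumps forward, the check fires even though nothing is stuck. This Python sketch is not the kernel's code; the 60 s threshold, the 2.5 GHz TSC rate and all function names are assumptions chosen for illustration (the threshold roughly matches the "stuck for 61s" message in the log above).

```python
CPU_HZ = 2_500_000_000      # assumed TSC rate (Phenom 9850 nominal clock)
SOFTLOCKUP_THRESH_S = 60    # assumed detector threshold in seconds


def tsc_to_seconds(tsc: int, hz: int = CPU_HZ) -> float:
    """Convert a raw cycle count to seconds at a constant TSC rate."""
    return tsc / hz


def softlockup_report(touch_tsc, now_tsc, cpu, comm, pid,
                      hz=CPU_HZ, thresh=SOFTLOCKUP_THRESH_S):
    """Return a kernel-style warning string if the toy detector would fire."""
    touched = tsc_to_seconds(touch_tsc, hz)
    now = tsc_to_seconds(now_tsc, hz)
    if now > touched + thresh:
        stuck = int(now - touched)
        return f"BUG: soft lockup - CPU#{cpu} stuck for {stuck}s! [{comm}:{pid}]"
    return None


touch = 1000 * CPU_HZ  # watchdog last touched at t = 1000 s
# Honest clock: only 2 s elapsed, so nothing is reported (prints None).
print(softlockup_report(touch, touch + 2 * CPU_HZ, 3, "events/3", 18))
# A bogus reading 61 s ahead produces a false "stuck" report.
print(softlockup_report(touch, touch + 61 * CPU_HZ, 3, "events/3", 18))
```

In this model the watchdog cannot tell a genuinely stuck CPU from a clock that jumped, which is why ignoring the TSC (notsc) would be expected to hide the false reports if the erratum theory holds.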
Comment 11 Marcin Deranek 2008-08-28 20:16:55 UTC
Thanks for the tip; I'll give it a try and let you know the outcome.

In the meantime I found the following post:
http://www.overclock.net/amd-general/319031-phenom-9850-system-freezing-5.html
which might explain why I did not have any problems under 32-bit Windows while 64-bit Linux and Windows were freezing.
Comment 12 Marcin Deranek 2008-09-01 12:21:02 UTC
I tried it, and it doesn't seem to help: the system was locking up as usual.
I saw that a new BIOS was available on the ASUS website, so I downloaded and installed it. The only visible change in the BIOS was an option to enable/disable the AMD C1E feature (disabled by default). After the upgrade the system does not lock up as often, but it still happens (it locked up during the 6th full system recompilation; each round took a couple of hours to recompile ~300 packages).
I need more time to investigate further, but due to the random nature of this problem I'm rather skeptical.
Comment 13 Marcin Deranek 2008-10-17 11:43:43 UTC
It turned out to be a faulty CPU. I got the replacement a couple of days ago, and since then I have not experienced any problems.