After switching from Athlon x2 to Phenom I started experiencing soft lockups. The problem seems to appear under high load. I can always reproduce it while compiling glibc-2.8_p20080602. With maxcpus=2 the problem no longer appears; anything above 2 does not work.

Reproducible: Always

Steps to Reproduce:
1. Phenom CPU
2. MAKEOPTS="-j5"
3. emerge -1 glibc

Actual Results:
At a certain (random) point the system becomes unresponsive: all running applications keep working for a short period after the soft lockup, but spawning any new one is not possible. When running top I can see either that the events/3 kernel thread is consuming 100%, or that one of the CPUs shows 0% activity in all categories (system, user, idle, wait etc.).

Expected Results:
System should operate normally.

sun ~ # emerge --info
Portage 2.2_rc1 (default/linux/amd64/2008.0/desktop, gcc-4.3.1, glibc-2.8_p20080602-r0, 2.6.25-gentoo-r6 x86_64)
=================================================================
System uname: Linux-2.6.25-gentoo-r6-x86_64-AMD_Phenom-tm-_9850_Quad-Core_Processor-with-glibc2.2.5
Timestamp of tree: Fri, 11 Jul 2008 17:15:01 +0000
ccache version 2.4 [enabled]
app-shells/bash: 3.2_p39
dev-lang/python: 2.5.2-r5
dev-python/pycrypto: 2.0.1-r6
dev-util/ccache: 2.4-r7
sys-apps/baselayout: 2.0.0
sys-apps/openrc: 0.2.5
sys-apps/sandbox: 1.2.18.1-r3
sys-devel/autoconf: 2.13, 2.62-r1
sys-devel/automake: 1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1-r1
sys-devel/binutils: 2.18-r3
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool: 2.2.4
virtual/os-headers: 2.6.25-r4
ACCEPT_KEYWORDS="amd64 ~amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=amdfam10 -ftree-vectorize -fvect-cost-model -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/splash /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-O2 -march=amdfam10 -ftree-vectorize -fvect-cost-model -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks parallel-fetch preserve-libs sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="ftp://ftp.snt.utwente.nl/pub/os/linux/gentoo"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-z,now"
LINGUAS="en pl"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="3dnow 3dnowext X a52 aac aalib accessibility acpi adns alsa amd64 ao apm audiofile avahi bash-completion bcmath berkdb bidi bluetooth branding bzip2 cairo caps cddb cdparanoia cdr clamav cli cracklib crypt cscope ctype cups curl curlwrappers dbus dga dri dts dv dvb dvd dvdr dvdread encode exif expat fbcon ffmpeg fftw firefox flac flatfile foomaticdb ftp gb gd gdbm ggi gif ginac glut gmp gnome gnome-keyring gnutls gphoto2 gpm graphviz gstreamer gtk gtkhtml guile hal iconv idn imagemagick imlib isdnlog javascript jbig jikes jpeg jpeg2k kdehiddenvisibility lcms ldap lesstif libcaca libedit libgda libnotify libsamplerate libwww lm_sensors m17n-lib mad maildir matroska mbox mcal memlimit mhash midi mikmod mime mmap mmx mmxext mng mp3 mpeg mpi mplayer mudflap multilib musepack ncurses nntp nocd nptl nptlonly nsplugin offensive ogg openal openexr opengl openmp osc oss pam pcntl pcre pda pdf pic plotutils png posix ppds pppd profile qt3support quicktime rdesktop readline recode reflection sdl session sharedmem shorten simplexml slang slp sndfile snmp sockets sox speex spell sse sse2 ssl svg sysvipc szip tcpd test theora threads tidy tiff truetype unicode usb v4l v4l2 vcd videos vim-syntax vorbis wmf wxwindows x264 xcomposite xine xinerama xml xmlrpc xorg xosd xpm xscreensaver xsl xulrunner xv xvid yaz zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CAMERAS="canon" ELIBC="glibc" INPUT_DEVICES="keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en pl" USERLAND="GNU" VIDEO_CARDS="fglrx v4l"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY

kernel log:
[ 1323.925946] BUG: soft lockup - CPU#3 stuck for 61s! [events/3:18]
[ 1323.925946] CPU 3:
[ 1323.925946] Modules linked in: netconsole nls_utf8 ntfs nls_base snd_usb_audio snd_usb_lib snd_cmipci snd_pcm snd_page_alloc snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_rawmidi snd_seq_device snd i2c_nforce2 i2c_core forcedeth floppy ehci_hcd rtc ohci_hcd soundcore sg sr_mod cdrom
[ 1323.925946] Pid: 18, comm: events/3 Not tainted 2.6.25-gentoo-r6 #9
[ 1323.925946] RIP: 0010:[<ffffffff8021a46b>] [<ffffffff8021a46b>] __smp_call_function_mask+0xab/0xcd
[ 1323.925946] RSP: 0018:ffff81011feddd60 EFLAGS: 00000297
[ 1323.925946] RAX: ffffffff8022e5ac RBX: ffff81011fedddc0 RCX: 0000000000000003
[ 1323.925946] RDX: 0000000000000008 RSI: ffff81011fedae78 RDI: ffff81011fedaea0
[ 1323.925946] RBP: 0000000000000003 R08: 0000000000000000 R09: d37a6f4de9bd37a7
[ 1323.925946] R10: ffff81011feddde0 R11: 0000000000000046 R12: 00000000000001c5
[ 1323.925946] R13: ffff810080b28000 R14: ffff81011fedc000 R15: 0000000000000001
[ 1323.925946] FS: 00007fab7235e700(0000) GS:ffff81011fe656d0(0000) knlGS:00000000f7e40a10
[ 1323.925946] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 1323.925946] CR2: 00002ab5d9efe5e4 CR3: 00000001170f5000 CR4: 00000000000006e0
[ 1323.925946] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1323.925946] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1323.925946]
[ 1323.925946] Call Trace:
[ 1323.925946] [<ffffffff8021595c>] ? mcheck_check_cpu+0x0/0x3a
[ 1323.925946] [<ffffffff8021595c>] ? mcheck_check_cpu+0x0/0x3a
[ 1323.925946] [<ffffffff8021595c>] ? mcheck_check_cpu+0x0/0x3a
[ 1323.925946] [<ffffffff8021a54a>] ? smp_call_function_mask+0x4a/0x63
[ 1323.925946] [<ffffffff8021595c>] ? mcheck_check_cpu+0x0/0x3a
[ 1323.925946] [<ffffffff8021a57c>] ? smp_call_function+0x19/0x1b
[ 1323.925946] [<ffffffff802364af>] ? on_each_cpu+0x18/0x36
[ 1323.925946] [<ffffffff80241031>] ? run_workqueue+0xb1/0x203
[ 1323.925946] [<ffffffff80215360>] ? mcheck_timer+0x21/0x80
[ 1323.925946] [<ffffffff8024107c>] ? run_workqueue+0xfc/0x203
[ 1323.925946] [<ffffffff8021533f>] ? mcheck_timer+0x0/0x80
[ 1323.925946] [<ffffffff80241223>] ? worker_thread+0xa0/0xb1
[ 1323.925946] [<ffffffff80244ac5>] ? autoremove_wake_function+0x0/0x38
[ 1323.925946] [<ffffffff80241183>] ? worker_thread+0x0/0xb1
[ 1323.925946] [<ffffffff802447c1>] ? kthread+0x49/0x76
[ 1323.925946] [<ffffffff8020c108>] ? child_rip+0xa/0x12
[ 1323.925946] [<ffffffff8020b81f>] ? restore_args+0x0/0x30
[ 1323.925946] [<ffffffff80244778>] ? kthread+0x0/0x76
[ 1323.925946] [<ffffffff8020c0fe>] ? child_rip+0x0/0x12
[ 1323.925946]
[ 1345.641958] SysRq : Show Locks Held
[ 1345.642035]
[ 1345.642037] Showing all locks held in the system:
[ 1345.642118] 7 locks held by events/3/18:
[ 1345.642158] #0: (events ){--..} , at: [<ffffffff80241031>] run_workqueue+0xb1/0x203
[ 1345.642376] #1: ((mcheck_work).work ){--..} , at: [<ffffffff80241031>] run_workqueue+0xb1/0x203
[ 1345.642596] #2: (call_lock ){--..} , at: [<ffffffff8021a53a>] smp_call_function_mask+0x3a/0x63
[ 1345.642833] #3: (&dev->event_lock ){++..} , at: [<ffffffff803e2b90>] input_event+0x3b/0x77
[ 1345.643072] #4: (rcu_read_lock ){..--} , at: [<ffffffff803e1579>] input_pass_event+0x0/0xe1
[ 1345.643292] #5: (sysrq_key_table_lock ){+...} , at: [<ffffffff80368a43>] __handle_sysrq+0x26/0x158
[ 1345.643303] #6: (tasklist_lock ){..--} , at: [<ffffffff8024fc10>] debug_show_all_locks+0x4d/0x17f
[ 1345.643303] 1 lock held by agetty/2963:
[ 1345.643303] #0: (&tty->atomic_read_lock ){--..} , at: [<ffffffff8035c829>] read_chan+0x2ab/0x760
[ 1345.643303] 1 lock held by agetty/2965:
[ 1345.643303] #0: (&tty->atomic_read_lock ){--..} , at: [<ffffffff8035c829>] read_chan+0x2ab/0x760
[ 1345.643303] 1 lock held by agetty/2966:
[ 1345.643303] #0: (&tty->atomic_read_lock ){--..} , at: [<ffffffff8035c829>] read_chan+0x2ab/0x760
[ 1345.643303] 1 lock held by agetty/2967:
[ 1345.643303] #0: (&tty->atomic_read_lock ){--..} , at: [<ffffffff8035c829>] read_chan+0x2ab/0x760
[ 1345.643303]
[ 1345.643303] =============================================
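For anyone comparing their own logs against this one, the soft-lockup line in the log above has a regular shape (CPU number, stuck time, task name and pid). The following is only a minimal sketch of pulling those fields out of captured log text; the function name find_soft_lockups is made up for illustration, and the pattern is derived from this single log line, so other kernel versions may word the message differently.

```python
import re

# Pattern taken from the one report in this bug, e.g.
# "BUG: soft lockup - CPU#3 stuck for 61s! [events/3:18]"
# (assumption: other kernels may format this message differently).
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!\s*"
    r"\[(?P<comm>[^\]]*):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft-lockup report found in log_text.

    find_soft_lockups is a hypothetical helper, not part of any tool.
    """
    reports = []
    for match in SOFT_LOCKUP.finditer(log_text):
        reports.append({
            "cpu": int(match.group("cpu")),
            "seconds": int(match.group("secs")),
            "comm": match.group("comm"),
            "pid": int(match.group("pid")),
        })
    return reports


# The first line of the kernel log quoted in this report.
sample = "[ 1323.925946] BUG: soft lockup - CPU#3 stuck for 61s! [events/3:18]"
print(find_soft_lockups(sample))
```

Run over a full netconsole capture, this would show whether the same CPU and the same task are reported each time, which is one of the first things worth checking with random lockups.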
Created attachment 160212
kernel config

Debugging was done with the same config plus netconsole and debugging options.
http://lkml.org/lkml/2008/3/8/128

Short version: See if nmi_watchdog=1 fixes this. If so, it's probably a BIOS bug and upgrading the BIOS will solve your problem.
Although some of the descriptions look very similar, nmi_watchdog=1 did not help in my case (I own an Asus M2N32 SLI DELUXE): the system got stuck at some point during glibc compilation, as usual.
Created attachment 160217
CPU info
Created attachment 160219
lspci
Is your BIOS up to date?
http://support.asus.com/download/download_item.aspx?product=1&model=M2N32-SLI%20Deluxe
Yes, I'm running the latest BIOS 2001.
I have tried vanilla kernel 2.6.26 with the same result. I have also tried another BIOS revision (the first one that supports Phenoms), with no luck either. With nmi_watchdog=1 the NMI counter does not increase, so I tried nmi_watchdog=2; that did not help either, although the NMI counter was increasing.
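The NMI counter mentioned here is the per-CPU "NMI:" row in /proc/interrupts. Below is a rough sketch of comparing two snapshots of that file to see whether the counter moves. The helper names nmi_counts and nmi_is_ticking are made up, and the two sample snapshots are invented for illustration; they are not taken from the affected machine.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts from /proc/interrupts-style text.

    Hypothetical helper: reads the numeric columns after "NMI:" and stops
    at the trailing description. Returns [] if there is no NMI row.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return []


def nmi_is_ticking(before, after):
    """True if the NMI count increased on at least one CPU between samples."""
    return any(new > old for old, new in zip(before, after))


# Invented snapshots for a 4-CPU box (illustrative values only).
first = "NMI:          0          0          0          0   Non-maskable interrupts"
second = "NMI:         12         11         13          0   Non-maskable interrupts"

print(nmi_counts(second))
print(nmi_is_ticking(nmi_counts(first), nmi_counts(second)))
```

On a live system the two snapshots would come from reading /proc/interrupts a few seconds apart. A row that never changes matches the nmi_watchdog=1 behaviour described in this comment.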
For the last couple of weeks I have been trying different setups (different gcc, kernel, glibc etc.), but none of them worked in the end. The worst part is that the lockups are random; sometimes they happen just a couple of minutes after the computer starts. They persist even though I have already replaced the PSU, the chassis (for better cooling) etc. Initially I compared this with 32-bit WinXP, which did not show the problem. Recently I installed WinXP x64, which has the problem as well. This would indicate a hardware problem. I suspect it has something to do with the USB 2.0 subsystem (USB 1.1 works fine) and possibly a broken BIOS. That being the case, I'll close this bug, as the problem most probably lies somewhere else. Sorry for bothering :-)
Hi all. I am using Debian testing, but I have been experiencing the same lockups described here. I have been able to avoid the lockups (so far) by running 2.6.27-rc4 with the notsc boot flag.

My current working hypothesis is that the problem is related to AMD erratum 280: "Time Stamp Counter May Yield an Incorrect Value". It seems that the time stamp counter on Phenom occasionally (once a day?) returns a bogus value. This confuses the softlockup detector, which incorrectly concludes that a task is stuck. It may also confuse the scheduler, but I have not been able to prove this.

Kernel 2.6.27 accepts the notsc flag to ignore the time stamp counter. Note that 2.6.26 and 2.6.25 accept this flag in 32-bit mode and ignore it in 64-bit mode. I haven't looked at earlier kernels.

It seems that other people have reported similar lockups but nobody has a solution. If my theory turns out to be correct, the kernel should be patched to autodetect Phenoms with the erratum and disable the time stamp counter automatically.
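Since some kernels reportedly ignore notsc, it may be worth confirming after boot that the flag was passed and which clocksource the kernel ended up using. The following is a small sketch under stated assumptions: has_boot_flag and read_or_none are made-up helper names, and the sysfs clocksource path is assumed to be present on the running kernel (the code simply reports None if it is not).

```python
from pathlib import Path

CMDLINE = Path("/proc/cmdline")
# Assumption: this sysfs file exists on the kernel being checked.
CLOCKSOURCE = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)


def has_boot_flag(cmdline, flag):
    """True if flag appears as a whole word on the kernel command line."""
    return flag in cmdline.split()


def read_or_none(path):
    """Return the stripped file contents, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


cmdline = read_or_none(CMDLINE)
if cmdline is not None:
    print("notsc on command line:", has_boot_flag(cmdline, "notsc"))
print("current clocksource:", read_or_none(CLOCKSOURCE))
```

If notsc is on the command line but the reported clocksource is still tsc, that would be consistent with the flag being ignored, as described for 64-bit 2.6.25 and 2.6.26.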
Thanks for the tip. I'll give it a try and let you know about the outcome. In the meantime I found the following post: http://www.overclock.net/amd-general/319031-phenom-9850-system-freezing-5.html which might explain why I did not have any problems under 32-bit Windows while 64-bit Linux and Windows were freezing.
I tried it, but it doesn't seem to help: the system kept locking up as usual. I saw that a new BIOS was available on the ASUS website, so I downloaded and installed it. The only visible change in the BIOS was an option to enable/disable the AMD C1E feature (it's disabled by default). After the upgrade the system does not lock up as often, but it still happens (it locked up during the 6th system recompilation; each round took a couple of hours to recompile ~300 packages). I need more time to investigate further, but given the random nature of this problem I'm rather skeptical.
It turned out to be a faulty CPU. I got the replacement a couple of days ago and have not experienced any problems since.