Bug 220363 - request suggestions for testing expected floating point hardware problem.
Summary: request suggestions for testing expected floating point hardware problem.
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Unspecified
Hardware: x86 Linux
Importance: High blocker
Assignee: Gentoo Linux bug wranglers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-05-05 13:37 UTC by John (EBo) David
Modified: 2008-05-12 20:40 UTC

See Also:
Package list:
Runtime testing required: ---

Description John (EBo) David 2008-05-05 13:37:08 UTC
I am requesting pointers/suggestions to stress test the floating
point hardware on Intel dual-core processors.  I have run Memtest86+
several times (the last time overnight) without errors, so I no
longer suspect memory problems.

Intel released a "Specification Update" for the dual-core processors
in 1/08 which noted the erratum "Single Step Interrupts with Floating
Point Exception Pending May Be Mishandled", but that is likely
unrelated.

Any additional pointers or suggestions on how to stress test the
floating point unit would be appreciated.
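
One possible way to exercise the floating point unit is to run a fixed chain of floating point operations many times and check that every run produces bit-identical results. The following is only a minimal sketch, assuming a modern Python 3 interpreter; the function names `fp_workload` and `stress` and the particular mix of operations are made up for illustration and are not taken from any existing test suite.

```python
import hashlib
import math
import struct


def fp_workload(iterations, seed=1.0):
    """Run a fixed chain of floating point operations.

    Returns a SHA-256 digest over every intermediate value and the
    number of non-finite values (NaN or infinity) that appeared.
    On healthy hardware the digest is identical on every call.
    """
    digest = hashlib.sha256()
    x = seed
    non_finite = 0
    for i in range(1, iterations + 1):
        # Mix of sine, square root, arctangent, multiply and divide.
        # With correct arithmetic x always stays finite and bounded.
        x = math.sin(x) * math.sqrt(i) + math.atan2(x, i) / (1.0 + x * x)
        if not math.isfinite(x):
            non_finite += 1
            x = seed  # restart the chain so the test can continue
        digest.update(struct.pack("<d", x))
    return digest.hexdigest(), non_finite


def stress(runs, iterations):
    """Repeat the workload and list the runs that disagree with run 0."""
    reference, ref_non_finite = fp_workload(iterations)
    mismatches = [0] if ref_non_finite else []
    for run in range(1, runs):
        result, non_finite = fp_workload(iterations)
        if result != reference or non_finite:
            mismatches.append(run)
    return reference, mismatches


if __name__ == "__main__":
    reference, mismatches = stress(runs=20, iterations=200_000)
    print("reference digest:", reference)
    print("suspect runs:", mismatches if mismatches else "none")
```

Because the errors described here are intermittent, the run counts would probably have to be raised considerably, and one copy could be started per core. The interpreter overhead also means the floating point unit is only lightly loaded, so the same idea ported to C would likely be a harsher test.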


Background:

A little over a month ago I purchased a new Acer 5720 laptop and
installed Gentoo.  After getting the system up and running I noticed
that the computational fluid dynamics simulations I had been running
for months without errors started dying a few seconds into
processing.  In addition, the LiDAR processing software, which
cartographically reprojects hundreds of millions of points on the
fly, returns "nan" about once every 50 million points (every
reprojected point represents thousands of floating point
computations, which equates to a major glitch roughly once every
10e+9 floating point operations).  I have not yet had time to check
the full results against trusted machines to get a better idea of how
bad the problem is, but it appears to have been getting progressively
worse over the last month, and the errors are never in the same place.
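
Checking results against a trusted machine could be done along the following lines. This is a rough sketch only: it assumes both machines' outputs have already been loaded as equally long lists of floats, and the name `find_suspect_points` and the tolerance values are hypothetical choices, not part of the LiDAR software.

```python
import math


def find_suspect_points(results, reference, rel_tol=1e-9, abs_tol=1e-12):
    """Return (index, got, expected) for every NaN or out-of-tolerance value."""
    if len(results) != len(reference):
        raise ValueError("result sets differ in length")
    suspects = []
    for index, (got, expected) in enumerate(zip(results, reference)):
        # A NaN never compares close to anything, but test it explicitly
        # so the intent is obvious to the reader.
        if math.isnan(got) or not math.isclose(
            got, expected, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            suspects.append((index, got, expected))
    return suspects


if __name__ == "__main__":
    # Made-up sample values standing in for real reprojected coordinates.
    trusted = [1.0, 2.5, -3.75, 4.0]
    laptop = [1.0, float("nan"), -3.75, 4.0000001]
    for index, got, expected in find_suspect_points(laptop, trusted):
        print(f"point {index}: got {got!r}, expected {expected!r}")
```

Recording the indices of the suspect points over several runs would also show whether the errors really do move around from run to run.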

When I first configured the machine I followed the Safe CPU
recommendations:

  CHOST="i686-pc-linux-gnu"
  CFLAGS="-march=prescott -O2 -pipe -fomit-frame-pointer"

Thinking that this might be in error I rebuilt the entire system
without "-fomit-frame-pointer" and retested with the same results.  I
then recompiled everything as a generic i686:

  CHOST="i686-pc-linux-gnu"
  CFLAGS="-march=i686 -O2 -pipe"

and retested with the same results.

ACPI is on, the fans appear to be running properly, and the
temperature typically runs between 48 and 65C.  I do not believe the
machine has overheated at any time.  I've also tried to build/rebuild
several packages to stress the CPU (as suggested in one of the forums).

Any pointers or suggestions appreciated.


  EBo --

emerge --info to follow

Portage 2.1.4.4 (default-linux/x86/2007.0/desktop, gcc-4.2.3, glibc-2.6.1-r0, 2.6.24-gentoo-r4 i686)
=================================================================
System uname: 2.6.24-gentoo-r4 i686 Intel(R) Pentium(R) Dual CPU T2330 @ 1.60GHz
Timestamp of tree: Mon, 05 May 2008 11:15:01 +0000
app-shells/bash:     3.2_p17-r1
dev-java/java-config: 1.3.7, 2.1.6
dev-lang/python:     2.4.4-r9
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=i686 -O2 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-march=i686 -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks metadata-transfer parallel-fetch sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://gentoo.osuosl.org/ ftp://distro.ibiblio.org/pub/linux/distributions/gentoo/ http://distro.ibiblio.org/pub/linux/distributions/gentoo/ ftp://ftp.gtlib.gatech.edu/pub/gentoo http://www.gtlib.gatech.edu/pub/gentoo ftp://mirror.iawnet.sandia.gov/pub/gentoo/ ftp://ftp.ussg.iu.edu/pub/linux/gentoo http://cudlug.cudenver.edu/gentoo/ http://gentoo.mirrors.pair.com/ ftp://gentoo.mirrors.pair.com/ http://gentoo.mirrors.tds.net/gentoo ftp://gentoo.mirrors.tds.net/gentoo http://gentoo.netnitco.net ftp://gentoo.netnitco.net/pub/mirrors/gentoo/source/ http://mirror.espri.arizona.edu/gentoo/ http://prometheus.cs.wmich.edu/gentoo http://mirror.mcs.anl.gov/pub/gentoo/ ftp://mirror.mcs.anl.gov/pub/gentoo/ http://gentoo.cites.uiuc.edu/pub/gentoo/ ftp://gentoo.cites.uiuc.edu/pub/gentoo/ http://mirror.fslutd.org/linux/distributions/gentoo/ ftp://mirror.fslutd.org/linux/distributions/gentoo/ nehet"
LINGUAS="en"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage_overlays"
SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
USE="X a52 aac acl acpi alsa arts audiofile berkdb blender-game bluetooth bzip2 cairo cdr cli cracklib crypt css cups dbus dri dvd dvdr dvdread eds emboss encode esd evo fam ffmpeg firefox fortran ftp gdbm gif gnome gpm gstreamer gtk hal hddtemp hdf5 iconv ieee1394 ipv6 ipw3945 isdnlog java javascript jpeg jpeg2k kde kerberos latex ldap lm_sensors mad midi mikmod mp3 mpeg mudflap ncurses netcdf nls nptl nptlonly ogg openal opengl openmp oss pam pcre pdf perl png pppd python qt3 qt3support qt4 quicktime readline reflection sdl session spell spl ssl svg tcl tcpd threads tiff tk truetype unicode vorbis win32codecs wmf x86 xine xml xorg xv zlib" ALSA_CARDS="hda-intel" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse synaptics evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en" USERLAND="GNU" VIDEO_CARDS="i810 vesa"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Jan Kundrát (RETIRED) gentoo-dev 2008-05-05 14:05:30 UTC
Hi John, Bugzilla is not a good support channel for this kind of question. You should seek help at the places mentioned at http://www.gentoo.org/main/en/support.xml , such as the Forums, the gentoo-user mailing list, or the #gentoo IRC channel.
Comment 2 John (EBo) David 2008-05-05 14:13:52 UTC
Sorry.  I was not sure where the most appropriate place to post was.

Thanks.
Comment 3 nobody 2008-05-05 15:42:52 UTC
If I were you, I would build a kernel with FPU emulation support so that the CPU does the calculations in software instead of the FPU.
That way you would have "nearly" the same test bed (the only difference being the kernel changes).

This isn't what you were asking at first, which was how to stress the FPU, but it will at least tell you whether the errors still occur when the CPU does the job.
My 2 cents.
Comment 4 John (EBo) David 2008-05-06 17:07:30 UTC
Thank you Stéphane for the suggestion -- I was so focused on the details I missed the obvious.  

I'm recompiling now and will post the results in a few days once I get a chance to run a bunch of tests.

  Thanks!
Comment 5 John (EBo) David 2008-05-12 20:40:48 UTC
Just to follow up on Stéphane's suggestion...

Turning math emulation on did not help, nor did changing the CPU type (the auto-detected CPU type was a Core 2 Duo, while the actual CPU is a dual-core T2330).  Rebuilding the entire system after the configuration changes was of no help.

I was unbelievably lucky to actually track this down to being caused by the app-laptop/acerhk kernel module (I've posted a bug report #221867).

Thanks again for everyone's suggestions.

  EBo --