Bug 445104 - x11-drivers/nvidia-drivers-310.19 - Occasionally causes crash on boot.
Summary: x11-drivers/nvidia-drivers-310.19 - Occasionally causes crash on boot.
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Unspecified
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Doug Goldstein (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-11-28 20:36 UTC by Pavel Volkov
Modified: 2013-01-29 13:59 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
screenshot after crash (nv.jpg,781.52 KB, image/jpeg)
2012-11-28 20:36 UTC, Pavel Volkov
s1.jpg (s1.jpg,808.37 KB, image/jpeg)
2012-11-29 14:48 UTC, Pavel Volkov
s2.jpg (s2.jpg,904.19 KB, image/jpeg)
2012-11-29 14:49 UTC, Pavel Volkov
s3.jpg (s3.jpg,725.69 KB, image/jpeg)
2012-11-29 14:50 UTC, Pavel Volkov

Description Pavel Volkov 2012-11-28 20:36:07 UTC
I experience a crash on startup with nvidia-drivers-310.19 (roughly 1 in every 5-6 boots). This doesn't happen with 304.64: I did 25 test reboots and all of them were fine.
If it's relevant, I use systemd for init.

Reproducible: Sometimes

Steps to Reproduce:
Reboot with nvidia-drivers-310.19 installed.
Actual Results:  
I experience one of these results:

a. I see the message about GPU0 and then the system hangs. It may be followed by a message about [sdb] being found.
b. Some messages appear, then the screen goes blank and the system hangs.
c. Sometimes I can hear the PC speaker making a series of long beeps when the screen goes blank.
d. One time I saw a backtrace (see attachment).

Expected Results:  
Successful boot

Portage 2.2.0_alpha142 (default/linux/amd64/10.0/desktop/kde, gcc-4.6.3, glibc-2.16.0, 3.6.6-gentoomelf x86_64)
=================================================================
System uname: Linux-3.6.6-gentoomelf-x86_64-Intel-R-_Core-TM-_i5-2400_CPU_@_3.10GHz-with-gentoo-2.2
Timestamp of tree: Wed, 28 Nov 2012 04:45:01 +0000
ld GNU ld (GNU Binutils) 2.23.1
app-shells/bash:          4.2_p39
dev-lang/python:          2.7.3-r2, 3.2.3-r1
dev-util/cmake:           2.8.10.1
dev-util/pkgconfig:       0.27.1
sys-apps/baselayout:      2.2
sys-apps/openrc:          0.11.5
sys-apps/sandbox:         2.6
sys-devel/autoconf:       2.13, 2.69
sys-devel/automake:       1.9.6-r3, 1.11.6, 1.12.5
sys-devel/binutils:       2.23.1
sys-devel/gcc:            4.6.3
sys-devel/gcc-config:     1.8
sys-devel/libtool:        2.4.2
sys-devel/make:           3.82-r4
sys-kernel/linux-headers: 3.6 (virtual/os-headers)
sys-libs/glibc:           2.16.0
Repositories: gentoo custom
Installed sets: @fonts, @kde, @vim
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="* -@EULA PUEL skype-4.0.0.7-copyright AdobeFlash-10.3"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=native -mtune=native"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt /usr/share/polkit-1/actions"
CONFIG_PROTECT_MASK="${EPREFIX}/etc/gconf /etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe -march=native -mtune=native"
DISTDIR="/usr/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://mirror.yandex.ru/gentoo-distfiles/ http://trumpetti.atm.tut.fi/gentoo/"
LANG="ru_RU.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="en en_GB ru ja"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_EXTRA_OPTS="--ipv4"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://mirror.yandex.ru/gentoo-portage/"
USE="X a52 aac acl acpi alsa amd64 anthy bash-completion bluetooth branding bzip2 cairo cdda cdr cjk cli consolekit cracklib crypt cups cxx dbus declarative directfb djvu dri dts dvd dvdr embedded emboss encode exif fam fbcon ffmpeg firefox flac fortran gdbm gif gpm gstreamer gtk iconv icu idn ipv6 jpeg kde kipi lame lcms libcaca libnotify lm_sensors mad matroska mmx mng modules mp3 mp4 mpeg mudflap multilib ncurses nls nptl ogg opengl openmp pam pango pcre pdf perl phonon plasma png policykit ppds pppd python qt3support qt4 raw readline samba sdl semantic-desktop session spell sse sse2 ssl startup-notification svg systemd tcpd tiff truetype udev udisks unicode upower usb vdpau vorbis wxwidgets x264 xcb xcomposite xinerama xml xscreensaver xv xvid zlib zsh-completion" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 
timing tsip tripmate tnt ubx" INPUT_DEVICES="evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LINGUAS="en en_GB ru ja" PHP_TARGETS="php5-3" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_2" RUBY_TARGETS="ruby18 ruby19" SANE_BACKENDS="genesys" USERLAND="GNU" VIDEO_CARDS="nvidia nouveau" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, USE_PYTHON
Comment 1 Pavel Volkov 2012-11-28 20:36:51 UTC
Created attachment 330848 [details]
screenshot after crash
Comment 2 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2012-11-29 00:04:11 UTC
> I see the message about GPU0 and then the system hangs.

Which message do you see? If it scrolls too fast, you can do:

Compile your kernel with CONFIG_BOOT_PRINTK_DELAY=y.

Then, boot your system with the extra kernel parameter boot_delay=N where you set N to a value that is convenient for being able to read / capture the error. The N in boot_delay=N is in milliseconds, so you probably want to use 1000 (one second) and adapt from there when that's too slow / fast.

Interactively editing the kernel line in grub can be handy so you don't have to boot into your working kernel every time to adapt the value to what you like.
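
The two steps above can be sketched in shell. This is only a minimal illustration: the helper names `has_printk_delay` and `with_boot_delay` are hypothetical and not part of any existing tool. The first checks a kernel config file for the option, the second prints a kernel command line with `boot_delay=N` added.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel config file "$1" enables
# CONFIG_BOOT_PRINTK_DELAY (needed for boot_delay= to have any effect).
has_printk_delay() {
    grep -q '^CONFIG_BOOT_PRINTK_DELAY=y$' "$1"
}

# Hypothetical helper: print the kernel command line "$1" with any existing
# boot_delay= parameter dropped and boot_delay=<milliseconds> ("$2") appended.
with_boot_delay() {
    out=""
    for word in $1; do
        case "$word" in
            boot_delay=*) ;;                  # drop the old value
            *) out="${out:+$out }$word" ;;    # keep everything else
        esac
    done
    printf '%s\n' "${out:+$out }boot_delay=$2"
}
```

For instance, `has_printk_delay /usr/src/linux/.config` checks the config in the usual Gentoo kernel source location (adjust the path if yours differs), and `with_boot_delay "$(cat /proc/cmdline)" 1000` prints a line that can be typed into the GRUB editor for a one-off boot.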

If you can, take a picture or capture a video...

> It may be followed by message about [sdb] being found.

What's interesting is that, in the screenshot, it is doing a file system check at the top. Since it only crashes sometimes, perhaps the issue occurs only when this check runs in parallel.

> Sometimes I can hear the PC speaker making a series of long beeps when the screen goes blank.

That is the kernel panic; this kind of beeping is usually seen on servers.

> One time I saw a backtrace (see attachment).

Interestingly, this backtrace is for nvidia-smi. You could try to prevent it from starting one way or another (making the file absent, removing it from systemd's files, ...). I did this myself as part of optimizing my boot and don't seem to need it running. YMMV, so you might want to start it at some later point.
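
One way to "make the file absent" is to move the binary aside rather than delete it, so it can be restored later. The sketch below is only an illustration: the function names and the `.disabled` suffix are hypothetical choices, not something the package provides.

```shell
#!/bin/sh
# Hypothetical helper: move an executable aside so nothing can start it at boot.
# The ".disabled" suffix is an arbitrary choice for this sketch.
disable_binary() {
    target=$1
    if [ ! -e "$target" ]; then
        echo "not found: $target" >&2
        return 1
    fi
    mv -- "$target" "$target.disabled"
}

# Hypothetical helper: undo disable_binary by moving the file back.
restore_binary() {
    mv -- "$1.disabled" "$1"
}
```

Run as root, `disable_binary /opt/bin/nvidia-smi` would apply this to the path mentioned later in this thread. Reinstalling the package puts the original file back, so the rename has to be repeated after a driver update.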
Comment 3 Pavel Volkov 2012-11-29 14:48:20 UTC
Today's screenshots are similar to the first one.
s1.jpg was followed by s2.jpg after a few seconds.
And s3.jpg is similar to s1.jpg, too.

I'll try removing /opt/bin/nvidia-smi later.
Comment 4 Pavel Volkov 2012-11-29 14:48:47 UTC
Created attachment 330942 [details]
s1.jpg
Comment 5 Pavel Volkov 2012-11-29 14:49:48 UTC
Created attachment 330944 [details]
s2.jpg
Comment 6 Pavel Volkov 2012-11-29 14:50:55 UTC
Created attachment 330946 [details]
s3.jpg
Comment 7 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2012-11-29 16:42:12 UTC
That "timeout: killing 'nvidia-udev.sh add'" on s1.jpg is where it goes wrong.

Since it is a timeout, it is waiting for something there, but what?!

As that timeout kills nvidia-smi, it probably causes the crash if nvidia-smi is killed while something else important is happening in parallel.
Comment 8 Andrey Ovcharov 2012-12-01 16:48:50 UTC
From nvidia-drivers-310.19.ebuild 

>local udevdir=$(udev_get_udevdir) 

># Ensures that our device nodes are created when not using X 
>exeinto "${udevdir}" 
>doexe "${FILESDIR}"/nvidia-udev.sh 

>insinto "${udevdir}"/rules.d 
>newins "${FILESDIR}"/nvidia.udev-rule 99-nvidia.rules 

If nvidia-udev.sh and 99-nvidia.rules are needed only when USE="-X", maybe it would be better to install them conditionally, something like:

>if ! use X; then # Ensures that our device nodes are created when not using X
>    local udevdir=$(udev_get_udevdir) 
 
>    exeinto "${udevdir}" 
>    doexe "${FILESDIR}"/nvidia-udev.sh 

>    insinto "${udevdir}"/rules.d 
>    newins "${FILESDIR}"/nvidia.udev-rule 99-nvidia.rules 
>fi;

???
Comment 9 Pavel Volkov 2012-12-02 16:46:53 UTC
Some update.
I switched to UEFI boot while using the 304.64 drivers, and it caused nvidia-smi to crash on every boot, but it didn't stop the boot process and there was no kernel panic. It's probably because of EFI's framebuffer.
Then I updated to 310.19 again and renamed the nvidia-smi binary. I have detected no problems so far.
Comment 10 Pavel Volkov 2013-01-29 13:59:07 UTC
Can't reproduce the crash anymore. Closing.