Bug 58159

Summary: gentoo-dev-sources 2.6.7 unstable after 15 minutes of heavy file i/o
Product: Gentoo Linux
Reporter: Steve Romanow <slestak989>
Component: New packages
Assignee: x86-kernel (DEPRECATED) <x86-kernel>
Severity: critical    
Priority: High    
Version: unspecified   
Hardware: x86   
OS: Linux   
Package list:
Runtime testing required: ---

Description Steve Romanow 2004-07-24 03:50:35 UTC
This machine is my LAN rsync mirror.  3 other machines sync off of this one nightly.  With the 2.6.7 kernel, at around 3:08am this thing flakes out.

Reproducible: Sometimes
Steps to Reproduce:
1. Boot the system on a 2.6.7 kernel (any I've tried).
2. Sync (no issue), then have the other workstations sync to this machine.
3. Start applying random ebuilds (normal upgrades from emerge -pv world).

Actual Results:  
Normally, after working for a while after reboot, it dies at approximately
3:08am (when the sync crons start).  Last night at 11pm I tried to simulate
the 3am cron activity manually, and the system crashed.

I had manually synced 2 machines to the problem machine, and had run an
rdiff-backup job against another machine (via ssh, one of my nightly cron
jobs), and after 15-30 minutes the consoles flaked out, reporting Buffer I/O
error for /dev/hdb.  That is the drive that / and /mnt/backup reside on.  When
I typed reboot, the system returned that /sbin/shutdown was not available.
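
One way to confirm how often the error above was logged is to count the matching lines in a saved copy of the log. This is only a rough sketch: the function name count_io_errors is made up for illustration, and it assumes the kernel message contains the text "Buffer I/O error" together with the device name (hdb here), which may be worded differently on other kernels.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that report a buffer I/O error
# for one device. The message wording is an assumption, not taken from
# the attached log.
count_io_errors() {
    logfile=$1   # e.g. a saved copy of /var/log/messages
    device=$2    # e.g. hdb
    # grep -c prints the number of matching lines but exits non-zero when
    # that number is 0, so "|| true" keeps the function's status clean.
    grep 'Buffer I/O error' "$logfile" | grep -cF -- "$device" || true
}
```

For example, count_io_errors /var/log/messages hdb prints the number of matching lines. Since / lives on the failing drive, running it against a copy of the log kept on another disk is probably more reliable.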

Expected Results:  
all rsync related jobs to complete successfully and system to remain up for
weeks and days, not hours.

I put part of /var/log/messages in
attachment.  This was repeated several times, over several nights.  In regard
to the 3am timing, I swapped three rdiff-backup crons around to see if the
problem moved, and it did not.  I moved jobs out of /etc/cron.daily to see if
one of those scripts was the cause, but have not determined anything with that
yet.  The only odd thing I can add is that I stagger the other workstations'
emerge sync cron scripts with a sleep statement so I can use the stock
/etc/cron.daily dir.  The script contains:
sleep 900 ; /usr/bin/emerge sync >> ~root/emerge.sync.log
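
The staggering described above could be wrapped in a small helper so that each workstation only differs in its delay. This is a sketch under assumptions: run_delayed is a hypothetical name, and only the 900 second delay, the emerge sync command and the log path come from the report.

```shell
#!/bin/sh
# Hypothetical helper for a cron.daily script: wait a number of seconds,
# then run the given command with its arguments.
run_delayed() {
    delay=$1
    shift
    sleep "$delay"
    "$@"
}

# On a workstation the cron.daily script would then end with a call like
# the following (left commented out so the sketch has no side effects):
# run_delayed 900 /usr/bin/emerge sync >> ~root/emerge.sync.log
```

Giving each workstation a different delay keeps them from hitting the mirror at the same moment while still using the stock /etc/cron.daily directory.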

The system does not exhibit the problem with gentoo-dev-sources-2.6.5.  It
does with all versions of 2.6.7 I tried (a couple from r1 through r11, and
also mm4).

Half of comments on are this
problem.   Thought it was related to that initially since I use hotplug + usb2
on this machine as well (added within last two months by addon Ali PCI card,
stable with 2.6.5)

Had issue with make oldconfig while troubleshooting this, so thought I had issue
resolved when kernel config straightened out, but issue remains.

Have nvidia mx440 card in system, thought it might be related to that, but I
have appropriate nvidia packages installed.
*  media-video/nvidia-kernel :
        [   ] 1.0.4363-r3 (2.4.26-grsec-2.0)
        [   ] 1.0.4496-r3 (2.4.26-grsec-2.0)
        [M  ] 1.0.4499 (2.4.26-grsec-2.0)
        [ ~ ] 1.0.5328-r1 (2.4.26-grsec-2.0)
        [M  ] 1.0.5332-r1 (2.4.26-grsec-2.0)
        [ ~ ] 1.0.5336-r2 (2.4.26-grsec-2.0)
        [ ~ ] 1.0.5336-r3 (2.4.26-grsec-2.0)
        [ ~I] 1.0.5336-r4 (2.4.26-grsec-2.0)
        [ ~I] 1.0.6106 (2.4.26-grsec-2.0)
lol2 root # etcat -v nvidia-glx
[ Results for search key           : nvidia-glx ]
[ Candidate applications found : 7 ]

 Only printing found installed programs.

*  media-video/nvidia-glx :
        [   ] 1.0.4363-r1 (0)
        [   ] 1.0.4496-r2 (0)
        [M  ] 1.0.4499-r1 (0)
        [ ~ ] 1.0.5328-r2 (0)
        [M  ] 1.0.5332-r2 (0)
        [ ~ ] 1.0.5336-r2 (0)
        [ ~I] 1.0.6106-r3 (0)

I have one or two of my other machines using 2.6.7 with no issues, but this is
the fileserver/backupserver.  It is also the only usb2, non-intel (athlon-xp)
nvidia video card machine I have.

lol2 root # emerge info
Portage 2.0.50-r9 (default-x86-1.4, gcc-3.3.3, glibc-,
System uname: 2.6.5-gentoo-r1 i686 AMD Athlon(TM) XP1800+
Gentoo Base System version 1.4.16
Autoconf: sys-devel/autoconf-2.59-r3
Automake: sys-devel/automake-1.8.3
CFLAGS="-O2 -march=athlon-xp -pipe"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config
/usr/kde/3.1/share/config /usr/kde/3.2/share/config /usr/kde/3/share/config
/usr/lib/mozilla/defaults/pref /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -march=athlon-xp -pipe"
FEATURES="autoaddcvs ccache sandbox"
USE="3dnow X aalib alsa apm artswrappersuid avi berkdb bonobo cdr crypt cups
curl dvd encode esd foomaticdb freetype gdbm gif gimpprint gnome gphoto2 gpm
gstreamer gtk gtk2 gtkhtml guile imap imlib jabber java jpeg kde libg++ libwww
live mad mikmod mozilla moznoirc moznomail mpeg mysql ncurses nls nntp nptl
oggvorbis opengl oss pam pda pdflib perl png ppds python qt quicktime readline
scanner sdl slang spell sse ssl svga tcltk tcpd tiff truetype usb x86 xine xml2
xmms xv zlib"
Comment 1 Steve Romanow 2004-07-24 03:54:09 UTC
I have also tried with and without 4k stacks, and with both the nv and nvidia drivers for xorg. I disabled bootsplash for the 2.6.7 kernel (still use it for 2.6.5, works ok).
Comment 2 Steve Romanow 2004-07-29 05:48:37 UTC
The Athlon XP this was reported with is no longer running Gentoo.  I set it up as my wife's desktop.

However, I moved the rsync mirror services to my webserver (P2-400, 96M ram), and with 2.6.7-g-d-s-r11 its rsync ability is flaky.  My other workstations (two Gentoo machines) cannot sync to it; it hangs during the initial file transfer (before the compare step).  Reboot with 2.6.5-mm4 (mm for sure, I think mm4), and rsyncd works fine. No hangs.

Comment 3 Daniel Drake (RETIRED) gentoo-dev 2004-08-23 00:06:55 UTC
Could you please try with from
Comment 4 Steve Romanow 2004-08-23 05:50:58 UTC
Will do.  Note that the machine having this problem has been re-dispatched.  My current rsync server is running 2.6.7-r13 and has not exhibited this problem (at least in a long while).  I'm not sure if I can recreate the problem.
Comment 5 Daniel Drake (RETIRED) gentoo-dev 2004-09-05 06:30:53 UTC
Any luck reproducing the problem? If not, there's nothing much we can do.
Comment 6 Steve Romanow 2004-09-06 17:38:08 UTC
Nothing.  I cannot reinstall Gentoo on the offending workstation or get the second machine that exhibited problems to do it again.  Let's close it and I'll open a new bug and reference this one (or re-open this one) if needed.

ty for your effort.
Comment 7 Daniel Drake (RETIRED) gentoo-dev 2004-09-07 02:31:22 UTC