Follow the link to the forum thread where I have documented my problem in full. In short, I have four servers running on Intel 865PERL motherboards and Maxtor 7Y250M0 250GB SATA drives, and two of them are failing with drive errors with some frequency now. They had been running reliably on kernels up to the gentoo-dev-sources-2.6.9 series. I recently updated to 2.6.10-r4 and 2.6.10-r6 kernels. That is when the problems began. I am using the SCSI low-level drivers under libata and Intel PIIX (see my kernel config in the thread).

These are production MPEG encoders that are in great demand at the moment, so I can't afford this kind of downtime from stable packages. I may downgrade back to the latest 2.6.9 kernel, but any guidance would be appreciated. Also please take a look at those three photos of the error(s) I was seeing. Thanks.

Reproducible: Always

Steps to Reproduce:
1. Boot into a gentoo-dev-sources-2.6.10 kernel.
2. Maybe boot normally. Maybe not.
3. At some point, the machine dies with the errors shown in the photos I took (links to them are in the forums thread).
4. Add another piece of hardware or software to my collection of things that do not work. ;-)

Actual Results:
See step 4.

Expected Results:
No drive failures.
Could you please see if the issue remains in 2.6.11-rc1?
I may try the new kernel soon. Last night, I got one of the machines back up and running by simply power cycling. After logging in, I stopped the samba and nfs services. I wanted to see if I could encode some MPEG video files overnight without the machine crashing. It worked. I came into work this morning to find some MPEG files waiting for me on a machine that hadn't crashed.

FWIW, this could be a Samba (very suspicious) or NFS (much less likely) issue. I very rarely use NFS to transfer files between these Gentoo Linux based MPEG encoder machines. Very, very rarely. However, I have a Samba share on each of these so that folks on my LAN (namely me) can view and manipulate the MPEG files.

A little background info... We have a program, FlipFactory, that monitors these Samba shares for new MPEG files, and when it finds them, it transcodes the MPEG files to WMV, RM, and MOV streaming media files. Those files are written to a NAS server, so it's just reading the MPEG files from the Samba share. I don't think it is a coincidence that the two machines I have suffered drive failures on are the busiest of the four, with one in particular that encodes MPEG files most nights and gets hit by humans looking at those files by day.

On the other machine, which I am having to rebuild from scratch, I am installing 2.6.10-r6, but I am going to try an ext3 filesystem instead of reiserfs. I've used reiserfs now for 2-3 years with no major incidents, but I'm grasping for anything. Maybe it's a filesystem issue, but I have doubts.
If you're interested, here's the /etc/samba/smb.conf from the box that I can get running with a power cycle (I removed anything commented out for brevity):

[global]
workgroup = webteam
netbios name = ketenc
server string = MPEG Encoder
log file = /var/log/samba3/log.%m
max log size = 50
map to guest = bad user
security = user
encrypt passwords = yes
smb passwd file = /etc/samba/private/smbpasswd
socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
dns proxy = no

[video]
path = /mnt/video
public = yes
only guest = yes
writable = yes
printable = no

Finally, whenever I've built a kernel over the last six months or so (dozens), I've been seeing this when completing my "make menuconfig":

livecd linux # make menuconfig
HOSTCC scripts/basic/fixdep
HOSTCC scripts/basic/split-include
HOSTCC scripts/basic/docproc
SHIPPED scripts/kconfig/zconf.tab.h
HOSTCC scripts/kconfig/conf.o
HOSTCC scripts/kconfig/mconf.o
SHIPPED scripts/kconfig/zconf.tab.c
SHIPPED scripts/kconfig/lex.zconf.c
HOSTCC scripts/kconfig/zconf.tab.o
HOSTLD scripts/kconfig/mconf
HOSTCC scripts/lxdialog/checklist.o
HOSTCC scripts/lxdialog/inputbox.o
HOSTCC scripts/lxdialog/lxdialog.o
HOSTCC scripts/lxdialog/menubox.o
HOSTCC scripts/lxdialog/msgbox.o
HOSTCC scripts/lxdialog/textbox.o
HOSTCC scripts/lxdialog/util.o
HOSTCC scripts/lxdialog/yesno.o
HOSTLD scripts/lxdialog/lxdialog
scripts/kconfig/mconf arch/i386/Kconfig
#
# using defaults found in arch/i386/defconfig
#
arch/i386/defconfig:129: trying to assign nonexistent symbol PM_DISK
arch/i386/defconfig:176: trying to assign nonexistent symbol PCI_USE_VECTOR
arch/i386/defconfig:252: trying to assign nonexistent symbol BLK_DEV_CARMEL
arch/i386/defconfig:273: trying to assign nonexistent symbol IDE_TASKFILE_IO
arch/i386/defconfig:292: trying to assign nonexistent symbol BLK_DEV_ADMA
*** End of Linux kernel configuration.ign nonexistent symbol SCSI_MEGARAID
*** Execute 'make' to build the kernel or try 'make help'.ol NET_FASTROUTE
arch/i386/defconfig:571: trying to assign nonexistent symbol NET_HW_FLOWCONTROL
livecd linux # nfig:777: trying to assign nonexistent symbol QIC02_TAPE
arch/i386/defconfig:1248: trying to assign nonexistent symbol X86_STD_RESOURCES

I don't know if that's worth fretting over. It hasn't caused any problems that I'm aware of. Just throwing out whatever I can that might be suspicious.
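For anyone who wants to list the affected symbols from output like the above, here is a minimal Python sketch. It assumes the warning format shown in this comment ("file:line: trying to assign nonexistent symbol NAME"); the function name stale_symbols and the sample text are made up for illustration. Lines that the terminal has overwritten may match only partially or not at all.

```python
import re

# Matches kconfig warnings of the form seen in the output above, e.g.
# "arch/i386/defconfig:129: trying to assign nonexistent symbol PM_DISK"
WARNING_RE = re.compile(
    r"(?P<file>\S+):(?P<line>\d+): "
    r"trying to assign nonexistent symbol (?P<symbol>\w+)"
)

def stale_symbols(output):
    """Return a (file, line, symbol) tuple for each warning found in output."""
    return [
        (m.group("file"), int(m.group("line")), m.group("symbol"))
        for m in WARNING_RE.finditer(output)
    ]

# Hypothetical sample built from three of the warnings quoted above.
sample = """\
arch/i386/defconfig:129: trying to assign nonexistent symbol PM_DISK
arch/i386/defconfig:176: trying to assign nonexistent symbol PCI_USE_VECTOR
arch/i386/defconfig:1248: trying to assign nonexistent symbol X86_STD_RESOURCES
"""

for path, line, symbol in stale_symbols(sample):
    print(f"{path}:{line} -> {symbol}")
```

Every warning here points at arch/i386/defconfig, the defaults file named in the output, rather than at a hand-edited config.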
Now I very much feel this is related to Samba. I just started the Samba service on the machine that encoded some files last night. The share mounted, and I attempted to copy some files to my laptop across the network. I got a couple of files, but then the machine froze. I went to the physical machine itself and attempted to log in: it was locked up tight. I'm going to stop Samba services on each of my encoders and see if I can make it through the weekend. If I can, then I'm going to point the finger at Samba. I have had version 3.0.9-r1 installed, and I recently updated to 3.0.10 on most if not all of them. This problem has spanned installation of both versions for me.
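One way to separate "heavy disk reads" from "reads through Samba" would be to read the same files locally with Samba stopped. The following is only a rough Python sketch of that idea, not something that was run on these machines; read_tree is a made-up helper, and /mnt/video is the share path from the smb.conf above. Note that files read recently may be served from the page cache rather than from the drive.

```python
import os

def read_tree(root, chunk_size=1024 * 1024):
    """Read every regular file under root in full.

    Returns (number_of_files, total_bytes_read).
    """
    files = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, broken symlinks, and the like
            with open(path, "rb") as fh:
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
            files += 1
    return files, total

# Example use on the encoder itself, with Samba stopped:
#   files, total = read_tree("/mnt/video")
#   print(files, "files,", total, "bytes read")
```

If the machine survives this but freezes when the same files are copied over the share, that would point away from the drives and toward the network file serving path.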
It just occurred to me that I didn't supply the link to the forums thread: http://forums.gentoo.org/viewtopic.php?t=282362
Alright, I restarted Samba services on my MPEG encoders after making a change to FlipFactory. I was using the regular "Network Folder" monitor as opposed to "Network Folder (Samba)." In short, FF is now using Samba protocols to access my shares instead of regular Windows networking (I think). No drive failures in two days now. This would appear to be a FlipFactory/Samba bug, not something in Gentoo Linux. I should have used the Samba-related monitor instead of the vanilla network folder monitor.
Please report this to the relevant upstream developers.