I've just upgraded raidtools to 1.00.3-r1 and thought I'd try lsraid, as in the first example in the manual:
lsraid -A -a /dev/md0
but it segfaults:
GNU gdb 5.3
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...(no debugging symbols found)...
Core was generated by `lsraid -A -a /dev/md0'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x0804a408 in strcpy ()
(gdb) bt
#0 0x0804a408 in strcpy ()
contents of /etc/raidtab:
raiddev /dev/md0
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
chunk-size 4
persistent-superblock 1
device /dev/hdb1
raid-disk 0
device /dev/hdd1
raid-disk 1
Unfortunately, this new bug report form doesn't have the combo box where I could select which version of Gentoo I use, which, btw, is 1.2 (gcc version 2.95.3 20010315 (release)).
Reproducible: Always
Steps to Reproduce: see above.
please post `emerge info` ... it works on my Gentoo-1.2 box w/gcc 2.95.3 ...
root@rux0r root # lsraid -A -a /dev/md0
[dev 9, 0] /dev/md0 CD6CEC77.CD786619.C44EB873.6AD84197 online
[dev 33, 0] /dev/hde CD6CEC77.CD786619.C44EB873.6AD84197 good
[dev 33, 64] /dev/hdf CD6CEC77.CD786619.C44EB873.6AD84197 good
[dev 34, 0] /dev/hdg CD6CEC77.CD786619.C44EB873.6AD84197 good
[dev 34, 64] /dev/hdh CD6CEC77.CD786619.C44EB873.6AD84197 good
root@rux0r root # lsraid -A -a /dev/md1
[dev 9, 1] /dev/md1 16A7EC46.9107A01B.873192AD.3026C379 online
[dev 56, 0] /dev/hdi 16A7EC46.9107A01B.873192AD.3026C379 good
[dev 56, 64] /dev/hdj 16A7EC46.9107A01B.873192AD.3026C379 good
[dev 57, 0] /dev/hdk 16A7EC46.9107A01B.873192AD.3026C379 good
[dev 57, 64] /dev/hdl 16A7EC46.9107A01B.873192AD.3026C379 good
Portage 2.0.46-r12 (default-1.0, gcc-2.95.3, glibc-2.2.5-r2,2.2.5-r7)
=================================================================
System uname: 2.4.20-pre6 i686 Pentium III (Coppermine)
GENTOO_MIRRORS="ftp://ftp.dale.ro/pub/mirrors/ftp.ibiblio.org/pub/Linux/distributions/gentoo ftp://ftp.tu-clausthal.de/pub/linux/gentoo ftp://ftp.rez-gif.supelec.fr/pub/Linux/distrib/gentoo http://ftp.snt.utwente.nl/pub/os/linux/gentoo ftp://ftp.snt.utwente.nl/pub/os/linux/gentoo http://distro.ibiblio.org/pub/linux/distributions/gentoo/distfiles "
CONFIG_PROTECT="/etc /var/qmail/control /usr/kde/2/share/config /usr/kde/3/share/config /usr/X11R6/lib/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
PORTDIR="/usr/portage"
DISTDIR="/usr/portage/distfiles"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR_OVERLAY=""
USE="x86 arts avi crypt jpeg libg++ mikmod mmx mpeg ncurses pdflib qtmt quicktime spell truetype xml2 xmms xv bonobo gif gnome-libs gpm gtk guile imlib java libwww motif nls oggvorbis opengl pam png python qt readline sdl slang ssl svga tcltk tcpd X maildir -apache2 -3dnow -apm -cups ldap xml curl berkdb sse dga -gnome -kde evo gtkhtml aalib lcms tiff gd esd oss flash freetype encode imap mozilla mozctl mozxmlterm mznoirc tetex perl mysql postgres odbc innodb gdbm samba"
COMPILER=""
CHOST="i686-pc-linux-gnu"
CFLAGS="-march=i686 -O3 -pipe -fforce-addr -fomit-frame-pointer -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -malign-functions=4"
CXXFLAGS="-march=i686 -O3 -pipe -fforce-addr -fomit-frame-pointer -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -malign-functions=4"
ACCEPT_KEYWORDS="x86 ~x86"
MAKEOPTS="-j2"
AUTOCLEAN="yes"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
FEATURES="sandbox ccache"
I've tried recompiling it with CFLAGS="-march=i686 -O3 -pipe", but it still segfaults...
only thing i can think of is maybe the ccache is thwarting your attempts to recompile with CFLAGS="" ... try this:
mv ~/.ccache ~/.ccache-old
env CFLAGS="" emerge raidtools
mv ~/.ccache-old ~/.ccache
and see how it goes
also, what kernel version are you running, and what version of linux-headers do you have installed (or did you modify /usr/include/{linux,asm})?
Hmm... I don't use ccache:
* dev-util/ccache
Latest version available: 2.1.1
Latest version installed: [ Not Installed ]
and I don't have a ~/.ccache directory because of that...
* sys-kernel/linux-headers
Latest version available: 2.4.18
Latest version installed: 2.4.18
uname -a
Linux webdev.ines.ro 2.4.20-pre6 #4 Sat Oct 26 16:38:33 EEST 2002 i686 Pentium III (Coppermine) GenuineIntel GNU/Linux
and /usr/src/linux links to the sources the running kernel was built from.
so what version is your kernel ?
The message right before yours says that... 2.4.20-pre6
whoops, completely missed that ;) i run a 2.5.x kernel so i'll try to set up my system to reflect yours sometime in the future
I'd better update my kernel... the problem is that it's a production server (my team's projects are stored there), and my primary job is web development, and I really don't have time to run some filesystem tests on a new kernel to make sure it doesn't ruin all the projects... and I don't really need lsraid, I was just playing with it. Thanks for all your efforts.
i don't expect you to update your kernel, and if it's production, then don't ... my boxes are dev boxes so i don't mind messing around with different kernels ... and like you said, lsraid isn't a required tool ... it's still something to be looked at since 'segfault' is not the correct behavior ;)
ha ha ha... I've reinstalled gentoo, and put 1.4 and kernel 2.4.20-gentoo-r1 and it still segfaults. Here are my new os specs:
Portage 2.0.47-r8 (default-x86-1.4, gcc-3.2.2, glibc-2.3.2-r0)
=================================================================
System uname: 2.4.20-gentoo-r1 i686 Pentium III (Coppermine)
GENTOO_MIRRORS="ftp://ftp.tu-clausthal.de/pub/linux/gentoo http://ftp.snt.utwente.nl/pub/os/linux/gentoo http://distro.ibiblio.org/pub/linux/distributions/gentoo/distfiles "
CONFIG_PROTECT="/etc /var/qmail/control /usr/share/config /usr/kde/2/share/config /usr/kde/3/share/config /usr/X11R6/lib/X11/xkb"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
PORTDIR="/usr/portage"
DISTDIR="/usr/portage/distfiles"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/mnt/hdc/tmp"
PORTDIR_OVERLAY=""
USE="x86 oss avi crypt cups encode gif jpeg libg++ mmx mpeg ncurses oggvorbis pdflib png quicktime sdl spell truetype xml2 xmms xv zlib gdbm berkdb slang readline tcltk java mysql postgres X gpm tcpd pam libwww ssl perl python imlib gtk qt motif opengl -3dnow aalib acpi alsa -apm -arts curl dga gd -gnome gtk2 imap innodb -kde lcms ldap maildir -mikmod mozilla -nls pic samba sasl -svga sse xml"
COMPILER="gcc3"
CHOST="i686-pc-linux-gnu"
CFLAGS="-march=pentium3 -O3 -pipe -fomit-frame-pointer -fprefetch-loop-arrays -ffast-math -fforce-addr -falign-functions=4 -mfpmath=sse"
CXXFLAGS="-march=pentium3 -O3 -pipe -fomit-frame-pointer -fprefetch-loop-arrays -ffast-math -fforce-addr -falign-functions=4 -mfpmath=sse"
ACCEPT_KEYWORDS="x86 ~x86"
MAKEOPTS="-j2"
AUTOCLEAN="yes"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
FEATURES="sandbox ccache"
I've recompiled raidtools with CFLAGS="-march=pentium3 -O3 -pipe" and even with CFLAGS="", but lsraid still segfaults.
Is this bug still an issue?
Yes... it still segfaults.
Can you upgrade to the latest gentoo-sources and attach strace output [emerge strace] if this still occurs?
It works now... on latest gentoo-sources... and after a system reinstall... ;)
Resolving. Thanks for keeping us informed.
There is an actual bug in lsraid in this package. On line 1048 in lsraid.c, the fgets reads the lines from /proc/partitions. If a line of /proc/partitions is longer than MAX_LINE_LENGTH, the program will segfault. Currently MAX_LINE_LENGTH is set to 100. When I reset it to 1000 (in common.h), the program works as described. This should be mentioned to the developers, I guess. This bug needs to be reopened.
Where abouts in the code does the segfault come from? My /proc/partitions is relatively short, but setting MAX_LINE_LENGTH to a small value did not cause a segfault. Perhaps you could post (as a text attachment) your /proc/partitions ?
There are several long lines in /proc/partitions on my machine. I traced it down carefully and saw the problem when I printed the lines as they were read in that loop. Just for kicks, here's the /proc/partitions for the machine in question. Several lines are longer than the MAX_LINE_LENGTH of 100 I described ... raising MAX_LINE_LENGTH to a large number fixed the segfault for me. I'd have to see your /proc/partitions and know the value you set for MAX_LINE_LENGTH to know why yours didn't segfault.
major minor #blocks name rio rmerge rsect ruse wio wmerge wsect wuse running use aveq
9 1 156250880 md/1 0 0 0 0 0 0 0 0 0 0 0
9 2 39136256 md/2 0 0 0 0 0 0 0 0 0 0 0
34 0 78150744 ide/host2/bus1/target0/lun0/disc 126343 106567 1863800 1125680 430212 2759677 25650032 6611050 -1 24378370 26070552
34 1 78125512 ide/host2/bus1/target0/lun0/part1 126338 106531 1863712 1125640 430211 2759677 25650024 6611050 0 1032130 7759360
34 2 25200 ide/host2/bus1/target0/lun0/part2 4 33 80 30 1 0 8 0 0 30 30
34 64 78150744 ide/host2/bus1/target1/lun0/disc 125734 107832 1869122 1212310 474905 2761087 26018448 6534090 -1 24397600 26079452
34 65 78125512 ide/host2/bus1/target1/lun0/part1 125729 107796 1869034 1212240 474904 2761087 26018440 6534090 0 1017420 7768220
34 66 25200 ide/host2/bus1/target1/lun0/part2 4 33 80 60 1 0 8 0 0 60 60
33 0 117220824 ide/host2/bus0/target0/lun0/disc 125623 106672 1859558 918010 450379 2757368 25785456 3054690 -3 24584880 15962915
33 1 78132096 ide/host2/bus0/target0/lun0/part1 124926 106636 1853946 580110 444469 2753138 25704336 3052210 0 617020 3638810
33 2 39086145 ide/host2/bus0/target0/lun0/part2 696 33 5604 337900 5910 4230 81120 2480 0 4400 340380
3 0 60051600 ide/host0/bus0/target0/lun0/disc 1680269 149178 14634960 8015170 455353 326497 6257880 10981600 -2 24414200 12669562
3 1 1889968 ide/host0/bus0/target0/lun0/part1 10 18 98 70 2 0 16 0 0 70 70
3 2 5670000 ide/host0/bus0/target0/lun0/part2 1248 5063 50488 10560 1543 10464 98304 26820 0 15610 37640
3 3 52489080 ide/host0/bus0/target0/lun0/part3 1679010 144094 14584366 8004540 453808 316033 6159560 10954780 0 6823020 18959320
3 64 43965432 ide/host0/bus0/target1/lun0/disc 1 3 8 20 0 0 0 0 -1 24638540 18311132
3 65 26627706 ide/host0/bus0/target1/lun0/part1 0 0 0 0 0 0 0 0 0 0 0
You asked where in the code the problem was. If you search lsraid.c for the line "/*FIXME: I'm lazy, can this be overrun? */" you can see the problem. In answer to the lazy coder, I'd have to say yes. Raising MAX_LINE_LENGTH is a temporary solution. This bit should really be fixed. At least /proc/partitions isn't a user-writable file ... if it were, this would be a proper security concern, as this is a classic C no-no.
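To illustrate the kind of overrun that FIXME worries about: the real parsing code from lsraid.c isn't reproduced here, so this is only a sketch, assuming the device name is read with a plain %s into a fixed-size buffer. Giving %s a field width caps how much sscanf can store. parse_partition_line and NAME_LEN are made-up names for the example, not identifiers from raidtools.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 64 /* hypothetical buffer size, not taken from lsraid.c */

/* Parse "major minor #blocks name" from one /proc/partitions line.
 * name must point to at least NAME_LEN bytes.
 * Returns 1 if all four fields were read, 0 otherwise (e.g. the header line). */
int parse_partition_line(const char *line, int *major, int *minor,
                         unsigned long *blocks, char *name)
{
    assert(line != NULL && name != NULL);
    /* "%63s" stores at most 63 characters plus the terminating NUL,
     * so it cannot write past NAME_LEN bytes; keep the width equal
     * to NAME_LEN - 1 if the buffer size changes. */
    return sscanf(line, "%d %d %lu %63s", major, minor, blocks, name) == 4;
}
```

With a bounded width, an overlong name gets truncated instead of overwriting whatever follows the buffer. Checking the return value also rejects the header line and any leftover fragment of a split line.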
So it is the scanf call that causes the segfault? Also, could you please post the /proc/partitions file as a text attachment, so that no formatting is lost in case that matters? What you have posted already should be enough for me to look into the issue, but if you have already done some of the work there's not much point in me doing it too :) Also, my system is non-RAID, so I'm sort of working blind. I'm out now but will look into this another time.
Created attachment 37546
/proc/partitions
Yeah, the scanf is the source of the segfault. For an overlength line, only the first chunk (up to MAX_LINE_LENGTH) is read and scanf does its thing. On the next loop iteration the rest of the line is read in and scanf is applied to it; that remainder obviously isn't formatted correctly, and that's where it dies. I think the best fix is to check the result of fgets to be sure the whole line was read in, and to do some sort of sanity check on the line so scanf won't barf. A quick functional fix would be to increase MAX_LINE_LENGTH and refer the matter to the author. Here's my /proc/partitions for you to play with.
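The fgets check suggested above could look roughly like the following. This is a minimal sketch, not code from lsraid.c or from any patch attached here. read_whole_line is a hypothetical helper that reports whether the buffer holds a complete line, and throws away the tail of an overlong one so the next read starts on a fresh line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line into buf (size bytes). Returns:
 *    1  a complete line is in buf (trailing '\n' stripped)
 *    0  the line was too long; buf holds its start, the rest was discarded
 *   -1  end of file or read error, nothing was read */
int read_whole_line(FILE *fp, char *buf, size_t size)
{
    assert(fp != NULL && buf != NULL && size >= 2);

    if (fgets(buf, (int)size, fp) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }

    /* No newline in buf: peek at what follows to tell the cases apart. */
    int c = fgetc(fp);
    if (c == EOF || c == '\n')
        return 1; /* the content fit exactly, or the last line has no '\n' */

    while (c != '\n' && c != EOF)
        c = fgetc(fp); /* swallow the remainder of the overlong line */
    return 0;
}
```

A caller would parse the buffer only when the return value is 1 and skip (or report) the line when it is 0, so a leftover fragment never reaches scanf.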
Created attachment 37610
longline-fix.patch
Ok, please apply this patch and test out lsraid. I could successfully get a segfault with your partitions file, and I think I have fixed it. Thanks.
Created attachment 37668
Fixes a few issues with previous patch.
I looked at your patch, and it's quite good. There are just two issues that I decided to clear up.
1. I don't know if sizeof(char) is always one on every system, but just in case I always include it to ensure portability. I modified your mallocs.
2. It's possible that a line can be an exact multiple of MAX_LINE_LENGTH. In this exceedingly rare case your code will read in the next line as though it's part of the current line. Since fgets stops on '\n', we can test to see if it finished or not by checking to see if the last character is a \n, as I've included.
Check me out here ... am I testing the correct position? I've run some simple tests, but I'd like a second opinion. Other than these issues, this is a great patch.
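For reference, here is one way the newline test from point 2 can be combined with a growing buffer. It is a sketch only and not the contents of either attached patch. read_long_line and CHUNK are hypothetical names, with CHUNK standing in for MAX_LINE_LENGTH. On point 1: the C standard defines sizeof(char) as 1, so malloc(n) already allocates n chars.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 100 /* stands in for MAX_LINE_LENGTH */

/* Read one line of any length. Returns a malloc'd string without the
 * trailing '\n', or NULL at end of file or on allocation failure.
 * The caller frees the result. */
char *read_long_line(FILE *fp)
{
    assert(fp != NULL);

    size_t cap = CHUNK, len = 0;
    char *line = malloc(cap);
    if (line == NULL)
        return NULL;

    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        /* The position to test is the last character stored so far. */
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';
            return line;
        }
        /* No newline yet: double the buffer and keep reading this line. */
        char *tmp = realloc(line, cap * 2);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        cap *= 2;
    }

    if (len > 0)
        return line; /* final line with no trailing newline */
    free(line);
    return NULL;
}
```

Because the loop stops only when the last stored character is '\n', a line whose content exactly fills the buffer picks up just its newline on the next fgets call and is not merged with the following line.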
That low-probability situation also occurred to me earlier on (line length = MAX_LINE_LENGTH). I'm not sure that your fix will solve the issue; I'll have a play and run some tests to satisfy my curiosity. Either way, with our work applied, does lsraid work as expected?
It does now work as expected. Until the next bug ;)
I decided against the dynamic buffer approach. It doesn't seem right. I had a look through some other packages, and they seem to just use a static buffer of sensible size. I've just released 1.00.3-r2 into ~, could you please test? I simply added a sed hack to the ebuild to set MAX_LINE_LENGTH to 1000.
Bump..Adam, could you please test this to confirm?
Sorry for the delay in testing. Just updated, and seems to work fine.
Thanks. -r2 is now marked stable in cvs. Will also report this issue to the raidtools author.