15835 – lsraid segfaults

Summary: lsraid segfaults

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	x86 Linux

Importance:	High normal
Assignee:	Daniel Drake (RETIRED)

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2003-02-17 05:08 UTC by Andrei Ivanov
Modified:	2004-09-06 08:57 UTC
CC List:	1 user

See Also:
Package list:
Runtime testing required:	---

Attachments
/proc/partitions (partitions, 1.89 KB, text/plain) 2004-08-16 11:19 UTC, Adam Hixson
longline-fix.patch (longline-fix.patch, 1.79 KB, patch) 2004-08-17 09:43 UTC, Daniel Drake (RETIRED)
Fixes a few issues with previous patch. (longline-fix-2.patch, 1.39 KB, patch) 2004-08-18 09:29 UTC, Adam Hixson

Description Andrei Ivanov 2003-02-17 05:08:39 UTC

I've just upgraded raidtools to 1.00.3-r1 and thought of trying lsraid, like 
the first example in the manual:
lsraid -A -a /dev/md0

but it segfaults:

GNU gdb 5.3
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...(no debugging symbols found)...
Core was generated by `lsraid -A -a /dev/md0'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0x0804a408 in strcpy ()
(gdb) bt
#0  0x0804a408 in strcpy ()

contents of /etc/raidtab:
raiddev /dev/md0
        raid-level            1
        nr-raid-disks         2
        nr-spare-disks        0
        chunk-size            4
        persistent-superblock 1
        device                /dev/hdb1
        raid-disk             0
        device                /dev/hdd1
        raid-disk             1

Unfortunately, this new bugreport form doesn't contain the combo in which I 
could select what version of gentoo I use, which, btw, is 1.2 (gcc version 
2.95.3 20010315 (release)).

Reproducible: Always
Steps to Reproduce:
see above.

Comment 1 SpanKY gentoo-dev

2003-02-17 13:04:28 UTC

please post `emerge info` ... it works on my Gentoo-1.2 box w/gcc 2.95.3 ...

root@rux0r root # lsraid -A -a /dev/md0
[dev   9,   0] /dev/md0         CD6CEC77.CD786619.C44EB873.6AD84197 online
[dev  33,   0] /dev/hde         CD6CEC77.CD786619.C44EB873.6AD84197 good
[dev  33,  64] /dev/hdf         CD6CEC77.CD786619.C44EB873.6AD84197 good
[dev  34,   0] /dev/hdg         CD6CEC77.CD786619.C44EB873.6AD84197 good
[dev  34,  64] /dev/hdh         CD6CEC77.CD786619.C44EB873.6AD84197 good

root@rux0r root # lsraid -A -a /dev/md1
[dev   9,   1] /dev/md1         16A7EC46.9107A01B.873192AD.3026C379 online
[dev  56,   0] /dev/hdi         16A7EC46.9107A01B.873192AD.3026C379 good
[dev  56,  64] /dev/hdj         16A7EC46.9107A01B.873192AD.3026C379 good
[dev  57,   0] /dev/hdk         16A7EC46.9107A01B.873192AD.3026C379 good
[dev  57,  64] /dev/hdl         16A7EC46.9107A01B.873192AD.3026C379 good

Comment 2 Andrei Ivanov 2003-02-17 15:09:37 UTC

Portage 2.0.46-r12 (default-1.0, gcc-2.95.3, glibc-2.2.5-r2,2.2.5-r7)
=================================================================
System uname: 2.4.20-pre6 i686 Pentium III (Coppermine)
GENTOO_MIRRORS="ftp://ftp.dale.ro/pub/mirrors/ftp.ibiblio.org/pub/Linux/distributions/gentoo ftp://ftp.tu-clausthal.de/pub/linux/gentoo ftp://ftp.rez-gif.supelec.fr/pub/Linux/distrib/gentoo http://ftp.snt.utwente.nl/pub/os/linux/gentoo ftp://ftp.snt.utwente.nl/pub/os/linux/gentoo http://distro.ibiblio.org/pub/linux/distributions/gentoo/distfiles "
CONFIG_PROTECT="/etc /var/qmail/control /usr/kde/2/share/config /usr/kde/3/share/config /usr/X11R6/lib/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
PORTDIR="/usr/portage"
DISTDIR="/usr/portage/distfiles"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR_OVERLAY=""
USE="x86 arts avi crypt jpeg libg++ mikmod mmx mpeg ncurses pdflib qtmt quicktime spell truetype xml2 xmms xv bonobo gif gnome-libs gpm gtk guile imlib java libwww motif nls oggvorbis opengl pam png python qt readline sdl slang ssl svga tcltk tcpd X maildir -apache2 -3dnow -apm -cups ldap xml curl berkdb sse dga -gnome -kde evo gtkhtml aalib lcms tiff gd esd oss flash freetype encode imap mozilla mozctl mozxmlterm mznoirc tetex perl mysql postgres odbc innodb gdbm samba"
COMPILER=""
CHOST="i686-pc-linux-gnu"
CFLAGS="-march=i686 -O3 -pipe -fforce-addr -fomit-frame-pointer -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -malign-functions=4"
CXXFLAGS="-march=i686 -O3 -pipe -fforce-addr -fomit-frame-pointer -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -malign-functions=4"
ACCEPT_KEYWORDS="x86 ~x86"
MAKEOPTS="-j2"
AUTOCLEAN="yes"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
FEATURES="sandbox ccache"


I've tried recompiling it with CFLAGS="-march=i686 -O3 -pipe", but it still segfaults...

Comment 3 SpanKY gentoo-dev

2003-02-17 20:01:56 UTC

only thing i can think of is maybe the ccache is thwarting your attempts to recompile with CFLAGS="" ...

try this:
mv ~/.ccache ~/.ccache-old
env CFLAGS="" emerge raidtools
mv ~/.ccache-old ~/.ccache

and see how it goes

Comment 4 SpanKY gentoo-dev

2003-02-18 01:08:07 UTC

also what kernel version ? 
what version are you running and what version of linux-headers do you have installed 
(or did you modify /usr/include/{linux,asm}) ?

Comment 5 Andrei Ivanov 2003-02-18 04:08:47 UTC

Hmm... I don't use ccache:

*  dev-util/ccache
      Latest version available: 2.1.1
      Latest version installed: [ Not Installed ]

and I don't have a ~/.ccache directory because of that...

Comment 6 Andrei Ivanov 2003-02-18 04:56:12 UTC

*  sys-kernel/linux-headers
      Latest version available: 2.4.18
      Latest version installed: 2.4.18

uname -a
Linux webdev.ines.ro 2.4.20-pre6 #4 Sat Oct 26 16:38:33 EEST 2002 i686 Pentium III (Coppermine) GenuineIntel GNU/Linux

and /usr/src/linux links to the sources the running kernel was built from.

Comment 7 SpanKY gentoo-dev

2003-02-23 22:56:14 UTC

so what version is your kernel ?

Comment 8 Andrei Ivanov 2003-02-24 05:12:14 UTC

The previous message says that... 2.4.20-pre6

Comment 9 SpanKY gentoo-dev

2003-02-24 12:55:55 UTC

whoops completely missed that ;) 
i run a 2.5.x kernel, so i'll try to set up my system to reflect yours sometime in the future

Comment 10 Andrei Ivanov 2003-02-24 14:45:54 UTC

I'd better update my kernel... the problem is that it's a production server (my team's projects are stored there), and my primary job is web development, and I really don't have time to run some filesystem tests on a new kernel to make sure it doesn't ruin all the projects... and I don't really need lsraid, I was just playing with it. Thanks for all your efforts.

Comment 11 SpanKY gentoo-dev

2003-02-24 15:37:01 UTC

i don't expect you to update your kernel, and if it's production, then don't ... 
my boxes are dev boxes so i don't mind messing around with different kernels ... 

and like you said, lsraid isn't a required tool ... it's still something to be looked at since 
'segfault' is not the correct behavior ;)

Comment 12 Andrei Ivanov 2003-03-10 06:10:27 UTC

ha ha ha... I've reinstalled gentoo, and put 1.4 and kernel 2.4.20-gentoo-r1 and it still segfaults. Here are my new os specs:

Portage 2.0.47-r8 (default-x86-1.4, gcc-3.2.2, glibc-2.3.2-r0)
=================================================================
System uname: 2.4.20-gentoo-r1 i686 Pentium III (Coppermine)
GENTOO_MIRRORS="ftp://ftp.tu-clausthal.de/pub/linux/gentoo http://ftp.snt.utwente.nl/pub/os/linux/gentoo http://distro.ibiblio.org/pub/linux/distributions/gentoo/distfiles "
CONFIG_PROTECT="/etc /var/qmail/control /usr/share/config /usr/kde/2/share/config /usr/kde/3/share/config /usr/X11R6/lib/X11/xkb"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
PORTDIR="/usr/portage"
DISTDIR="/usr/portage/distfiles"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/mnt/hdc/tmp"
PORTDIR_OVERLAY=""
USE="x86 oss avi crypt cups encode gif jpeg libg++ mmx mpeg ncurses oggvorbis pdflib png quicktime sdl spell truetype xml2 xmms xv zlib gdbm berkdb slang readline tcltk java mysql postgres X gpm tcpd pam libwww ssl perl python imlib gtk qt motif opengl -3dnow aalib acpi alsa -apm -arts curl dga gd -gnome gtk2 imap innodb -kde lcms ldap maildir -mikmod mozilla -nls pic samba sasl -svga sse xml"
COMPILER="gcc3"
CHOST="i686-pc-linux-gnu"
CFLAGS="-march=pentium3 -O3 -pipe -fomit-frame-pointer -fprefetch-loop-arrays -ffast-math -fforce-addr -falign-functions=4 -mfpmath=sse"
CXXFLAGS="-march=pentium3 -O3 -pipe -fomit-frame-pointer -fprefetch-loop-arrays -ffast-math -fforce-addr -falign-functions=4 -mfpmath=sse"
ACCEPT_KEYWORDS="x86 ~x86"
MAKEOPTS="-j2"
AUTOCLEAN="yes"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
FEATURES="sandbox ccache"

I've recompiled raidtools with CFLAGS="-march=pentium3 -O3 -pipe" and even with CFLAGS="", but lsraid still segfaults.

Comment 13 Tim Yamin (RETIRED) gentoo-dev

2003-09-11 15:03:58 UTC

Is this bug still an issue?

Comment 14 Andrei Ivanov 2003-09-13 02:45:48 UTC

Yes... it still segfaults.

Comment 15 Tim Yamin (RETIRED) gentoo-dev

2003-09-13 05:39:31 UTC

Can you upgrade to the latest gentoo-sources and add a stack trace [emerge strace] if this still occurs?

Comment 16 Andrei Ivanov 2003-09-20 08:25:00 UTC

It works now... on latest gentoo-sources... and after a system reinstall... ;)

Comment 17 Tim Yamin (RETIRED) gentoo-dev

2003-09-20 16:41:27 UTC

Resolving. Thanks for keeping us informed.

Comment 18 Adam Hixson 2004-08-01 14:53:27 UTC

There is an actual bug in lsraid in this package. On line 1048 of lsraid.c, fgets reads the lines from /proc/partitions. If a line of /proc/partitions is longer than MAX_LINE_LENGTH, the program segfaults. Currently MAX_LINE_LENGTH is set to 100. When I reset it to 1000 (in common.h), the program works as described. This should be mentioned to the developers, I guess.

This bug needs to be reopened.
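To make the failure mode in Comment 18 concrete, here is a minimal, self-contained C sketch (not lsraid's code). It assumes a hypothetical DEMO_LINE_LENGTH of 16 instead of the real 100 so that a short line overflows, and count_fgets_chunks is a hypothetical helper that counts how many fgets() calls it takes to consume one line.

```c
#include <stdio.h>

/* Hypothetical demo size, deliberately tiny; lsraid's MAX_LINE_LENGTH
 * (set in common.h) is 100. */
#define DEMO_LINE_LENGTH 16

/* Returns how many fgets() calls it takes to consume `text`, or -1 if the
 * scratch file cannot be created. fgets() stores at most
 * DEMO_LINE_LENGTH - 1 characters per call, so a longer line comes back
 * in pieces, and every piece after the first starts in mid-line. */
int count_fgets_chunks(const char *text)
{
    char buf[DEMO_LINE_LENGTH];
    int chunks = 0;
    FILE *fp = tmpfile();            /* standard C scratch file */

    if (fp == NULL)
        return -1;
    fputs(text, fp);
    rewind(fp);
    while (fgets(buf, sizeof buf, fp) != NULL)
        chunks++;                    /* one loop iteration per piece */
    fclose(fp);
    return chunks;
}
```

A 20-character line takes two calls, and so does a line of exactly 15 characters, whose second call returns nothing but the '\n'. A loop that hands every piece to a parser therefore sees fragments it was never meant to parse.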

Comment 19 Daniel Drake (RETIRED) gentoo-dev

2004-08-16 08:04:47 UTC

Whereabouts in the code does the segfault come from? My /proc/partitions is relatively short, but setting MAX_LINE_LENGTH to a small value did not cause a segfault. Perhaps you could post (as a text attachment) your /proc/partitions?

Comment 20 Adam Hixson 2004-08-16 08:50:56 UTC

There are several long lines in /proc/partitions on my machine. I traced it down clearly, and saw the problem when I printed the lines as they were read in that loop.

Just for kicks, here's the /proc/partitions for the machine in question. I have several lines longer than the 100-character MAX_LINE_LENGTH I described ... raising MAX_LINE_LENGTH to a large number fixed the segfault for me. I'd have to see your /proc/partitions and know the value you set for MAX_LINE_LENGTH to know why yours didn't segfault.

major minor  #blocks  name     rio rmerge rsect ruse wio wmerge wsect wuse running use aveq

   9     1  156250880 md/1 0 0 0 0 0 0 0 0 0 0 0
   9     2   39136256 md/2 0 0 0 0 0 0 0 0 0 0 0
  34     0   78150744 ide/host2/bus1/target0/lun0/disc 126343 106567 1863800 1125680 430212 2759677 25650032 6611050 -1 24378370 26070552
  34     1   78125512 ide/host2/bus1/target0/lun0/part1 126338 106531 1863712 1125640 430211 2759677 25650024 6611050 0 1032130 7759360
  34     2      25200 ide/host2/bus1/target0/lun0/part2 4 33 80 30 1 0 8 0 0 30 30
  34    64   78150744 ide/host2/bus1/target1/lun0/disc 125734 107832 1869122 1212310 474905 2761087 26018448 6534090 -1 24397600 26079452
  34    65   78125512 ide/host2/bus1/target1/lun0/part1 125729 107796 1869034 1212240 474904 2761087 26018440 6534090 0 1017420 7768220
  34    66      25200 ide/host2/bus1/target1/lun0/part2 4 33 80 60 1 0 8 0 0 60 60
  33     0  117220824 ide/host2/bus0/target0/lun0/disc 125623 106672 1859558 918010 450379 2757368 25785456 3054690 -3 24584880 15962915
  33     1   78132096 ide/host2/bus0/target0/lun0/part1 124926 106636 1853946 580110 444469 2753138 25704336 3052210 0 617020 3638810
  33     2   39086145 ide/host2/bus0/target0/lun0/part2 696 33 5604 337900 5910 4230 81120 2480 0 4400 340380
   3     0   60051600 ide/host0/bus0/target0/lun0/disc 1680269 149178 14634960 8015170 455353 326497 6257880 10981600 -2 24414200 12669562
   3     1    1889968 ide/host0/bus0/target0/lun0/part1 10 18 98 70 2 0 16 0 0 70 70
   3     2    5670000 ide/host0/bus0/target0/lun0/part2 1248 5063 50488 10560 1543 10464 98304 26820 0 15610 37640
   3     3   52489080 ide/host0/bus0/target0/lun0/part3 1679010 144094 14584366 8004540 453808 316033 6159560 10954780 0 6823020 18959320
   3    64   43965432 ide/host0/bus0/target1/lun0/disc 1 3 8 20 0 0 0 0 -1 24638540 18311132
   3    65   26627706 ide/host0/bus0/target1/lun0/part1 0 0 0 0 0 0 0 0 0 0 0

Comment 21 Adam Hixson 2004-08-16 08:55:33 UTC

You asked where in the code the problem was, if you search lsraid.c for the line "/*FIXME: I'm lazy, can this be overrun? */" you can see the problem.

In answer to the lazy coder, I'd have to say yes. Raising MAX_LINE_LENGTH is a temporary solution; this bit should really be fixed. At least /proc/partitions isn't a user-writable file ... if it were, this would be a real security concern, as this is a classic C no-no.
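The thread does not quote the code under that FIXME. Assuming (this is an assumption) that the overrun-prone call stores the device name with a bare %s into a fixed array, a minimal sketch of a bounded version looks like this; parse_partition_line and NAME_LEN are hypothetical names, and the format just follows the column layout of the /proc/partitions sample above.

```c
#include <stdio.h>

/* Hypothetical size for the device-name buffer. */
#define NAME_LEN 64

/* Parses a /proc/partitions data line ("major minor #blocks name ...").
 * The field width in "%63s" (NAME_LEN - 1) caps how much sscanf() may
 * write into name, which a bare "%s" does not. Requiring all four
 * conversions to succeed rejects the header line and blank lines.
 * Returns 1 on success, 0 otherwise. */
int parse_partition_line(const char *line, int *major, int *minor,
                         unsigned long *blocks, char name[NAME_LEN])
{
    return sscanf(line, "%d %d %lu %63s",
                  major, minor, blocks, name) == 4;
}
```

A field width removes the overrun, but it cannot tell a real line from the tail of a split one: a fragment such as "8 0 0 30 30" still converts all four fields, so truncated reads have to be caught before parsing.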

Comment 22 Daniel Drake (RETIRED) gentoo-dev

2004-08-16 10:04:37 UTC

So it is the scanf call that causes the segfault?
Also, could you please post the /proc/partitions file as a text attachment to avoid any formatting being lost, if this matters.

I know that what you have already posted should be enough for me to look into the issue; it's just that if you have already done some of the work, there's not much point in me doing it too :)
Also my system is non-RAID so I'm sort of working blind. I'm out now but will look into this another time.

Comment 23 Adam Hixson 2004-08-16 11:19:35 UTC

Created attachment 37546
/proc/partitions

Yeah, the scanf is the source of the segfault. What happens is that for the
overlength lines, only the first MAX_LINE_LENGTH - 1 characters are read and
scanf does its thing. Then on the next loop, the rest of the line is read in
and scanf is applied to it; it's obviously not formatted correctly, and it
dies then. The best thing to do, I think, is to check the result from fgets to
be sure that all of the line was read in, and obviously do some sort of sanity
check on the line to be sure scanf won't barf. A functional stopgap would be to
increase MAX_LINE_LENGTH and refer the matter to the author. Here's my
/proc/partitions for you to play with.
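A minimal sketch of the first check Adam suggests, keeping a plain static buffer: read_whole_line and LINE_LEN are hypothetical names. It reports whether fgets() got the entire line and, if not, discards the remainder so the tail of an overlong line never reaches the parser.

```c
#include <stdio.h>
#include <string.h>

/* Mirrors the MAX_LINE_LENGTH value under discussion. */
#define LINE_LEN 100

/* Reads one line of fp into buf. Returns 1 if buf holds the complete
 * line, 0 if the line was too long (its remainder has been read and
 * thrown away, so the next call starts on a fresh line), or -1 at end
 * of file or on a read error. */
int read_whole_line(FILE *fp, char buf[LINE_LEN])
{
    size_t len;
    int c;

    if (fgets(buf, LINE_LEN, fp) == NULL)
        return -1;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        return 1;                /* the newline was stored: line complete */

    /* No newline in buf: either fgets() hit end of file, or the buffer
     * filled up. Peek at the next character to tell which. */
    c = fgetc(fp);
    if (c == '\n' || c == EOF)
        return 1;                /* text fit exactly (buf just lacks '\n'),
                                    or the file ended */

    /* Genuinely overlong: skip to the end of this physical line. */
    while ((c = fgetc(fp)) != EOF && c != '\n')
        ;
    return 0;
}
```

The one-character peek handles the exact-fit case: a line of exactly LINE_LEN - 1 characters leaves only its '\n' behind, which fgetc() consumes, so nothing is taken from the following line. The caller skips lines that return 0 and only then runs its sanity-checked scanf.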

Comment 24 Daniel Drake (RETIRED) gentoo-dev

2004-08-17 09:43:29 UTC

Created attachment 37610
longline-fix.patch

Ok, please apply this patch and test out lsraid. I could successfully get a
segfault with your partitions file, and I think I have fixed it. Thanks.

Comment 25 Adam Hixson 2004-08-18 09:29:48 UTC

Created attachment 37668
Fixes a few issues with previous patch.

I looked at your patch, and it's quite good.  There are just two issues that I
decided to clear up.

1.  I don't know if sizeof(char) is always one on every system, but just in
    case I always include it to ensure portability.  I modified your mallocs.
2.  It's possible that a line can be an exact multiple of MAX_LINE_LENGTH.
    In this exceedingly rare case your code will read in the next line as
    though it's part of the current line.  Since fgets stops on '\n', we can
    test to see if it finished or not by checking to see if the last character
    is a \n, as I've included. Check me out here ... am I testing the correct
    position?  I've run some simple tests, but I'd like a second opinion.

Other than these issues, this is a great patch.
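For comparison with the dynamic-buffer approach (longline-fix.patch itself is not reproduced here), this is a minimal sketch of a growable line reader that decides completion the way Adam proposes, by testing whether the last character fgets() stored is '\n'. read_any_line and CHUNK are hypothetical names. In C, sizeof(char) is 1 by definition, so the sketch leaves it out of the malloc() call.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical initial read size, standing in for MAX_LINE_LENGTH. */
#define CHUNK 100

/* Reads one line of any length into a malloc()ed string that the caller
 * must free(). The line is complete once the last character stored is
 * '\n', so a line whose length is an exact multiple of the buffer size
 * is never merged with the line after it. Returns NULL at end of file or
 * if memory runs out. */
char *read_any_line(FILE *fp)
{
    size_t cap = CHUNK, len = 0;
    char *line = malloc(cap);        /* sizeof(char) is 1 by definition */

    if (line == NULL)
        return NULL;
    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;             /* newline stored: line complete */
        if (len + 1 == cap) {        /* buffer full: grow and keep going */
            char *bigger = realloc(line, cap * 2);
            if (bigger == NULL) {
                free(line);
                return NULL;
            }
            line = bigger;
            cap *= 2;
        }
    }
    if (len > 0) {
        line[len] = '\0';            /* defensive: fgets() leaves the buffer
                                        indeterminate on a read error */
        return line;                 /* last line had no trailing newline */
    }
    free(line);
    return NULL;
}
```

Doubling the capacity keeps the number of reallocations logarithmic in the line length.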

Comment 26 Daniel Drake (RETIRED) gentoo-dev

2004-08-18 11:30:23 UTC

That low-probability situation (line length = MAX_LINE_LENGTH) also occurred to me earlier on. I'm not sure that your fix will solve the issue; I'll have a play and run some tests to satisfy my curiosity.

Either way, with our work applied, does lsraid work as expected?

Comment 27 Adam Hixson 2004-08-18 12:18:16 UTC

It does now work as expected.  Until the next bug ;)

Comment 28 Daniel Drake (RETIRED) gentoo-dev

2004-09-01 14:01:31 UTC

I decided against the dynamic buffer approach; it doesn't seem right. I had a look through some other packages, and they seem to just use a static buffer of sensible size. I've just released 1.00.3-r2 into ~; could you please test? I simply added a sed hack to the ebuild to set MAX_LINE_LENGTH to 1000.

Comment 29 Daniel Drake (RETIRED) gentoo-dev

2004-09-05 11:01:12 UTC

Bump. Adam, could you please test this to confirm?

Comment 30 Adam Hixson 2004-09-05 19:32:13 UTC

Sorry for the delay in testing.

Just updated, and seems to work fine.

Comment 31 Daniel Drake (RETIRED) gentoo-dev

2004-09-06 08:57:48 UTC

Thanks. -r2 is now marked stable in cvs. Will also report this issue to the raidtools author.