Bug 46981 - media-gfx/gnuplot-3.8j segfaults when trying to do a fit
Bug#: 46981
Product: Gentoo Linux
Version: unspecified
Platform: All
OS/Version: Linux
Status: RESOLVED
Severity: normal
Priority: P2
Resolution: FIXED
Assigned To: g2boojum@gentoo.org
Reported By: armin@despammed.com
Component: Applications
Opened: 2004-04-06 10:58 0000
Description:
gnuplot segfaults after attempting to do a linear fit. The crash occurs while
the final results are printed.

Reproducible: Always
Steps to Reproduce:
1. Start gnuplot.
2. Run a fit of the form: fit [range] expression "filename" via parameters

Actual Results:  
The fit is successful; the program crashes when displaying the results as seen
below:

After 5 iterations the fit converged.
final sum of squares of residuals : 0.00181103
rel. change during last iteration : -9.97883e-07

degrees of freedom (ndf) : 10
rms of residuals      (stdfit) = sqrt(WSSR/ndf)      : 0.0134575
variance of residuals (reduced chisquare) = WSSR/ndf : 0.000181103

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

a               = -14.6228         +/- 0.673        (4.602%)
Segmentation fault


Expected Results:  
Successful display of the fitting parameters.

Repeating with "strace gnuplot" produces the following famous last words:

--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

The flags used when merging are:
CFLAGS="-pipe -O2 -finline -freorder-functions -freorder-blocks -ffast-math
-fomit-frame-pointer"

------- Comment #1 From Grant Goodyear 2004-04-06 14:57:30 0000 -------
Could you provide specific data and fitting commands so that I can try it
out here using exactly the same input?

------- Comment #2 From Armin 2004-04-06 15:11:29 0000 -------
Created an attachment (id=28809)
sample data

attached is a sample data file (sample.dat)
steps:

gnuplot
fit [0.1:0.12] a*x+b  "sample.dat" using (1/($1)):(($3)**2) via a,b

At this point, the fit converges after 5 iterations and gnuplot segfaults
after displaying the first line of the result:

[...]
 Final set of parameters	    Asymptotic Standard Error
======================= 	   ==========================

a		= -12.3508	   +/- 0.4276	    (3.462%)
Segmentation fault

------- Comment #3 From Grant Goodyear 2004-04-07 18:25:27 0000 -------
Thank you very much for the data and example, I really appreciate it.
(Incidentally, I've been using gnuplot for ten years or so, and I had
no idea that it did fits.  Cool!)

My guess is that it's your CFLAGS, but I can't reproduce it.  Even w/
your CFLAGS gnuplot seems to work on my machine, so it might be a dependency.

Nonetheless, would you mind remerging w/ 
CFLAGS="-O2 -mcpu=i686 -fomit-frame-pointer"
and seeing if that works?  

If that fails, the next step is to ask you to either strace gnuplot or
run it from gdb, either one of which should let us know where it's
segfaulting.

Thanks!

> gnuplot
 
        G N U P L O T
        Version 3.8j patchlevel 0
        last modified Wed Nov 27 20:49:08 GMT 2002
        System: Linux 2.6.3-gentoo-r1
 
        Copyright(C) 1986 - 1993, 1999 - 2002
        Thomas Williams, Colin Kelley and many others
 
        This is a pre-version of gnuplot 4.0. Please refer to the documentation
        for command syntax changes. The old syntax will be accepted throughout
        the 4.0 series, but all save files use the new syntax.
 
        Type `help` to access the on-line reference manual
        The gnuplot FAQ is available from
                http://www.gnuplot.info/faq/
 
        Send comments and requests for help to <info-gnuplot-beta@dartmouth.edu>
        Send bugs, suggestions and mods to <info-gnuplot-beta@dartmouth.edu>
 
 
Terminal type set to 'x11'
gnuplot> fit [0.1:0.12] a*x+b "sample.dat" u (1/($1)):(($3)**2) via a,b
 
 
Iteration 0
WSSR        : 14.761            delta(WSSR)/WSSR   : 0
delta(WSSR) : 0                 limit for stopping : 1e-05
lambda    : 0.711307
 
initial set of free parameter values
 
a               = 1
b               = 1
/
 
Iteration 1
WSSR        : 0.114571          delta(WSSR)/WSSR   : -127.837
delta(WSSR) : -14.6464          limit for stopping : 1e-05
lambda    : 0.0711307
 
resultant parameter values
 
a               = 0.887916
b               = 0.110179
/
 
Iteration 2
WSSR        : 0.0831861         delta(WSSR)/WSSR   : -0.377286
delta(WSSR) : -0.0313849        limit for stopping : 1e-05
lambda    : 0.00711307
 
resultant parameter values
 
a               = -0.454838
b               = 0.230027
/
 
Iteration 3
WSSR        : 0.00212404        delta(WSSR)/WSSR   : -38.1641
delta(WSSR) : -0.0810621        limit for stopping : 1e-05
lambda    : 0.000711307
 
resultant parameter values
 
a               = -11.3808
b               = 1.42104
/
 
Iteration 4
WSSR        : 0.00158143        delta(WSSR)/WSSR   : -0.343113
delta(WSSR) : -0.00054261       limit for stopping : 1e-05
lambda    : 7.11307e-05
 
resultant parameter values
 
a               = -12.3499
b               = 1.52669
/
 
Iteration 5
WSSR        : 0.00158143        delta(WSSR)/WSSR   : -2.69974e-07
delta(WSSR) : -4.26945e-10      limit for stopping : 1e-05
lambda    : 7.11307e-06
 
resultant parameter values
 
a               = -12.3508
b               = 1.52679
 
After 5 iterations the fit converged.
final sum of squares of residuals : 0.00158143
rel. change during last iteration : -2.69974e-07
 
degrees of freedom (ndf) : 15
rms of residuals      (stdfit) = sqrt(WSSR/ndf)      : 0.0102678
variance of residuals (reduced chisquare) = WSSR/ndf : 0.000105429
 
Final set of parameters            Asymptotic Standard Error
=======================            ==========================
 
a               = -12.3508         +/- 0.4276       (3.462%)
b               = 1.52679          +/- 0.04668      (3.057%)
 
 
correlation matrix of the fit parameters:
 
               a      b
a               1.000
b              -0.999  1.000

------- Comment #4 From Armin 2004-04-07 20:42:19 0000 -------
Can't remerge with -mcpu=i686 on amd64, sorry (the toolchain is 64bit only by
default and I don't have enough time to get a cross-compiler up and running).
Besides, using the same gnuplot version on 32bit x86 does not crash (at least,
it hasn't for me for anything).

The point is, it's marked as 'stable' on amd64. If this is indeed a 64bit
issue, is it possible to verify whether anything similar appears on ia64? (as
that is also marked stable).

a backtrace from gdb shows the following:

Program received signal SIGSEGV, Segmentation fault.
0x0000002a9663b936 in strnlen () from /lib/libc.so.6
(gdb) bt
#0  0x0000002a9663b936 in strnlen () from /lib/libc.so.6
#1  0x0000002a9661030f in vfprintf () from /lib/libc.so.6
#2  0x00000000004146c8 in init_color ()
#3  0x00000000004126d9 in init_color ()
#4  0x0000000000414566 in init_color ()
#5  0x000000000040ac6e in init_color ()
#6  0x000000000040a85f in init_color ()
#7  0x000000000040a74a in init_color ()
#8  0x0000000000435fff in matherr ()
#9  0x0000002a965e08b1 in __libc_start_main () from /lib/libc.so.6
#10 0x00000000004049aa in ?? ()

------- Comment #5 From Grant Goodyear 2004-04-08 05:15:41 0000 -------
Oh!  I'm sorry, I didn't realize you were using amd64.  (In fact, reading back
through your bug report I don't see this information anywhere, although I
could have missed it.)

I can't test it here, so I'm going to reassign this bug to the amd64 folks and
I'll cc the ia64 team as well.

Thanks.

------- Comment #6 From Danny van Dyk (RETIRED) 2004-04-08 06:57:42 0000 -------
I can confirm this bug on my amd64 box. Here is the backtrace from a
non-stripped build (same command and sample data as Armin provided):

*cut off*
After 5 iterations the fit converged.
final sum of squares of residuals : 0.00158143
rel. change during last iteration : -2.69974e-07

degrees of freedom (ndf) : 15
rms of residuals      (stdfit) = sqrt(WSSR/ndf)      : 0.0102678
variance of residuals (reduced chisquare) = WSSR/ndf : 0.000105429

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

a               = -12.3508         +/- 0.4276       (3.462%)

Program received signal SIGSEGV, Segmentation fault.
0x0000002a9662bcb6 in strnlen () from /lib/libc.so.6
(gdb) bt
#0  0x0000002a9662bcb6 in strnlen () from /lib/libc.so.6
#1  0x0000002a966004aa in vfprintf () from /lib/libc.so.6
#2  0x0000000000414e4a in Dblfn ()
#3  0x0000000000412ba4 in regress ()
#4  0x00000000004145fb in fit_command ()
#5  0x000000000040af45 in command ()
#6  0x000000000040ab2c in do_line ()
#7  0x000000000040aa5d in com_line ()
#8  0x0000000000436e45 in main ()
(gdb)

I'll have a look at that later...

------- Comment #7 From Armin 2004-04-08 09:37:05 0000 -------
Compiled the latest available build from sf.net:

        G N U P L O T
        Version 3.8k patchlevel 3
        last modified Mon Mar 29 15:17:53 CEST 2004

crashes in the same place. Here's a backtrace on the debug build:

(gdb) bt
#0  0x0000002a9663b936 in strnlen () from /lib/libc.so.6
#1  0x0000002a9661030f in vfprintf () from /lib/libc.so.6
#2  0x00000000004198f9 in Dblfn (fmt=0x4d3268 "%-15.15s = %-15g  %-3.3s %-12.4g (%.4g%%)\n") at fit.c:1688
#3  0x00000000004173d8 in regress (a=0x648490) at fit.c:777
#4  0x0000000000419773 in fit_command () at fit.c:1638
#5  0x000000000040c7ec in command () at command.c:511
#6  0x000000000040c3ae in do_line () at command.c:368
#7  0x000000000040c28d in com_line () at command.c:327
#8  0x000000000044fcb7 in main (argc=1, argv=0x7fbffff398) at plot.c:626

It happens as it tries to write the last output line (a = [...]) to the log
file. Could this actually be a libc bug (as vfprintf fails to print to a file
a line that it successfully printed to stdout)?

------- Comment #8 From Aron Griffis (RETIRED) 2004-04-08 11:04:44 0000 -------
Nice backtrace.. :-)  How about working on a patch?

------- Comment #9 From Danny van Dyk (RETIRED) 2004-04-08 11:45:41 0000 -------
Ok, commenting out vfprintf(log_f, fmt, args); in src/fit.c:Dblfn() solves the
segfault. Probably a sizeof() problem with log_f on 64bit archs?

------- Comment #10 From Danny van Dyk (RETIRED) 2004-04-08 18:28:30 0000 -------
Works for me now. The problem was not so much using vfprintf on a file as
calling it a second time without re-initializing "args" via VA_START. Made a
patch for it, plus a patch to the ebuild.

------- Comment #11 From Danny van Dyk (RETIRED) 2004-04-08 18:30:27 0000 -------
Created an attachment (id=28922)
Patch for media-gfx/gnuplot-3.8j

------- Comment #12 From Danny van Dyk (RETIRED) 2004-04-08 18:31:39 0000 -------
Created an attachment (id=28923)
Patch for media-gfx/gnuplot-3.8j ebuild

------- Comment #13 From Armin 2004-04-08 19:47:12 0000 -------
It does not seem necessary to remove printing to the log. Indeed, doing 

va_end(args);
VA_START(args, fmt);

between the two vfprintf() calls seems to take care of it.
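
A minimal sketch of that re-initialization, for illustration only. The
function log_both() is a hypothetical stand-in for gnuplot's Dblfn(); its
stream parameters and return convention are assumptions, not taken from fit.c,
and it uses the standard va_start macro rather than gnuplot's VA_START wrapper.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for Dblfn(): write one formatted message to two
 * streams. Returns the character count of the first write, or -1 if
 * either write failed. */
static int log_both(FILE *screen, FILE *log_file, const char *fmt, ...)
{
    va_list args;
    int n_screen, n_log;

    va_start(args, fmt);
    n_screen = vfprintf(screen, fmt, args);
    va_end(args);           /* args must not be reused after vfprintf */

    va_start(args, fmt);    /* restart the argument traversal */
    n_log = vfprintf(log_file, fmt, args);
    va_end(args);

    return (n_screen < 0 || n_log < 0) ? -1 : n_screen;
}
```

Called as log_both(stderr, log_f, "%s = %g\n", "a", -12.3508), this should
write the same line to both streams regardless of how the platform represents
va_list.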

------- Comment #14 From Armin 2004-04-08 20:12:50 0000 -------
After looking at the C99 standard, this seems to be a gnuplot bug rather than
an amd64 one. Specifically, 7.15 par. 3 says that the value of the va_list
argument (args) is indeterminate after the first vfprintf() call and must be
passed to va_end before any further use. So maybe make the patch required for
all archs until a fix is available upstream?

Also posted as a bug on the gnuplot sf.net bugtracker here:

http://sourceforge.net/tracker/index.php?func=detail&aid=932162&group_id=2055&atid=102055
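
For code that only receives a va_list (so it cannot call va_start again),
C99 also provides va_copy. The sketch below is an illustration under that
assumption, not the patch attached to this bug; vlog_both() and
log_both_copy() are hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: prints the same arguments to two streams. It is
 * handed a va_list, so it duplicates it with va_copy before the first
 * vfprintf consumes the original. */
static int vlog_both(FILE *screen, FILE *log_file, const char *fmt, va_list ap)
{
    va_list ap_log;
    int n_screen, n_log;

    va_copy(ap_log, ap);                      /* independent second traversal */
    n_screen = vfprintf(screen, fmt, ap);     /* ap is indeterminate afterwards */
    n_log = vfprintf(log_file, fmt, ap_log);
    va_end(ap_log);                           /* every va_copy needs a va_end */

    return (n_screen < 0 || n_log < 0) ? -1 : n_screen;
}

/* Variadic front end (also hypothetical) that owns va_start/va_end. */
static int log_both_copy(FILE *screen, FILE *log_file, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vlog_both(screen, log_file, fmt, ap);
    va_end(ap);
    return n;
}
```

On 32bit x86 a va_list is typically a plain pointer passed by value, which
would explain why reusing it there appeared to work.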

------- Comment #15 From Travis Tilley (RETIRED) 2004-04-08 21:03:31 0000 -------
I've committed an amd64-specific ebuild for this to CVS. If this fix needs to
exist on all archs, then I'll leave that decision up to somebody other than
myself.

I'm re-assigning this bug to the maintainer for this package, but it should be
fixed on amd64 now. Give it a bit to reach rsync.

------- Comment #16 From Grant Goodyear 2005-01-09 07:48:11 0000 -------
Closing, since there's been no recent input.

------- Comment #17 From Grant Goodyear 2005-01-09 07:48:48 0000 -------
Really closing this time.