Bug 61042 - dev-lang/R crashes when compiled with dev-lang/f2c (sane w/ g77)
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages
Hardware: AMD64 Linux
Importance: High normal
Assignee: Gentoo Science Related Packages

Reported: 2004-08-20 08:28 UTC by ivo welch
Modified: 2004-09-12 13:51 UTC
CC List: 5 users (config, gcc-porting, kugelfang, morfic, toolchain)


ivo welch 2004-08-20 08:28:15 UTC

I have an up-to-date Gentoo distribution on amd64. I first installed f2c by hand (because the ebuild does not know to install it), then emerged R. The version of R is not the most recent. Worse, it crashes randomly: not on startup, but later. These crashes are not rare but fairly systematic, so this version of R should be hard-masked, at least for the combination of amd64 and gcc 3.3.3.

Reproducible: Always

I have dropped an email to the R developers about this issue. I know that at least one person has a stable R under another amd64 distribution (Fedora). Let's hope I find out what's going on, in which case I will report back.

ivo welch 2004-08-20 12:39:31 UTC

It appears (from an email by one of the R wizards) that the problem is that f2c is no longer really compatible with R. This would not be so bad if there were a g77 build for Gentoo, but there does not appear to be one:

$ emerge search g77

So, right now, R is not really buildable.

Danny van Dyk (RETIRED) 2004-08-20 12:55:55 UTC

Well, you both managed to give me almost no information to work with, beyond "it's something with R".

0) Which version of dev-lang/R do you mean? PLEASE, the next time you file a bug, tell us which version you are talking about!
1) You don't need f2c; you can use g77 to compile the Fortran sources. Did you try g77? Do the errors still show up with a g77-compiled R?
2) Why haven't you provided the output of "emerge info"?

ivo welch 2004-08-20 13:14:32 UTC

Hi Danny: the "both of us" were both me. Fortunately, it would not have been useful to tell you the R version or to provide the emerge info (or other information), because I now know that the fault lies in the underlying compiler I used: f2c. With g77, the R errors will most likely disappear. (Until we have a g77 compiler, we should hardmask R. We should certainly hardmask it against f2c builds; it is definitively incompatible with it.
It will build and appear OK, but R itself will randomly crash thereafter.)

Now, here is where I remain confused: where do you get a g77 compiler from? Gentoo does not seem to have it; "$ emerge search g77" does not give me anything. I would love to find out. I am tearing my hair out trying to learn how to get g77 running under Gentoo; the gcc compiler seems to be a semi-black art. Help would be highly appreciated.
/iaw

Danny van Dyk (RETIRED) 2004-08-20 14:12:28 UTC

g77 is not a separate package; it is part of gcc.

USE="f77" emerge gcc

will give you what you want. No need to hardmask dev-lang/R. No need to tear your hair out, either ;-)

You still haven't provided the output of "emerge info". Please do that. I'd also like you to tell me whether the errors occur only with a specific version of dev-lang/R, or with all of them.

ivo welch 2004-08-20 17:45:33 UTC

Thank you, Danny, for the info.

First, an aside: in my attempts to update, I had started with

$ emerge /usr/portage/sys-devel/gcc/gcc-3.4.1.ebuild

which worked nicely, but brought me up to the later and thus less conventional gcc version. (I do not think it makes any difference. I did try to unmerge 3.4.1 to get back to 3.3.3, but apparently this cannot be done.)

Then I got your note, so I tried the f77 and g77 USE flags (see also my emerge info below). Alas, neither g77 nor f77 is in /usr/portage/profiles/use.desc. Instead, a grep on it tells me:

$ grep 77 /usr/portage/profiles/use.desc
/usr/portage/profiles/use.desc:ifc - use ifc instead of g77 to build

Of course, I am not sure that an Intel compiler is a great idea for an amd64 architecture, and I wonder whether R would like ifc, either. So I am still perplexed about how to build R... :-( Any advice/ideas would be appreciated.
regards, /iaw

Portage 2.0.50-r9 (default-amd64-2004.2, gcc-3.4.1, glibc-2.3.4.20040605-r0, 2.6.7-gentoo-r14)
=================================================================
System uname: 2.6.7-gentoo-r14 x86_64 4
Gentoo Base System version 1.4.16
distcc 2.12.1 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
Autoconf: sys-devel/autoconf-2.59-r3
Automake: sys-devel/automake-1.8.3
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CFLAGS="-pipe -O2"
CHOST="x86_64-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-pipe -O2"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache"
GENTOO_MIRRORS="ftp:///ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ http://mirror.datapipe.net/gentoo http://mirror.datapipe.net/gentoo ftp://mirrors.sec.informatik.tu-darmstadt.de/gentoo/"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X aalib alsa amd64 apm arts avi berkdb bitmap-fonts bonobo cdr crypt cups directfb encode esd f77 foomaticdb gdbm ggi gif gnome gphoto2 gpm gtk gtk2 gtkhtml guile imlib java jpeg kde ldap libg++ libwww mikmod motif mozilla mpeg mysql nas ncurses nls nogcj oggvorbis opengl oss pam pdflib perl png postgres python qt quicktime readline ruby scanner sdl slang snmp spell ssl tcltk tcpd tetex truetype ungif usb xml2 xmms xv zlib"

Donnie Berkholz (RETIRED) 2004-08-20 21:41:24 UTC

donnie@quasar donnie $ grep 77 /usr/portage/profiles/use.*
/usr/portage/profiles/use.desc:ifc - use ifc instead of g77 to build
/usr/portage/profiles/use.local.desc:dev-lang/R:f77 - Use f77 to compile FORTRAN sources.
/usr/portage/profiles/use.local.desc:sys-devel/gcc:f77 - Build support for the f77 language

ivo welch 2004-08-21 06:15:30 UTC

Thank you, Donnie. Fortran was indeed already installed in the compiler tree. I failed to realize that no "g77" or "f77" command-line program (or symbolic link to one) would be built, so when I tried to invoke "$ g77" there was no such thing, and I therefore thought Fortran was not yet installed.

Regarding my earlier confusion, I have filed a "bug report" suggesting that portage/emerge should mention such features (e.g., the availability of a Fortran compiler via a USE flag) at the end of "$ emerge search" output, as in "see also USE flags: ...".

Because R is now at version 1.9.1, I grabbed it from CRAN, symbolically linked gcc to the name g77 (necessary), and then compiled R. It worked absolutely flawlessly, and the R bugs I had experienced are gone. (I am still looking for end-user documentation on how to prepare a package for submission. If it is not too difficult [I have never used CVS, so this may be a show stopper], I would be glad to contribute some.) R might also be a good candidate for making a binary build available.

Is it possible to require R *NOT* to use f2c in the ebuild file? It is definitively not compatible with f2c, 100%. It will compile, but it will also bomb during use. (With g77, R is rock-solid.)

Thanks to both of you for your help. I would not have succeeded here without you. Very highly appreciated.
regards, /iaw

Danny van Dyk (RETIRED) 2004-08-21 14:43:24 UTC

> Portage 2.0.50-r9 (default-amd64-2004.2, gcc-3.4.1, glibc-2.3.4.20040605-r0,
> 2.6.7-gentoo-r14)

This is a BAD IDEA(tm). You are using gcc-3.4.x with the default profile on amd64. Please switch your profile to default-linux/2004.2/gcc34 (cascading profile).

> thank you, donnie. fortran was indeed already installed in the compiler tree.

Compiler tree?
> I failed to realize that there would be no "g77" or "f77" command line version
> (or symbolic link to it) built. so, when I had tried to invoke "$ g77", there
> was no such thing, and I had therefore thought fortran was not yet installed.

You should be able to run g77 from the command line.

> Because R is now at version 1.9.1, I grabbed it from cran, linked gcc
> symbolically to a name of g77 (necessary), and then compiled R.

It is _not_ necessary to link gcc to g77. g77 is a separate executable that ships with gcc if you emerge it with "f77" in your USE flags. Thx for the hint on 1.9.1. I will do a version bump to 1.9.1 then.

> R might also be a good candidate for making a binary build available.

Gentoo/AMD64 doesn't offer binary packages except those that are part of the install stages. Further, R has a lot of USE flags, as you experienced yourself. Which combination should be set for compilation? Not very useful in my eyes.

> Is it possible to require R *NOT* to use f2c in the ebuild file?

YES, it is. Just remove "f2c" from your USE flags and it will be compiled with g77. Is it possible for you to give me a work sample which breaks with an f2c-compiled R and works with a g77-compiled R? I couldn't get it to crash on my system... Thx.

ivo welch 2004-08-21 16:08:02 UTC

> > Portage 2.0.50-r9 (default-amd64-2004.2, gcc-3.4.1, glibc-2.3.4.20040605-r0,
> > 2.6.7-gentoo-r14)
> This is a BAD IDEA(tm). You are using gcc-3.4.x with the default profile on amd64. Please switch your profile to default-linux/2004.2/gcc34 (cascading profile).

I hope this bugzilla is Google-searchable, or else I am probably wasting your time with this. /etc/make.profile is pointing to ../usr/portage/profiles/default-amd64-2004.2. I presume switching profiles means I should do

$ ln -sf ../usr/portage/profiles/gcc34-amd64-2004.1 ./make.profile

> > thank you, donnie. fortran was indeed already installed in the compiler tree.
> Compiler tree?
> > I failed to realize that there would be no "g77" or "f77" command line version
> > (or symbolic link to it) built. so, when I had tried to invoke "$ g77", there
> > was no such thing, and I had therefore thought fortran was not yet installed.
> You should be able to run g77 from the command line.

The g77 command-line interface was not automatically installed on my system, although the Fortran compiler was installed when I re-emerged the compiler with the f77 flag. The link from g77 to gcc did the job as far as I needed it. I believe g77 is merely a simplified command-line invoker for gcc.

> > Because R is now at version 1.9.1, I grabbed it from cran, linked gcc
> > symbolically to a name of g77 (necessary), and then compiled R.
> It is _not_ necessary to link gcc to g77. g77 is a separate executable that ships with gcc if you emerge it with "f77" in your USE flags. Thx for the hint on 1.9.1. I will do a version bump to 1.9.1 then.

I wish I could help more. So far, I am a consumer, not a producer.

> > R might also be a good candidate for making a binary build available.
> Gentoo/AMD64 doesn't offer binary packages except those that are part of the install stages. Further, R has a lot of USE flags, as you experienced yourself. Which combination should be set for compilation? Not very useful in my eyes.
> > Is it possible to require R *NOT* to use f2c in the ebuild file?
> YES, it is. Just remove "f2c" from your USE flags and it will be compiled with g77. Is it possible for you to give me a work sample which breaks with an f2c-compiled R and works with a g77-compiled R? I couldn't get it to crash on my system... Thx.

Not easy. The main reason is that I have already killed my R build. However, I am positive on the subject. I also know why it may work for you and not for me: apparently, f2c runs into problems on the amd64 platform because sizeof(int) != sizeof(long). It works just fine on most other platforms. The 1.9.1 installation manual at http://cran.r-project.org/doc/manuals/R-admin.pdf says: "If you use f2c you may need to ensure that the FORTRAN type integer is translated to the C type int. Normally 'f2c.h' contains 'typedef long int integer;' which will work on a 32-bit platform but not on a 64-bit platform." So the error is really in f2c; f2c should be hard-masked for 64-bit platforms until this is fixed by the f2c project.
regards, /iaw

Danny van Dyk (RETIRED) 2004-08-22 09:01:04 UTC

Well, some points:

1) Remove the symlink /etc/make.profile and relink it with
ln -sf ../usr/portage/profiles/gcc34-amd64-2004.1 ./make.profile
just as you said.
2) You say you don't have the g77 executable on your system? I doubt that, because you simply can't use gcc to compile Fortran; you need g77 for this. Please re-emerge gcc with the following command:
USE="g77" emerge gcc
and tell me the location of g77 on your system via "which g77". It should look like this:
phi / # which g77
/usr/x86_64-pc-linux-gnu/gcc-bin/3.4/g77
3) You still haven't specified how to reproduce the "not rare, but pretty systematic" crashes. I can't help you if I can't reproduce the crash.
4) f2c on 64-bit works flawlessly for me. I checked the SuSE srpm for f2c on alpha and x86_64. They both provide the very same f2c.h as Gentoo does, but they have a modified libf2c.
5) The "f2c" project hasn't worked on f2c since Sat Oct 25 07:57:53 MDT 2003, the date of the last change on their homepage. :-/
6) *Please specify how to crash R*. The best way is to attach a script which fails on your system. I can't work on it if I can't reproduce it!!!!!!
7-∞) see 3) and 6)

ivo welch 2004-08-22 13:33:56 UTC

Thank you. On 2, I do not fully understand the relationship between the g77 and f77 USE flags. I had put both into my make.conf, just in case. No g77 emerged.
When I do '$ USE="g77" ; emerge gcc', then g77 ...

Well, I spoke too soon: it is not exactly f2c that is broken. What is broken are the BLAS libraries that are commonly used, and which are definitely used inside R. Do you have an f2c-built R on your system? If so, try this:

$ R
> x<- rnorm(1000);
> y<- rnorm(1000);
> lm( y ~ x );

If this does not segfault, repeat it. If it still does not die, then I will first uninstall my own R, then try to build R without f77/g77 support (can I do so with '$ USE="-f77 -g77" emerge R'?), and then hopefully (not) eat my words, but give you an account on my machine to see for yourself, or send you the gdb crash point. Sorry for not having been more precise; Gentoo is a learning process for me.

Donnie Berkholz (RETIRED) 2004-08-22 20:18:33 UTC

There is no 'g77' USE flag. I'm not quite sure where you got this idea, but the correct USE flag is f77. My earlier grep of use.local.desc and use.desc showed that the only USE flag containing '77' (other than ifc, which is irrelevant) is f77.

Danny van Dyk (RETIRED) 2004-08-23 04:50:38 UTC

I'm finally able to reproduce this. The g77 compiler needs to be renamed so that configure doesn't find it (a BUG in dev-lang/R's USE flag handling).

BACKTRACE:

> x<- rnorm(1000);
> y<- rnorm(1000);
> lm( y ~ x );

Program received signal SIGSEGV, Segmentation fault.
0x0000000000531df1 in dnrm2_ (n=0x206d9e0, x=0x216c070, incx=0x1) at blas.c:1584
1584	blas.c: No such file or directory.
	in blas.c
(gdb) bt
#0  0x0000000000531df1 in dnrm2_ (n=0x206d9e0, x=0x216c070, incx=0x1) at blas.c:1584
#1  0x000000000051f11d in dqrdc2_ (x=0x216a130, ldx=0x216c070, n=0x206d9e0, p=0x206d9b0, tol=0x205b318, k=0x206d950, qraux=0x207e6f0, jpvt=0x205b2d8, work=0x1e6a5c0) at dqrdc2.c:133
#2  0x000000000051f89a in dqrls_ (x=0x216a130, n=0x206d9e0, p=0x206d9b0, y=0x216dff0, ny=0x206d980, tol=0x205b318, b=0x207e728, rsd=0x216ff70, qty=0x2171ef0, k=0x206d950, jpvt=0x1, qraux=0x40000000000003e8, work=0x1e6a5d8) at dqrls.c:137
#3  0x0000000000465ea9 in do_dotCode (call=0xfcc5d8, op=0x7fbfffd4d0, args=0x70ba18, env=0x40000000000003e8) at dotcode.c:1340
#4  0x000000000047c7b5 in Rf_eval (e=0xfcc5d8, rho=0x2042eb0) at eval.c:398
#5  0x000000000047dd67 in do_set (call=0xfcc4c0, op=0x729f80, args=0xfcc4f8, rho=0x2042eb0) at eval.c:1271
#6  0x000000000047c6ff in Rf_eval (e=0xfcc4c0, rho=0x2042eb0) at eval.c:375
#7  0x000000000047de2c in do_begin (call=0xfd2700, op=0x729d88, args=0xfcc450, rho=0x2042eb0) at eval.c:1046
#8  0x000000000047c6ff in Rf_eval (e=0xfd2700, rho=0x2042eb0) at eval.c:375
#9  0x000000000047eee4 in Rf_applyClosure (call=0xfd63b0, op=0xfd2fa0, arglist=0x2042258, rho=0x1fe2378, suppliedenv=0x70ba18) at eval.c:566
#10 0x000000000047c486 in Rf_eval (e=0xfd63b0, rho=0x1fe2378) at eval.c:410
#11 0x000000000047c6ff in Rf_eval (e=0xfd60a0, rho=0x1fe2378) at eval.c:375
#12 0x000000000047dd67 in do_set (call=0xfd6e10, op=0x729f80, args=0xfd6e48, rho=0x1fe2378) at eval.c:1271
#13 0x000000000047c6ff in Rf_eval (e=0xfd6e10, rho=0x1fe2378) at eval.c:375
#14 0x000000000047de2c in do_begin (call=0xfd69e8, op=0x729d88, args=0xfd6dd8, rho=0x1fe2378) at eval.c:1046
#15 0x000000000047c6ff in Rf_eval (e=0xfd69e8, rho=0x1fe2378) at eval.c:375
#16 0x000000000047c6ff in Rf_eval (e=0xfd80b8, rho=0x1fe2378) at eval.c:375
#17 0x000000000047de2c in do_begin (call=0xfdfd98, op=0x729d88, args=0xfd7f68, rho=0x1fe2378) at eval.c:1046
#18 0x000000000047c6ff in Rf_eval (e=0xfdfd98, rho=0x1fe2378) at eval.c:375
#19 0x000000000047eee4 in Rf_applyClosure (call=0x1fe1078, op=0xfe0360, arglist=0x1fe0f98, rho=0x749698, suppliedenv=0x70ba18) at eval.c:566
#20 0x000000000047c486 in Rf_eval (e=0x1fe1078, rho=0x749698) at eval.c:410
#21 0x0000000000493e6a in Rf_ReplIteration (rho=0x749698, savestack=0, browselevel=0, state=0x7fbfffed40) at main.c:250
#22 0x0000000000493f88 in R_ReplConsole (rho=0x749698, savestack=0, browselevel=0) at main.c:298
#23 0x00000000004941e2 in run_Rmainloop () at main.c:656
#24 0x000000000050031e in main (ac=34003424, av=0x216c070) at system.c:99
(gdb) q

The interesting parts of the (generated) C code:

dqrdc2.c (from src/appl/dqrdc2.f):

#include "f2c.h"
/* Table of constant values */
static integer c__1 = 1;
[...]
qraux[j] = dnrm2_(n, &x[j * x_dim1 + 1], &c__1);   <-- line 133, frame #1 in the bt

You see, although the code says that dnrm2_'s third parameter (incx) shall be the address of the static integer c__1, the function dnrm2_ gets the _value_ of c__1 instead. That seems to me rather like a compiler bug. Interestingly, if you typedef integer to be "int" instead of "long int", it simply works.

I'll CC morfic and lv (both gcc-3.4 people) to look at this too. lv, morfic: have you ever seen anything like this?

Danny van Dyk (RETIRED) 2004-08-27 10:59:09 UTC

lv gives up on this one. CC'ing toolchain and gcc-porting: might this be a gcc bug?

Benjamin Schindler (RETIRED) 2004-09-06 12:38:59 UTC

I found some additional info, which might be interesting (I'm in blas.c):

(gdb) next
1570	    if (*n < 1 || *incx < 1) {
(gdb) p incx
$2 = (integer *) 0x6bad80
(gdb) next
1572	    } else if (*n == 1) {
(gdb) p incx
$3 = (integer *) 0x1

On another run, incx changed to 0x1 on line 1570. On yet another run, it happened a little later. Might inserting const in the function body fix the issue?

Danny van Dyk (RETIRED) 2004-09-06 12:51:59 UTC

Right, changing the third function argument from "integer *incx" to "const integer *incx" fixed it. However, this shouldn't happen.
I have already asked lv for the name of someone on the gcc developer team to CC on this bug. Benjamin and I think that this _is_ a gcc bug.

Benjamin Schindler (RETIRED) 2004-09-07 02:32:35 UTC

I've disassembled the function and did another gdb run:

(gdb) p incx
$1 = (integer *) 0x6bad80
(gdb) next
1570	    if (*n < 1 || *incx < 1) {
(gdb) p incx
$2 = (integer *) 0x6bad80
(gdb) next
1567	    --x;
(gdb) p incx
$3 = (integer *) 0x6bad80
(gdb) p &x
Address requested for identifier "x" which is in register $rsi
(gdb) p $rsi
$4 = 35074264
(gdb) next
1570	    if (*n < 1 || *incx < 1) {
(gdb) p incx
$5 = (integer *) 0x6bad80
(gdb) p x
$6 = (doublereal *) 0x21730d0
(gdb) next
1572	    } else if (*n == 1) {
(gdb) p incx
$7 = (integer *) 0x1

The assembler dump is at: www.n.ethz.ch/student/bschindl/asm

What I think should be noted is that the app makes heavy use of the xmm0-2 registers, which as far as I know are MMX registers - right?

Benjamin Schindler (RETIRED) 2004-09-07 02:49:07 UTC

Now I've found something interesting. I've looked with kdbg at which instructions the various statements resolve to:

if (*n < 1 || *incx < 1) {   <---- mov (%rdi, %rax)
    norm = 0.;               <---- xorpd %xmm0, %xmm0
} else if (*n == 1) {        <---- cmp $0x1, %rax
                                   je 0x531e5d
: jle 0x531e4c
0x0000000000531d44 : mov (%rdx),%rdx
0x0000000000531d47 : test %rdx,%rdx
0x0000000000531d4a : jle 0x531e4c

It's important to note that incx is stored in rdx. So what this code does is move the value of the variable stored at rdx into rdx, thus overwriting the pointer. This is obvious, since *incx is 1 and after running the code, incx becomes 0x1. I've replaced the code with the following:

mov (%rdx),%r8
test %r8,%r8

But it would still segfault, just with a different fault. And I think the program expects incx to be in rdx, which it isn't anymore. My assembly skills are limited ;) So, gcc people, what's going on here? ;)

Danny van Dyk (RETIRED) 2004-09-12 13:51:52 UTC

Config and I finally found the reason.
When R calls Fortran functions from its C sources, it uses strictly ints for integers, not long ints. But f2c is perfectly right to use long ints on 64-bit arches. The R ebuilds now check for both 64-bit and f2c, and die when both are set. R-2.0 will check for this in its configure script.
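The width mismatch behind this bug can be illustrated in a few lines of C. This is a hypothetical sketch, not code from R, f2c, or the ebuild: f2c_integer and read_ints_as_f2c_integer are made-up names, and the typedef assumes stock f2c.h's "typedef long int integer;" as quoted from the R-admin manual. It uses memcpy to model, without undefined behaviour, what an f2c-translated routine reads when C code hands it a pointer to int storage.

```c
#include <string.h>

/* Hypothetical name for illustration: stock f2c.h calls this "integer"
 * and defines it as long (see the R-admin quote above). */
typedef long int f2c_integer;

/* Returns 1 if f2c's INTEGER has the same width as C int, i.e. if C code
 * may safely pass an int* where an f2c-translated routine expects an
 * integer*. On LP64 platforms such as Linux/amd64 this returns 0. */
int f2c_integer_matches_int(void)
{
    return sizeof(f2c_integer) == sizeof(int);
}

/* Models what an f2c-translated routine sees when handed a pointer to
 * int storage: it loads sizeof(f2c_integer) bytes starting at that
 * address. Where long is wider than int, those bytes also cover the
 * neighbouring int. memcpy keeps the demonstration well-defined,
 * unlike dereferencing a cast pointer. */
f2c_integer read_ints_as_f2c_integer(int value, int neighbour)
{
    int storage[2] = { value, neighbour };
    f2c_integer seen = 0;
    size_t len = sizeof seen < sizeof storage ? sizeof seen : sizeof storage;

    memcpy(&seen, storage, len);
    return seen;
}
```

On a 32-bit platform both types are 4 bytes, so read_ints_as_f2c_integer(1000, 1) returns 1000. On little-endian amd64 it returns 4294968296 (1000 + 2^32), because the neighbouring int lands in the upper half of the long; this is the kind of garbage a routine receiving C ints can end up reading.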
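The ebuild change described above refuses the f2c/64-bit combination outright. Another way to avoid the mismatch is to widen C ints at the call boundary: copy each one into a local of f2c's integer type and pass that local's address. This is a hypothetical sketch, not what Gentoo or R actually did. dasum_sketch_ and dasum_from_ints are made-up names, and the routine is a simplified, DASUM-like sum of absolute values written in f2c's calling convention (every argument passed by pointer, INTEGER as long). It is not the BLAS source.

```c
/* Hypothetical names for illustration; stock f2c.h calls these
 * "integer" (long) and "doublereal" (double). */
typedef long int f2c_integer;
typedef double f2c_doublereal;

/* DASUM-like routine in f2c style: Fortran passes every argument by
 * reference, so n and incx arrive as pointers to f2c_integer (const is
 * added here for clarity; f2c itself emits plain pointers). Returns
 * |x[0]| + |x[incx]| + ... over n elements, without the loop unrolling
 * of reference BLAS. Non-positive n or incx yields 0, as in BLAS. */
f2c_doublereal dasum_sketch_(const f2c_integer *n, const f2c_doublereal *x,
                             const f2c_integer *incx)
{
    f2c_doublereal sum = 0.0;

    if (*n < 1 || *incx < 1)
        return sum;
    for (f2c_integer i = 0; i < *n; ++i) {
        f2c_doublereal v = x[i * *incx];
        sum += v < 0.0 ? -v : v;
    }
    return sum;
}

/* C-side adapter: copy the caller's ints into f2c_integer locals so the
 * callee reads exactly the objects it expects, whatever the relative
 * widths of int and long are on this platform. */
double dasum_from_ints(int n, const double *x, int incx)
{
    f2c_integer fn = n;
    f2c_integer fincx = incx;

    return dasum_sketch_(&fn, x, &fincx);
}
```

For example, with double v[] = {1.0, -2.0, 3.0, -4.0}, dasum_from_ints(2, v, 2) sums |v[0]| + |v[2]| and returns 4.0. The alternative route, suggested by the R-admin manual and matching Danny's observation, is to typedef integer to int so that no widening is needed at all.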