Summary: | emerge of gcc-3.2.1-r7 won't go to completion | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Guy <gcadieux> |
Component: | [OLD] GCC Porting | Assignee: | Martin Schlemmer (RETIRED) <azarah> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | docs-team, jrray |
Priority: | High | ||
Version: | 1.4_rc2 | ||
Hardware: | x86 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | 14647 | ||
Bug Blocks: | |||
Attachments: | gcc.msgs.text.gz |
Description
Guy
2003-01-19 20:13:04 UTC
Aaaack! Can't add an attachment through lynx!

From the screen:

    Sandbox error : urity/CodeSource.java environmental variable should be defined.
    ACCESS DENIED open_rd: /var/tmp/portage/gcc-3.2.1-r7/work/gcc-3.2.1/libjava/java/security/cert/Certificat.java
    jc1: Permission denied: can't reopen /var/tmp/portage ... (above file)

continues with regular error messages.

emerge info:
portage 2.0.46-r9 (default-x86-1.4 gcc-3.2.1 glibc-2.3.1-r2)
CHOST="i586-pc-linux-gnu", CFLAGS="-march=i586 -Os -pipe"

(note) I've tried various combinations of K6, O2, O3. They've all failed. :-( Unfortunately, I did not note exactly where each combination failed.

That should not happen. Try:

    # FEATURES=-sandbox emerge gcc

Robert, if you want to have a look. This is not the only one, seems like some guys get segfaults. Also with -r6 and -r1/0 I think.

Martin, you're correct. Also had this on -r6. Also got segfaults on other machine on -r6. Sorry I didn't post where it broke on other machine last night. Was tired and needed sleep. Will do so after I get home from work. Will try your suggestion tonight on these two machines after I get home (after posting the other stuff from machine 2). Thanx.

    Sandbox error : urity/CodeSource.java environmental variable should be defined.
    ACCESS DENIED open_rd: /var/tmp/portage/gcc-3.2.1-r7/work/gcc-3.2.1/libjava/java/security/cert/Certificat.java

This message is unsettling.

    static void init_env_entries(char*** prefixes_array, int* prefixes_num, char* env, int warn)
    {
      int old_errno = errno;
      char* prefixes_env = getenv(env);

      if (NULL == prefixes_env)
      {
        fprintf(stderr, "Sandbox error : the %s environmental variable should be defined.\n", env);

'env' is passed as an arg to this function. This function is only called in one place, and is called four times with the env arg set to one of "SANDBOX_DENY", "SANDBOX_READ", "SANDBOX_WRITE", or "SANDBOX_PREDICT".
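For reference, the check quoted above can be mirrored in a few lines of Python. This is only a sketch of the logic described in this comment, not the sandbox source: `missing_var_messages` is a made-up name, and the message text is copied from the `fprintf` format string above. It illustrates that a healthy run can only ever name one of the four listed variables in that message.

```python
# The four variable names listed in the comment above.
SANDBOX_VARS = ("SANDBOX_DENY", "SANDBOX_READ", "SANDBOX_WRITE", "SANDBOX_PREDICT")


def missing_var_messages(environ):
    """Return one error line per sandbox variable that is not defined.

    `environ` is any mapping of variable names to values (for example
    os.environ). A variable set to the empty string counts as defined,
    which matches getenv() returning a non-NULL pointer in the C code.
    """
    return [
        f"Sandbox error : the {name} environmental variable should be defined."
        for name in SANDBOX_VARS
        if environ.get(name) is None
    ]
```

Calling `missing_var_messages(os.environ)` after `import os` would list whichever of the four variables are unset in the current shell.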
For 'env' to end up with the value of "urity/CodeSource.java" means there is some serious memory trashing going on somewhere.

Am testing Martin's suggestion now (finally! Nothing like all day meetings to trash one's good intentions)

FWIW, Do not discount the possibility of memory going bad on machine 1. After this attempt to emerge gcc stops, I will try to emerge memtest86. On machine 2, this machine has been running fine with no problems for some time. I will arrange to do memtest86 on it as well. Results as soon as I get them.

machine 2: from the screen

    var/tmp/portage/gcc-3.2.1-r7/work/gcc-3.2.1/libjava/java/rmi/server/RMIClassLoader.java:95: Class 'MalforMedURLException' not found in 'throws'.
    throws MalforMedURLException, ClassNotFoundException
    2 errors
    ...
    Function src_compile, line 293, exit code 2
    ...

# emerge info
Portage 2.0.46-r9 (default-x86-1.4, gcc-3.2.1, glibc-2.3.1-r3)
=================================================================
System uname: 2.4.20 i686 Celeron (Mendocino)
GENTOO_MIRRORS="http://www.ibiblio.org/pub/Linux/distributions/gentoo"
CONFIG_PROTECT="/etc /var/qmail/control /usr/kde/2/share/config /usr/X11R6/lib/X11/xkb /usr/kde/3.1/share/config /usr/kde/3/share/config /usr/share/config"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
PORTDIR="/usr/portage"
DISTDIR="/usr/portage/distfiles"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR_OVERLAY=""
USE="x86 oss 3dnow apm avi crypt cups encode gif gpm jpeg libg++ libwww mikmod mmx mpeg ncurses nls pdflib png qtmt quicktime spell truetype xml2 xmms xv zlib alsa gdbm berkdb slang readline arts bonobo svga java guile X sdl tcpd pam ssl perl python esd imlib oggvorbis gnome gtk qt kde motif opengl mozilla"
COMPILER="gcc3"
CHOST="i686-pc-linux-gnu"
CFLAGS="-march=pentium2 -O3 -pipe"
CXXFLAGS="-march=pentium2 -O3 -pipe"
ACCEPT_KEYWORDS="x86 ~x86"
MAKEOPTS="-j1"
AUTOCLEAN="yes"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
FEATURES="sandbox ccache"
----------------------

I'm not
sure what other information I can send you which could be useful. Let me know and I will attach it. I'll save the work directory to another name before I re-run the gcc emerge on this machine with the sandbox disabled.

    var/tmp/portage/gcc-3.2.1-r7/work/gcc-3.2.1/libjava/java/rmi/server/RMIClassLoader.java:95: Class 'MalforMedURLException' not found in 'throws'.
    throws MalforMedURLException, ClassNotFoundException

Is this a cut & paste? It should be 'MalformedURLException' and not 'MalforMedURLException'. The file and line given has a lowercase 'm' here. Assuming yours really does have an uppercase 'M', the error is legitimate, and the question is ... how did that 'M' get there?

First - to answer your question J., it is not a cut and paste (I had been running as console) so it's a rather tedious, but exact copy, created by repeatedly switching between an Xwindows session and the original console login. I took extra pains to ensure fidelity of copy. And yes, I thought the 'M' should have been 'm' myself.

I'm re-running Martin's suggestion on machine one as I forgot to turn my swap partition on. :-( [sigh]

Machine 2 completed emerge of gcc-3.2.1-r7 with FEATURES="-sandbox" without a hitch.

Machine 1 is still running the emerge. It's at the point of compiling from directory (?) libstdc++-v3 program locale.cc. I don't know for sure if this is further than where it borked out originally. I'll check it in the AM as it's bedtime for me now.

In the meantime, I have another machine (let's call it machine 3) which has successfully emerged gcc-3.2.1-r6 and is well on its way to finishing the updated emerge of gcc-3.2.1-r7.

Machine 1: K6 PR200, 64 meg ram, 384 meg swap. BORKED (-sandbox being tested)
Machine 2: Celeron 466, 256 meg ram, 128 meg swap. BORKED (-sandbox worked)
Machine 3: Pentium Classic 166, 96 meg ram, 384 meg swap. no problem (-r6 to -r7)
Machine 4: K6 300, 48 meg ram, 512 meg swap. no problem (-r6 to r7)

I wouldn't draw any conclusions with this limited dataset. But it is suggestive of an issue with some CPUs and sandbox.

This may also be an important point: Machine 1 is in the middle of an install and is therefore running the 1.4_rc2 live cd kernel.

If you want me to collect more info, I'll be happy to do so. Just let me know what. 'Night!

Here is a bug where ld segfaults when linking libgcj: http://bugs.gentoo.org/show_bug.cgi?id=14142

Only bad thing is, I cannot recreate it, so no way to try and debug anything.

OK: Machine 1:

FEATURES="-sandbox" CFLAGS="-march=k6 -Os -pipe"
Borks in libjava

FEATURES="-sandbox" CFLAGS="-mcpu=i586 -O2 -pipe"
Borks in libjava (different location)

Note: NO APPARENTLY FUNNY MEMORY ERRORS! :-)

1) To my non-programmer eyes, under some circumstances (yet to be fully defined) there appears to be a problem with memory in sandbox. (IE Machine 2 completed successfully without sandbox and Machine 1's messages are much more 'reasonable'. Unwanted, but reasonable.)

2) Machine 1 has additional problems over and above the potential memory problem(s) with sandbox.

I've got some experimental ideas I want to try out on machine 1. Unfortunately, this will take some time. One of them involves possibly using a different CPU. I have 2 CYRIX PR200s, an IBM PR200 and a couple of Pentium Classics I can pop into this motherboard. Obviously, I want to see if this specific CPU is a possibility. The other involves running the Gentoo install to completion enough to reboot with a kernel specifically built by and for this CPU. This machine is the only machine I've had so far for which the kernel on the gentoo iso does NOT identify and load the correct nic driver (8139too). It loads the aironet modules instead. I realize this is a stretch, but in the spirit of leaving no stone unturned ...

I'd suggest that perhaps you want to concentrate on 1) and let me worry about 2) for now until I have more information.
At your convenience, I have a low use machine on which I can test any sandbox (or whatever) changes. Machine 2 is used only once or twice a week (it's a guest machine). Given that it borks with sandbox repeatedly and doesn't bork without sandbox, this is a good machine for such testing.

If you can think of anything else to try with Machine 1, I'm certainly open to suggestions. I'll post the messages from the gcc compiles on Machine 1 after I get back home later today.

Finally, I did make a bootable CD with memtest86 and it reports that my ram is fine.

Um - My current schedule is:
to post Machine 1's error messages.
to finish configuring the desktop for Machine 3 and put Machine 3 away for now.
to put Machine 4 away for now.
to bring Machine 1 to the front of my workbench so I can start tinkering as noted above.

I'm beginning to wonder if I should just use their actual names. :-)

Anyway. I hope all this helps!

Hehe, ok. I'll have a look at the sandbox stuff again and look if I can see a possible memory leak/whatever. Also maybe try that k6 with diff -march ? Like -march=i586, or such ?

Guy, have a look at bug #14142 again .. his problem was due to no swap ...

Martin, that was the first thing I thought of too when I saw his comment. :-)

I didn't get a chance to do anything I wanted to last night because I didn't get in till after my bed time. However, I think this is a much better lead to follow than what I was going to do.

It occurred to me as I thought about it more, that there are a lot of things that a total available memory shortage would exactly account for. This includes things like the ebuild aborting in different places on the same machine, why two machines with the same amount of total memory (ram & swap) would behave differently (1 fail the other succeed) etc. I suspect that my figure for total memory is on the borderline of the minimum requirement to compile gcc successfully.
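As a quick sanity check on that theory, the totals can be added up from the figures posted earlier in this thread. The sketch below is illustrative only: the RAM and swap numbers are the ones quoted in the machine list above (not yet re-verified at this point in the thread), and the names `MACHINES` and `totals` are invented for the example.

```python
# (RAM in MB, swap in MB, outcome) as reported in the machine list above.
MACHINES = {
    "Machine 1": (64, 384, "failed"),
    "Machine 2": (256, 128, "failed"),
    "Machine 3": (96, 384, "ok"),
    "Machine 4": (48, 512, "ok"),
}


def totals(machines):
    """Return total memory (RAM + swap, in MB) for each machine."""
    return {name: ram + swap for name, (ram, swap, _outcome) in machines.items()}


def sorted_by_total(machines):
    """Return (name, total MB, outcome) tuples, smallest total first."""
    sums = totals(machines)
    return sorted(
        ((name, sums[name], machines[name][2]) for name in machines),
        key=lambda row: row[1],
    )
```

With those figures the totals come out as 448, 384, 480 and 560 MB for machines 1 to 4, so the two machines that failed are also the two with the least RAM plus swap. That is consistent with a memory shortage but, with four data points, it proves nothing on its own.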
What I hope to do tonight is:
1) Confirm the ram and swap currently available for each machine.
2) Start the Gentoo install of machine 1 over with a larger swap.
3) Pop in another hard drive in machine 2 and create a second swap and retest with sandbox on.

It also occurred to me that a machine with the 1.4 install iso would require more total memory than was previously required with the 1.2 and earlier installs because the iso image itself probably pulls out some of the swap for its own uses.

If this is the case (and I feel really comfortable that we're on the right track at last), then I can give you a good working figure for the minimum total memory (ram & swap) required for completion of gcc. I don't know if this is something you'll be able to test for in the ebuild, but at least you'll be able to ask people what total memory they have when they report problems. :-)

One of the things I was wondering is if the total number of objects compiled in gcc has been going up with each -r version. I ask only because I had been watching the successful machine 2 gcc emerge (w/out sandbox) and I didn't recall seeing a lot of the stuff there, especially towards the end. Is this the case?

The other thing I was wondering was if the gcc ebuild was the 'largest' ebuild in terms of memory requirements. It's always been my impression that this is the case with perhaps open office being second and mozilla being third. Do you have any feel for this yourself? Despite the time requirements, I'll build open office (as opposed to open office bin) on one of these machines if you feel open office actually requires even more total memory. Ultimately, I'd like to give you a fairly hard minimum total memory requirement that you can incorporate where needed.

Well, here's to hoping we're really on the right track. ;-)

Created attachment 7595
gcc.msgs.text.gz
Martin, I added 128meg to the swap partition in machine 1. Total size (ram &
swap) is 640 megs. Now, I get a segmentation fault. -bleh-
I started ssh and pulled the file resulting from 'emerge gcc &> gcc.msgs.text'
so that I could upload the attachment.
I've left the machine up with ssh running. If it would help you to access the
machine directly, I can email you the password (living dangerously here!) and
current internet ip address. And if it would make you more comfortable, I can
set up iptables on each of the other machines to ignore that machine's ip
address.
Let me know what you want to do or what other information you want from that
machine. Frankly I'm so stumped I can't even speculate anymore. ;-)

Machine 1 still bork with sandbox on or off ? Tonight is a bit difficult (same as yesterday), but should be ok tomorrow for checking it out live. If it's still going to be up for the next 36 hours, mail me the ip + u/p if you do not mind.

This was with sandbox. I'll kick off one without in a little while and save the msgs to another file (in /root after you chroot to it).

I've got other machines (486's) to start playing with. It's time I learned how to use distcc anyway.

Updated info:

I've verified that the problem with machine 2 was insufficient total memory (RAM & SWAP) to emerge gcc to completion. I did this by putting in a second disk drive and creating a new (larger) swap partition. With sandbox, the emerge went through to completion with no problem.

I've identified the problem with machine 1 and I'm currently running the acid test to finally verify the problem. The problem appears to lie in the K6 PR200 processor. There is a known bug where memory becomes unreliable if there is more than 32 megs of ram available to this processor. These are the K6 CPUs of stepping 'B' (and earlier?). You can view a write up here: http://membres.lycos.fr/poulot/k6bug.html.

From the web page:
=================================================================================
The AMD-K6 processor has a bug that prevents reliable operation when more than 32 MB of RAM is used. The most common symptoms are segmentation violations (see The SIG11 FAQ) while compiling the Linux kernel. It can be reproduced, up to now, only when doing heavy compilations, probably because only compilations stress the system enough. It is not a gcc problem, as it is sometimes the program that launches gcc (it can be make, or sh) that dies.

This bug has been seen by many people all around the world. The general consensus among them is that the bug only depends on the amount of memory used :
* With 32 MB of RAM, or less, no problem.
* With more than 32 MB, sporadic compile failures have been observed.
According to AMD :
* this bug is documented in section 2.6.2 of the AMD-K6 MMX Enhanced Processor Revision Guide
* it has been corrected in later chips produced in the B stepping.
* current machines are shipped with the corrected chips.
===============================================================================

Read the web page for the rest of the write up and also for supplemental links including to the AMD-K6 MMX Enhanced Processor Revision Guide mentioned above.

For now, I'm changing the severity to 'normal' as it looks very much like identification of these two problems constitutes the 'fix'. I'll have confirmation in the AM (EST). Will post results here.

Note 1: I added bug 14647 which is a documentation change suggestion regarding recommended swap space in the installation guide.

Note 2: Booting correctly identifies this CPU and the appropriate messages are displayed in 'dmesg'. However, the web link report is out-of-date.

Very interesting information, thanks for hunting that down!

You're welcome. :-)

And gcc did complete this AM (as expected) with 32 megs of RAM and 608 megs of swap. :D

So there are several results that I get from this leeetle episode:

1) There is nothing wrong with sandbox under even extreme circumstances. (Yeah!)
2) CFLAGS="-march=k6 -Os -pipe" works fine for emerging gcc
3) To be on the safe side, installation of Gentoo should be done with a recommended total memory (RAM + SWAP) of 640 megs.
4) K6 PR200 CPUs with more than 32 megs of ram should be specifically asked about when someone complains of 'segfaults' especially during the emerge of gcc. -heh-
5) You can install Gentoo on systems with as little as 32 megs of ram even without going through distcc - provided you have lots of time and swap. My previous low was 48megs.

Martin, I'll leave this for you to close since, as far as I'm concerned, the issues are all resolved.
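Result 3) above lends itself to a small pre-flight check. The following is a rough sketch, not anything from the gcc ebuild: it assumes the usual `MemTotal:` and `SwapTotal:` lines (in kB) of Linux's `/proc/meminfo`, and the names `total_ram_and_swap_mb`, `meets_recommendation` and `RECOMMENDED_TOTAL_MB` are made up for illustration. The 640 figure is the recommendation from this thread, not an official limit.

```python
# Total (RAM + swap) suggested in this thread, in MB. Not an official limit.
RECOMMENDED_TOTAL_MB = 640


def total_ram_and_swap_mb(meminfo_text):
    """Sum MemTotal and SwapTotal (given in kB) from /proc/meminfo-style text.

    Raises KeyError if either field is missing from the text.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> ..." lines; skip headers and blanks.
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return (fields["MemTotal"] + fields["SwapTotal"]) // 1024


def meets_recommendation(meminfo_text, minimum_mb=RECOMMENDED_TOTAL_MB):
    """True if RAM + swap reaches the recommended total."""
    return total_ram_and_swap_mb(meminfo_text) >= minimum_mb
```

On a live system the text would come from reading `/proc/meminfo`; keeping the parsing separate from the file access makes the check easy to try against sample figures such as the 32 MB RAM plus 608 MB swap case above.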
As I mentioned in my email though, adding a CPU and memory check to the checking phase of the ebuild might be a good idea. If you do so, I can probably be talked into increasing the memory in machine 1 specifically to test such a functional check. (no need, I think, to test the entire ebuild!)

BTW - the resulting text file from 'emerge gcc &> gcc.msgs.32m.text' was 5megs+. Was I ever glad to see the bottom of that!

Martin, I forgot! (me bad!!) Thanx for both your time and patience. :D

Hi guys, have a look at comment #21:

------------------------------------------------------------------------
3) To be on the safe side, installation of Gentoo should be done with a recommended total memory (RAM + SWAP) of 640 megs.
------------------------------------------------------------------------

Guy, great work thanks!

As far as I can tell, this is not a problem for me any more. ;-)

Yep, but we need docs updated to recommend a 512/640 minimum ram+swap.

I realise that - in fact, I added an enhancement request for adding that to the Installation Guide. (bug 14647). ;-)

I wasn't suggesting this be closed. Rather, I've been going through all the open bugs I've authored or joined and indicating where I stood on each one. This way, each respective developer knows whether I still have an open issue regarding same. There were 4 or 5 total where my issues are resolved. -hehe- I was trying to do you guys a favor and let you know whether to expect anything else from me.

Ah, did not know you added a new bug =)

very interesting thread, but it doesn't help me to solve the almost same problem with my k6 here (according to cpuinfo stepping 12), 530Mhz and 180MB RAM and 500MB SWAP, so the ram+swap is bigger than 600MB but there is still the issue with k6 & more than 32MB ram. any way to ship around ? I dunno if this stepping 12 processor is one of the faulty ones and even if, I don't have any 32MB memory modules anyway.
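When reporting a case like this, it helps to quote the exact CPU identification rather than just the stepping. Below is a minimal sketch for pulling those fields out of `/proc/cpuinfo`-style text. It assumes the x86 Linux layout of `key : value` lines, and `cpu_identity` is a name invented for the example. It deliberately makes no judgement about which steppings are affected, since this thread does not establish how the 'B' stepping maps to the number cpuinfo reports.

```python
# Fields worth quoting in a bug report about CPU-specific failures.
WANTED_FIELDS = ("vendor_id", "cpu family", "model", "model name", "stepping")


def cpu_identity(cpuinfo_text):
    """Return the identification fields of the first CPU listed.

    Values are returned as strings exactly as they appear in the text.
    Fields that are not present are simply left out of the result.
    """
    found = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Keep the first occurrence only, so extra CPUs are ignored.
        if sep and key in WANTED_FIELDS and key not in found:
            found[key] = value.strip()
    return found
```

The text would normally be the contents of `/proc/cpuinfo`; the returned dictionary can then be pasted into a report alongside the RAM and swap figures.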
I got a working system one time, but then after the reboot it couldn't mount the XFS partition. so I am trying it again and I get the same glibc error [no problem with gcc compiling though] again and again.

    make[2]: *** [/var/tmp/portage/glibc-2.3.1-r2/work/glibc-2.3.1/buildhere/sunrpc/xbootparam_prot.stmp] Illegal instruction