Bug 178980 - www-servers/tomcat-6.0.13-r1 fails to build because javac is oom
Summary: www-servers/tomcat-6.0.13-r1 fails to build because javac is oom
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Java
Hardware: AMD64 Linux
Importance: High normal
Assignee: Java team
Reported: 2007-05-18 08:46 UTC by Johan Bergström
Modified: 2007-05-28 23:34 UTC

Attachments
Tomcat 6.0.13 compile on amd64 Opteron with sun-jdk-1.5.11 verbose:gc (opteron,6.16 KB, text/plain)
2007-05-25 15:36 UTC, William L. Thomson Jr. (RETIRED)
Tomcat 6.0.13 compile on amd64 Turion with sun-jdk-1.5.11 verbose:gc (turion,4.86 KB, text/plain)
2007-05-25 15:37 UTC, William L. Thomson Jr. (RETIRED)
Tomcat 6.0.13 compile on x86 with sun-jdk-1.5.11 verbose:gc (x86,8.88 KB, text/plain)
2007-05-25 15:40 UTC, William L. Thomson Jr. (RETIRED)
Tomcat 6.0.13 compile on amd64 Opteron with sun-jdk-1.5.11 -Xmx128m verbose:gc (opteron-Xmx,4.04 KB, text/plain)
2007-05-25 17:41 UTC, William L. Thomson Jr. (RETIRED)
Tomcat 6.0.13 compile on x86 with sun-jdk-1.5.11 -server verbose:gc (x86,8.86 KB, text/plain)
2007-05-25 18:12 UTC, William L. Thomson Jr. (RETIRED)
Tomcat 6.0.13 compile on x86 with sun-jdk-1.5.11 verbose:gc (x86,11.85 KB, text/plain)
2007-05-25 18:16 UTC, William L. Thomson Jr. (RETIRED)
Tomcat 6.0.13 compile on amd64 with sun-jdk-1.5.11 vm ecj compiler verbose:gc (opteron-ecj,3.69 KB, text/plain)
2007-05-25 21:00 UTC, William L. Thomson Jr. (RETIRED)

Description Johan Bergström 2007-05-18 08:46:43 UTC
javac (Sun's 1.5.0.11) seems to have a low heap limit on amd64 systems (mine has 1G of RAM and practically nothing running). How about raising the default values somewhat? #gentoo-java was very helpful and pointed me to building it with ecj instead (which worked). If anyone else bumps into this, here's how to build with ecj: 'JAVA_PKG_FORCE_COMPILER="ecj-3.2" emerge -1 tomcat'

<snip>
compile:
    [javac] Compiling 954 source files to /var/tmp/portage/www-servers/tomcat-6.0.13-r1/work/apache-tomcat-6.0.13-src/output/classes
    [javac] 
    [javac] 
    [javac] The system is out of resources.
    [javac] Consult the following stack trace for details.
    [javac] java.lang.OutOfMemoryError: Java heap space

BUILD FAILED
/var/tmp/portage/www-servers/tomcat-6.0.13-r1/work/apache-tomcat-6.0.13-src/build.xml:89: Compile failed; see the compiler error output for details.

Total time: 19 seconds

!!! ERROR: www-servers/tomcat-6.0.13-r1 failed.
Call stack:
  ebuild.sh, line 1614:   Called dyn_compile
  ebuild.sh, line 971:   Called qa_call 'src_compile'
  environment, line 4869:   Called src_compile
  tomcat-6.0.13-r1.ebuild, line 68:   Called eant 'build-jasper-jdt' 'deploy' '-Dbase.path=/var/tmp/portage/www-servers/tomcat-6.0.13-r1/temp' '-Dcompile.debug=false' '-Dnobuild.docs=true' '-Dant.jar=/usr/share/ant-core/lib/ant.jar' '-Dcommons-daemon.jar=/usr/share/commons-daemon/lib/commons-daemon.jar' '-Djdt.jar=/usr/share/eclipse-ecj-3.2/lib/ecj.jar' '-Djsp-api.jar=/usr/share/tomcat-servlet-api-2.5/lib/jsp-api.jar' '-Dservlet-api.jar=/usr/share/tomcat-servlet-api-2.5/lib/servlet-api.jar'
  java-utils-2.eclass, line 1815:   Called die

!!! eant failed
!!! If you need support, post the topmost build error, and the call stack if relevant.
!!! A complete build log is located at '/var/tmp/portage/www-servers/tomcat-6.0.13-r1/temp/build.log'.

!!! When you file a bug report, please include the following information:
GENTOO_VM=sun-jdk-1.5  CLASSPATH="" JAVA_HOME="/opt/sun-jdk-1.5.0.11"
JAVACFLAGS="-source 1.5 -target 1.5" COMPILER="javac"

emerge --info
Portage 2.1.2.2 (default-linux/amd64/2006.1/desktop, gcc-4.1.1, glibc-2.5-r2, 2.6.19-gentoo-r5 x86_64)
=================================================================
System uname: 2.6.19-gentoo-r5 x86_64 Intel(R) Pentium(R) D CPU 2.80GHz
Gentoo Base System release 1.12.9
Timestamp of tree: Fri, 18 May 2007 06:50:01 +0000
distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
dev-java/java-config: 1.3.7, 2.0.31-r5
dev-lang/python:     2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.16
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-march=nocona -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache distlocks metadata-transfer paralell-fetch sandbox sfperms strict userpriv"
GENTOO_MIRRORS="http://trumpetti.atm.tut.fi/gentoo/"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="amd64 apache2 arts berkdb bitmap-fonts bzip2 cairo cdr cli cracklib crypt cups dbus dri dvd dvdr eds emboss encode fam firefox fortran gdbm gpm gstreamer hal iconv isdnlog java libg++ logrotate mad midi mikmod mpeg ncurses nls nogtk nptl nptlonly ogg oss pam pcre perl png ppds pppd python qt3 qt4 quicktime readline reflection session spell spl ssl symlink tcpd truetype-fonts type1-fonts unicode xml xv zip zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="apm ark chips cirrus cyrix dummy fbdev glint i128 i810 mach64 mga neomagic nv r128 radeon rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga via vmware voodoo"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS


Reproducible: Always
Comment 1 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-18 13:43:07 UTC
Are you getting out of memory or out of heap space? They are not the same. Also, I ran into this very early on, but it is not always reproducible. Reboot your machine and, before you start many apps and eat up your RAM, compile Tomcat. I bet it will compile and install. It does for me and many others.

Java can be finicky on amd64 wrt compiling applications. It's using a server compiler, which does considerably more optimizations than a client compiler.

You can see I had this in the ebuild for a period of time. But only I ran into it and others did not seem to have issues. Since then I really haven't had any so dropped it. Look at versions <6.0.10. You will see commands and ecj stuff.

http://viewcvs.gentoo.org/viewcvs.py/gentoo-x86/www-servers/tomcat/?hideattic=0
Comment 2 Petteri Räty (RETIRED) gentoo-dev 2007-05-18 13:51:35 UTC
(In reply to comment #1)
>
> Java can be finicky on amd64 wrt compiling applications. It's using a server
> compiler which does considerably more optimizations than a client compiler.
> 

The compiler doesn't do optimizations but the memory limits on amd64 are very low for some reason. On my x86 I get:
betelgeuse@pena ~/test/java $ java Mem
489MB
betelgeuse@pena ~/test/java $ java -server Mem
489MB

But he got:
10:51 <@Betelgeuse> Lfe: Out of curiousity check what you get with that
10:53 < Lfe> 81MB :o

Of course I have twice the memory as him but still:
10:59 < Lfe> jrockit 1.5 gives me 1024 as max mem on a 4G system
11:00 < Lfe> 1024MB sry

I think in the long term we should just set the memory limit to all available system memory in the eclass.
Comment 3 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-18 16:12:17 UTC
(In reply to comment #2)
>
> The compiler doesn't do optimizations 
http://java.sun.com/products/hotspot/whitepaper.html#server

This is something I have researched pretty extensively. I posted this link on the channel some time back :)

> but the memory limits on amd64 are very low for some reason.

Completely inaccurate.
http://java.sun.com/docs/hotspot/HotSpotFAQ.html#64bit_heap_defaults

All memory allocations for the server VM on 64-bit archs are larger than on 32-bit archs.

> On my x86 I get:
> betelgeuse@pena ~/test/java $ java Mem
> 489MB
> betelgeuse@pena ~/test/java $ java -server Mem
> 489MB

Please provide the output of 
free -m

Java has a history of not using swap etc., and throwing OOM when there is still RAM that can be freed from buffers etc.
 
> But he got:
> 10:51 <@Betelgeuse> Lfe: Out of curiousity check what you get with that
> 10:53 < Lfe> 81MB :o
> 
> Of course I have twice the memory as him but still:
> 10:59 < Lfe> jrockit 1.5 gives me 1024 as max mem on a 4G system
> 11:00 < Lfe> 1024MB sry

Can I get the sources to that to test?

> I think in the long term we should just set the memory limit to all available
> system memory in the eclass.

If you feel that is necessary. But of all on the Java team I have run Java on amd64 the longest. Not to mention I do most all Tomcat ebuild development on my amd64 laptop, and one of my clients' Tomcat servers is also amd64. I will be upgrading mine

I have had many problems, but I am not sure that is the proper way to address them. I messed around with increasing heap size etc. in Tomcat 6.0.x very early on when I had problems. But they were not always reproducible, which is why in most revisions that code is commented out. It wasn't necessary.
Comment 4 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-18 17:33:57 UTC
Ok let's try some things real quick. Please modify your ebuild and add the following just below src_compile(){

ANT_OPTS=-XX:MaxPermSize=128m

Let me know if you still get heap space exceptions.

If you do, you can try forcing the compiler to ecj, that should work and should ignore the MaxPermSize.

After editing the ebuild, make sure to regenerate the digest: ebuild tomcat-6.0.13-r1.ebuild digest

Then merge and let us know the outcome.
Comment 5 Johan Bergström 2007-05-18 18:16:57 UTC
Forcing ecj works; setting ANT_OPTS unfortunately did not (I tried values up to 512M).
Comment 6 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-19 02:32:31 UTC
Ok, we have determined setting the heap size via -Xmx96m or greater resolves the bug in the report. However I really must question the default heap size a bit. Based on the following,

http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html#0.0.0.0.%20Heap%20Size%7Coutline

"the initial heap size will be set to phys_mem / DefaultInitialRAMFraction. DefaultInitialRAMFraction is a command line option with a default value of 64"

"maximum heap size will be set to phys_mem / DefaultMaxRAM. DefaultMaxRAMFraction has a default value of 4"


So a system with 1GB of RAM should have a default max heap of 256MB. Even though in this case setting -Xmx96m or -Xmx128m resolves it, I fear that might be too low and cause this to appear more often than not. I was also able to replicate this inconsistently with openjdk 1.7. I might be comfortable going with -Xmx256m to resolve this bug. I observed it using ~150MB during compile.

But I really question why the heap size is initially so small in this case, and also why it's not being increased. I haven't read it exactly, but it seems that unless -Xmx is set, the VM can scale beyond the default max heap size.
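
As a rough illustration of the formula quoted above, the shell sketch below divides physical memory by the two documented fractions. The function name default_heap_mb is hypothetical, and the result is only what the quoted documentation predicts, not a measurement of what the VM actually picks.

```shell
#!/bin/sh
# Hypothetical helper: heap size in MB predicted by the quoted formula,
# phys_mem / fraction. This is arithmetic only, not real VM behaviour.
default_heap_mb() {
    # $1 = physical memory in MB, $2 = RAM fraction (64 initial, 4 max)
    echo $(( $1 / $2 ))
}

default_heap_mb 1024 64   # predicted initial heap for 1GB of RAM
default_heap_mb 1024 4    # predicted maximum heap for 1GB of RAM
```

For 1GB this prints 16 and 256, the latter being the 256MB figure mentioned above.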
Comment 7 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-19 19:20:42 UTC
Ok, what all jdks were tested? Which ones threw oom and did not?

I recall Sun's 1.5.0.11 doing it all the time. I believe did not. In my trials I did get an oom with openjdk 1.7, but that was after several emerges. Subsequent trials did not throw oom.

I might have had another instance of Java running when 1.7 threw the oom exception. But subsequent trials there I could not replicate either.

Seems the reporter was able to reproduce reliably with 1.5 all the time. I just got access to Gentoo's multi-core/multi-proc amd64 machine. I will set up a chroot and test as time permits.
Comment 8 Johan Bergström 2007-05-19 19:30:27 UTC
Sun's 1.5.0.* (recently only tested on 1.5.0.11) was the only one I was able to reproduce an OOM on. Sun 1.6 worked nicely; BEA 1.5 and 1.6 and IBM 1.5 also had no issues.
Comment 9 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-20 19:46:11 UTC
Ok, I am seeing this as well with 1.5.0.11. However, I can only replicate it reliably on Opterons, as I can build Tomcat 6.0.13 all day long on my amd64 Turion without a problem. But for some reason when I do it on any Opteron, I get the out of heap space exception.

This has to be some sort of bug in 1.5.0.11. The systems have plenty of ram, and if setting it to -Xmx128m is all that is needed. Then that means the VM's default is less than that and not able to scale up. Which differs from 1.5.0.11 behavior on all other archs and procs.
Comment 10 Petteri Räty (RETIRED) gentoo-dev 2007-05-20 19:57:24 UTC
(In reply to comment #9)
> 
> This has to be some sort of bug in 1.5.0.11. The systems have plenty of ram,
> and if setting it to -Xmx128m is all that is needed. Then that means the VM's
> default is less than that and not able to scale up. Which differs from 1.5.0.11
> behavior on all other archs and procs.
> 

I suggest mailing the hotspot mailing list.
Comment 11 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-24 22:39:12 UTC
After further testing, the garbage collector in sun-jdk-1.5.0.11 is broken or something on AMD64 Opteron processors. I have tested gc with several jdks, including 1.5.0.11 on x86. Most all with very limited RAM, ~64MB on x86 and 96MB on amd64. Default -Xmx's for Java on the given archs.

Only 1.5.0.11 on Opteron processors has this problem. I know the -Xmx fixes it, but that's really just a band aid. Not to mention this problem is going to affect anything that makes heavy use of GC.

IMHO we should mask or do something with 1.5.0.x on amd64. No way to limit it just to Opterons. So no way to target the mask to just those procs.

I guess to resolve this bug I have no choice but to up -Xmx to 128 or etc. But that does not make the gc work any better. Just reduces the amount of gc. Which so far allows Tomcat to build.

But Tomcat itself can be built with very little RAM, as done on x86.
Comment 12 Petteri Räty (RETIRED) gentoo-dev 2007-05-25 06:16:30 UTC
(In reply to comment #11)
>
> IMHO we should mask or do something with 1.5.0.x on amd64. No way to limit it
> just to Opterons. So no way to target the mask to just those procs.
>

blackdown-jdk as we all know is old as hell. sun-jdk-1.6* has open issues with Javadocs and it has an open security issue that 1.5* doesn't have, so I don't think the amd64 arch team would want to mask 1.5* and I doubt the users would like it that much. Report this to Sun and maybe they can get it fixed in some of the updates. I already know how to fix this problem on our side once and for all, so I just need to get the patch done.

Comment 13 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 14:14:36 UTC
(In reply to comment #12)
>>sun-jdk-1.6* has open issues with
> Javadocs and it has an open security issue that 1.5* doesn't

Is there a bug open on the security issues? Only one I can find seems to deal with Java web start which is not available on amd64.

> Report this to Sun and maybe they can get it fixed in
> some of the updates.

Yes, that would help but I have been short on time.

> I already know how to fix this problem on our side once and
> for all so just need to get the patch done.
 
You know of a band aid, not a fix. Unless you know how to fix the GC. Increasing memory is a BAND AID, not a fix. A fix would allow the GC to perform as it does on all other archs. Which it does not, and again this will affect more than just emerges. It's just easy to replicate during some merges like Tomcat.

On a server or in a live env, I doubt it's as easy to replicate and likely quite troublesome when people run into it. That is when running Tomcat under a load, working with Eclipse, Netbeans, possibly Azureus etc.

Any ebuild that is increasing -Xmx on amd64 only, is doing so to address this bug. Which has been around for quite some time, and just now coming to surface.
Instead of investigating or reporting to Sun etc, band aids have been applied, and it's been ignored for quite some time.

If I use -Xmx to get Tomcat 6.0.x around this for now. I will in turn open a bug for sun-jdk-1.5.0.11 on amd64 due to the gc issues.

Also let me make it quite clear. Tomcat 6.0.x is in ~arch. sun-jdk 1.5.x is stable, and sun-jdk-1.6.x is in ~arch. So this is also a problem of people mixing stable and unstable trees :)
Comment 14 Petteri Räty (RETIRED) gentoo-dev 2007-05-25 14:21:57 UTC
(In reply to comment #13)
> 
> Is there a bug open on the security issues? Only one I can find seems to deal
> with Java web start which is not available on amd64.

bug 178575

> 
> On a server or in a live env, I doubt it's as easy to replicate and likely
> quite troublesome when people run into it. That is when running Tomcat under a
> load, working with Eclipse, Netbeans, possibly Azureus etc.
>

As we investigated this, I think we came to the conclusion that emerges run out of memory because the build rapidly needs to allocate lots of memory and has the memory limit set to something quite low to start with. Both Eclipse and Netbeans wrapper scripts set a higher memory limit I think, and for Tomcat in server usage the allocation should be quite gradual. I do agree that runtime is a bit of a problem, but that's why I think we should report this upstream instead of talking among ourselves. Until Sun fixes it the band aid is the best we can do.

Comment 15 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 14:54:02 UTC
(In reply to comment #14)
>
> As we investigated this I think we came to the conclusion that emerges run
> out of memory because it rapidly needs to allocate lots of memory and has the
> memory limit set to something quite low to start with.

That is 100% inaccurate. It has NOTHING at all to do with the initial memory allocation being too low. It's a problem of the GC. I will run tests again and add as attachments.

The test I ran you were not around for. Only Caster was in PM. Set

export ANT_OPTS="-verbose:gc"

And you will see what I am talking about. I did that with several JDKs on Opteron, Turion, and x86 procs.
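
A minimal sketch of combining JVM flags in ANT_OPTS, assuming the build goes through Ant's launcher (which hands ANT_OPTS to the JVM) and that javac runs inside that same JVM rather than being forked. The helper name add_ant_opt is made up for illustration; -verbose:gc and -Xmx128m are the flags discussed in this bug.

```shell
#!/bin/sh
# Hypothetical helper: append a JVM flag to ANT_OPTS unless it is
# already there, so repeated calls do not pile up duplicates.
add_ant_opt() {
    case " ${ANT_OPTS:-} " in
        *" $1 "*) ;;                                   # already present
        *) ANT_OPTS="${ANT_OPTS:+$ANT_OPTS }$1" ;;     # append with a space
    esac
    export ANT_OPTS
}

ANT_OPTS=""
add_ant_opt -verbose:gc    # log every collection, as suggested above
add_ant_opt -Xmx128m       # raise the heap ceiling for the build JVM
echo "$ANT_OPTS"
```

Dropping the -Xmx128m line gives the plain -verbose:gc setup used for the attached logs.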

> a bit of a problem but that's why I think we should report this upstream
> instead of talking among ourselves.

No disagreement there. Guess I will have to take this upstream. There is nothing stopping anyone else from reporting either.

> Until Sun fixes it the band aid is the best
> we can do.

Again what is the security issue with 1.6? 1.5 is not really current and if Sun fixes it will be back ports. They might just be inclined to say run 1.6 it's current.
Comment 16 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 15:36:23 UTC
Created attachment 120292
Tomcat 6.0.13 compile on amd64 Opteron with sun-jdk-1.5.11 verbose:gc
Comment 17 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 15:37:28 UTC
Created attachment 120293
Tomcat 6.0.13 compile on amd64 Turion with sun-jdk-1.5.11 verbose:gc
Comment 18 Petteri Räty (RETIRED) gentoo-dev 2007-05-25 15:37:45 UTC
(In reply to comment #15)
> 
> Again what is the security issue with 1.6? 1.5 is not really current and if Sun
> fixes it will be back ports. They might just be inclined to say run 1.6 it's
> current.
> 

Read the bug link from my last comment.
Comment 19 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 15:40:43 UTC
Created attachment 120294
Tomcat 6.0.13 compile on x86 with sun-jdk-1.5.11 verbose:gc
Comment 20 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 15:51:29 UTC
(In reply to comment #18)
> (In reply to comment #15)
>
> 
> Read the bug link from my last comment.

Sorry, missed that. From reading it, and the links to details on the vulnerability: is this the only thing that is affected in 1.6 by this?

"which makes its splashscreen support vulnerable to that issue."

What exactly is the exploit or vulnerability, and how severe? Because a borked gc on Opterons, and it looks like 64-bit Semprons, is way worse IMHO than some possible exploit via libpng. I am not sure if that code is even executed in all envs, which would diminish the chances of an exploit.

Comment 21 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 16:44:16 UTC
Filed bug with Sun, awaiting acceptance and bug #
Comment 22 Vlastimil Babka (Caster) (RETIRED) gentoo-dev 2007-05-25 16:54:47 UTC
The logs IMHO prove that the docs we saw about initial heap size of 1/4 memory and increase as needed are bogus; there's a clear limit set (64MB for x86, 82 for amd64). So increasing it to 128 or more is safe, it wouldn't grow more anywhere. It also shows that the tomcat compile is just on the edge of this heap size. The fact that somewhere it fails and somewhere not is due to the random nature of memory allocation and gc, and I wouldn't dare to call it a bug. The correlation of failing with opteron/sempron might be just accidental, and we could be able to find opterons and semprons where it wouldn't fail, and turions where it would fail, IMHO.

Oh and masking 1.5 for this is just an insane idea :) 1.5 is still supported although not current, hell even 1.4 is still supported.
Comment 23 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 17:41:28 UTC
Created attachment 120302
Tomcat 6.0.13 compile on amd64 Opteron with sun-jdk-1.5.11 -Xmx128m verbose:gc
Comment 24 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 17:50:15 UTC
(In reply to comment #22)
> The logs IMHO prove that the docs we saw about initial heap size of 1/4 memory
> and increase as needed are bogus, there's a clear limit set (64MB for x86, 82
> for amd64).

There was no doc stating it would be increased as needed.

> So increasing it to 128 or more is safe, it wouldn't grow more
> anywhere. It also shows that tomcat compile is just on the edge of this heap
> size.

How do you explain it building with no problem on x86 with even LESS memory?
Same jdk?

Only reason 128 works is because it uses the GC less, view attachment.

> The fact that somewhere it fails and somewhere not is due to the random
> nature of memory allocation and gc and I wouldn't dare to call it bug.

Um no, the GC should be fairly consistent. If it fails on one it should fail on all. It does not, and only fails on some. Which says borked GC on some archs or processors. Much less not failing on archs with even less memory allocated.

> The
> correlation of failing with opteron/sempron might be just accidental and we
> could be able to find opterons and semprons where it wouldn't fail, and turions
> where it would fail, IMHO.

Fine, then find me an opteron that does not fail. Or a Turion that does. Because I can't and I have tried it on 3 opterons, and 1 turion. I can try on the pitr machine as well if you like. The Sempron was a surprise, but I would be willing to bet any 64bit amd64 proc would exhibit this behavior.
 
> Oh and masking 1.5 for this is just insane idea :) 1.5 is still supported
> although not current, hell even 1.4 is still supported.

A broken GC is pretty major. It might be the cause of a client of mine's Tomcat crashing a while back. This was on a production system and happened during the day under a considerable load. It was when we weren't capturing stdout/stderr, so if the OOM heap space exception was thrown, it wasn't logged and I could not investigate.

So anytime 1.5.0.11 has a load on some amd64 procs, the gc will fail with oom. This is pretty major, and is apparent during the Tomcat build. It should appear much more often with other emerges as well.

But my main concern is not the OOM or heap space issues during emerge. It's during runtime, since some of us are using our amd64 machines as production systems, not development etc.
Comment 25 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 17:52:46 UTC
(In reply to comment #24)
> The Sempron was a surprise, but I would
> be willing to bet any 64bit amd64 proc would exhibit this behavior.

Typo, meant to say any 64bit Sempron proc.
Comment 26 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 18:12:39 UTC
Created attachment 120303
Tomcat 6.0.13 compile on x86 with sun-jdk-1.5.11 -server verbose:gc
Comment 27 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 18:16:48 UTC
Created attachment 120307
Tomcat 6.0.13 compile on x86 with sun-jdk-1.5.11 verbose:gc
Comment 28 Vlastimil Babka (Caster) (RETIRED) gentoo-dev 2007-05-25 19:00:49 UTC
(In reply to comment #24)
> (In reply to comment #22)
> > The logs IMHO prove that the docs we saw about initial heap size of 1/4 memory
> > and increase as needed are bogus, there's a clear limit set (64MB for x86, 82
> > for amd64).
> 
> There was no doc stating it would be increased as needed.

Well I thought there was, but never mind. Anyway 82MB isn't 1/4 of memory. The point is increasing (and limiting) to 128 can't make it worse, as it would never try to use more than 82 otherwise.
 
> > So increasing it to 128 or more is safe, it wouldn't grow more
> > anywhere. It also shows that tomcat compile is just on the edge of this heap
> > size.
> 
> How do you explain it building with no problem on x86 with even LESS memory?
> Same jdk?

x86 has half the pointer size, thus needs less memory, you know this already. You brought up the docs where sun says amd64 increases limits by 30% just because of these pointer sizes... and guess how much is 64MB + 30% ?

> Only reason 128 works is because it uses the GC less, view attachment.

Well that's obvious, if you have more memory you don't need to GC so often. As it seems to call GC when the memory is up.

>  The fact that somewhere it fails and somewhere not is due to the random
> > nature of memory allocation and gc and I wouldn't dare to call it bug.
> 
> Um no, the GC should be fairly consistent. If it fails on one it should fail on
> all. It does not, and only fails on some. Which says borked GC on some archs or
> processors. Much less not failing on archs with even less memory allocated.

It can't be consistent. It's randomized and whatnot. Believe me. And again, comparing with x86 makes no sense.

>  The
> > correlation of failing with opteron/sempron might be just accidental and we
> > could be able to find opterons and semprons where it wouldn't fail, and turions
> > where it would fail, IMHO.
> 
> Fine, then find me an opteron that does not fail. Or a Turion that does.
> Because I can't and I have tried it on 3 opterons, and 1 turion. I can try on
> the pitr machine as well if you like. The Sempron was a surprise, but I would
> be willing to bet any 64bit amd64 proc would exhibit this behavior.

Well try some more machines if you want.

> > Oh and masking 1.5 for this is just insane idea :) 1.5 is still supported
> > although not current, hell even 1.4 is still supported.
> 
> A broken GC is pretty major. It might be the cause for a client of mines Tomcat
> to crash a while back. This was on a production system and happened during the
> day under a considerable load. It was when we weren't capturing stdout/stderr,
> so if the OOM heap space exception was thrown. Wasn't logged so I could not
> investigate.

The GC is not broken. I think the log shows what some doc said (try GC X times and if you don't get Y% of free memory, report out of heap). That's what I see in the opteron log:

[Full GC 83199K->82257K(83200K), 0.3545860 secs]
[Full GC 83199K->82516K(83200K), 0.4234800 secs]
[Full GC 83199K->82847K(83200K), 0.3568400 secs]
[Full GC 83199K->82977K(83200K), 0.3571690 secs]
[Full GC 83199K->83071K(83200K), 0.3575310 secs]
[Full GC 83199K->83118K(83200K), 0.3578090 secs]
[Full GC 83199K->83141K(83200K), 0.3576950 secs]
[Full GC 83199K->83162K(83200K), 0.3563310 secs]
[Full GC 83199K->83176K(83200K), 0.3586920 secs]
[Full GC 83199K->83183K(83200K), 0.4171250 secs]
[Full GC 83199K->83192K(83200K), 0.3580760 secs]
[Full GC 83199K->83193K(83200K), 0.3271350 secs]
[Full GC 83199K->83194K(83200K), 0.3278450 secs]
[Full GC 83199K->83197K(83200K), 0.3277420 secs]
[Full GC 83199K->83198K(83200K), 0.3269420 secs]
[Full GC 83199K->83198K(83200K), 0.3263100 secs]
[Full GC 83198K->83197K(83200K), 0.3263670 secs]
[Full GC 83199K->83197K(83200K), 0.3262520 secs]
[Full GC 83199K->83198K(83200K), 0.3263310 secs]
[Full GC 83199K->83198K(83200K), 0.3265520 secs]
[Full GC 83199K->83199K(83200K), 0.3243410 secs]
[Full GC 83199K->83199K(83200K), 0.3261400 secs]
[Full GC 83199K->83198K(83200K), 0.3262390 secs]
[Full GC 83199K->83199K(83200K), 0.3371210 secs]
[Full GC 83199K->83199K(83200K), 0.3250670 secs]
[Full GC 83199K->83187K(83200K), 0.4179880 secs]
[Full GC 83199K->83173K(83200K), 0.3563100 secs]
[Full GC 83199K->83149K(83200K), 0.3618330 secs]
[Full GC 83199K->83144K(83200K), 0.3587660 secs]

Even the Turion log (which you performed Using: sun-jdk-1.7 btw) is pretty close at one point:

[Full GC 83199K->82092K(83200K), 0.4843630 secs]

Here GC freed only 1MB below the limit. Possibly if the limit were 1MB lower, it wouldn't have succeeded either. Would you then say that GC is broken in 1.7 on Turion?
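
To read figures like these off a whole log, a small filter can help. The sketch below assumes the -verbose:gc line format shown in this comment ([Full GC <before>K-><after>K(<capacity>K), ...]); the function name gc_summary is hypothetical. For each Full GC line it prints the kilobytes freed, followed by the kilobytes still free below the reported capacity.

```shell
#!/bin/sh
# Hypothetical filter: read -verbose:gc output on stdin and, for every
# "Full GC" line, print "<freed_kb> <headroom_kb>".
gc_summary() {
    # Pull out the three numbers: used before, used after, capacity.
    sed -n 's/^\[Full GC \([0-9]*\)K->\([0-9]*\)K(\([0-9]*\)K).*/\1 \2 \3/p' |
    while read -r before after capacity; do
        echo "$((before - after)) $((capacity - after))"
    done
}

# The Turion line quoted above: 1107K freed, 1108K of headroom left.
printf '%s\n' '[Full GC 83199K->82092K(83200K), 0.4843630 secs]' | gc_summary
```

Run over the Opteron excerpt, the first column shrinks towards zero, which is the pattern this comment points at.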

> So anytime 1.5.0.11 has a load on some amd64 procs, the gc will fail with oom.

That's hardly a correct conclusion from all this.

> This is pretty major, and is apparent during the Tomcat build. It should appear
> much more often with other emerges as well.
> 
> But my main concern is not the OOM or heap space issues during emerge. It's
> during runtime. Since some of us are using our amd64 machines as production
> systems. Not development or etc.

Well obviously there are tasks (tomcat compile, eclipse startup, hungry web-apps) where the default heap size limit is not enough and you need to set more, that's what conf.d files are for. Blaming that on broken GC is wrong.
Comment 29 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 19:47:31 UTC
(In reply to comment #28)
> 
> Well I thought there was, but never mind. Anyway 82MB isn't 1/4 of memory.

Sorta on the 1/4; here is that doc:
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html#0.0.0.0.%20Heap%20Size%7Coutline

It's based on DefaultInitialRAMFraction so that value must be different than stated or something.

> The
> point is increasing (and limiting) to 128 can't make it worse, as it would
> never try to use more than 82 otherwise.

I am not contesting it would make the problem worse at this point. Just why it's needed in the first place.

> Even the Turion log (which you performed Using: sun-jdk-1.7 btw) is pretty
> close at one point:

Odd, I have a default jdk.conf and 1.5 is set as system vm. Ebuild is >= 1.5, so java stuff must be switching to 1.7 during build. Happened on earlier x86 attachment. Although I thought I double checked.

> [Full GC 83199K->82092K(83200K), 0.4843630 secs]
> 
> Here GC freed only 1MB below the limit. Possibly if the limit was 1MB lower, it
> wouldn't be successful too. Would you then say that GC is broken in 1.7 on
> Turion?

I was able to get it to run out of heap space once, that I have not been able to replicate since. Now that I set to 1.5 specifically I am able to reproduce now consistently on my Turion laptop. Starting to question if the jdk in past tests on Turion was really 1.5. Starting to doubt it.

> > So anytime 1.5.0.11 has a load on some amd64 procs, the gc will fail with oom.
> 
> That's hardly correct conclusion from all this.

With further testing I agree. Thus now need to go retract upstream bug, that they will likely shoot down either way since it's likely invalid.

> Well obviously there are tasks (tomcat compile, eclipse startup, hungry
> web-apps) where the default heap size limit is not enough and you need to set
> more, that's what conf.d files are for. Blaming that on broken GC is wrong.

That was more of a rushed assumption because I was not able to replicate, and lacked time to really focus on this. I tried to express I am under a considerable work load. Not to mention was trying to close as many Tomcat bugs as fast as possible. Oh well sorry for it all.
 

Comment 30 Petteri Räty (RETIRED) gentoo-dev 2007-05-25 19:55:17 UTC
(In reply to comment #29)
> 
> That was more of a rushed assumption because I was not able to replicate, and
> lacked time to really focus on this. I tried to express I am under a
> considerable work load. Not to mention was trying to close as many Tomcat bugs
> as fast as possible. Oh well sorry for it all.
> 

Remember it's perfectly fine to take a week or two off from Gentoo work if you don't have the time for it. Good that you finally see it our way :)
Comment 31 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 21:00:09 UTC
Created attachment 120332
Tomcat 6.0.13 compile on amd64 with sun-jdk-1.5.11 vm ecj compiler verbose:gc
Comment 32 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-25 21:03:16 UTC
(In reply to comment #30)
> 
> Remember it's perfectly fine to take a week or two off from Gentoo work if you
> don't have the time for it.

Fully aware, ty for the reminder.

> Good that you finally see it our way :)
 
Well I am not saying 100% if I will go with -Xmx128m as a solution. If I do I will likely see about detecting arch amd64, and jdk version sun-jdk-1.5 and if both conditions exist, then set -Xmx128m.

Or I might force ecj like I did when I first ran into the problem with Tomcat 6.0.x quite some time back, considering Tomcat deps on ecj and uses it by default as its internal compiler. That might be a better way to go, but I will likely still do an amd64/sun-jdk-1.5 check before forcing ecj.
Comment 33 William L. Thomson Jr. (RETIRED) gentoo-dev 2007-05-28 23:34:09 UTC
Added arch and compiler/vm detection, which will force ecj if arch is amd64 and compiler/vm is sun-jdk-1.5. Kinda preferred the less-memory route over more. Not sure if that's a greener solution or not ;)

Can modify if needed if this comes up again. Closing bug for now. Sorry again for all the madness.
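
The detection described in this comment could look roughly like the sketch below. It is not the code committed to the ebuild: the function name pick_forced_compiler and its plain string arguments are assumptions for illustration, and a real ebuild would take the arch and VM from Portage and the Java eclasses. ecj-3.2 is the value used with JAVA_PKG_FORCE_COMPILER in the original report.

```shell
#!/bin/sh
# Hypothetical helper: print the compiler to force, or nothing when the
# default javac can be kept.
pick_forced_compiler() {
    # $1 = arch keyword (e.g. amd64), $2 = VM name as in GENTOO_VM
    case "$1:$2" in
        amd64:sun-jdk-1.5*) echo "ecj-3.2" ;;   # the combination that hit OOM
        *) echo "" ;;                           # everything else: no override
    esac
}

forced=$(pick_forced_compiler amd64 sun-jdk-1.5)
if [ -n "$forced" ]; then
    export JAVA_PKG_FORCE_COMPILER="$forced"
fi
echo "JAVA_PKG_FORCE_COMPILER=${JAVA_PKG_FORCE_COMPILER:-unset}"
```

Keeping the decision in one function makes it easy to widen the match later if the problem shows up with another arch or VM.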