Bug 87305 - Neither sci-physics/root-4.02.00 nor sci-physics/root-5.14.00b builds cleanly on sparc
Bug#: 87305
Product: Gentoo Linux
Version: 2006.0
Platform: All
OS/Version: All
Status: RESOLVED
Severity: normal
Priority: P2
Resolution: FIXED
Assigned To: sci-physics@gentoo.org
Reported By: fmccor@gentoo.org
Component: Applications
Summary: Neither sci-physics/root-4.02.00 nor sci-physics/root-5.14.00b builds cleanly on sparc
Opened: 2005-03-30 10:03 0000
|
You will have noticed that I just marked sci-libs/root-4.02.00 '-sparc' because, as it stands, it cannot build on sparc. There are two problems (everything relative to ${PORTAGE_TMPDIR}/portage.root-4.02.00/work/root):
(1) unix/src/TUnixSystem.cxx does not know about sparc, and so the compile fails;
(2) xrootd/src/xrootd/... does not know about sparc, and thus its configure-on-the-fly fails.
The first of these is trivial to fix. The second is not hard, but it is ugly because, as part of the unpack process, you must untar the xrootd/src/xrootd directory, patch it, and re-tar it so that the build will create it correctly a little later.
A patch will be provided, but that takes a second entry for the bug because the initial entry doesn't seem to allow attachments. With the patch incorporated, root can have ~sparc because it runs well enough for testing.
Obligatory environment information:
U2 (2x400 SMP) + U60 (2x450 SMP), two systems using distcc; kernel 2.4.27-sparc-r1; gcc version 3.3.5-20050130 (Gentoo Hardened Linux 3.3.5.20050130-r1, ssp-3.3.5.20050130-1, pie-8.7.7.1).
Created an attachment (id=54860)
When applied in .../portage/sci-libs/root, converts root-4.02.00 into a sparc-friendly ebuild
Apply the patch thus: in .../portage/sci-libs/root, execute
patch -p0 -b -z- < {path-to-patch}/root-4.02.00.patch
Resulting ebuild:
(1) Changes -sparc ==> ~sparc;
(2) Enables USE=ruby (configure can't find /usr/lib/libruby18-static.a, but that's for another bug report);
(3) Patches unix/src/TUnixSystem.cxx to recognize sparc;
(4) Un-tars xrootd/src/xrootd and patches three files to allow sparc and to use the OPT optimization flag;
(5) Re-tars xrootd/src/xrootd-20041124-0752.src.tgz so that, during the build, root's configure-on-the-fly for xrootd will know about sparc.
With these changes, root seems to do OK with the tutorials on sparc, and so could earn a ~sparc keyword.
Could you please submit a bug report upstream? I've had some success getting the idiotic ARCH file in xrootd modified to support amd64/ppc, though it would be far nicer if the problem had never been allowed to happen.
How are we doing on this old bug? Could you test the new root-5, now in the main tree, on sparc?
Still needs love:
ayanami ~ # emerge root
Calculating dependencies... done!
>>> Emerging (1 of 1) sci-physics/root-5.14.00b to /
* root_v5.14.00b.source.tar.gz MD5 ;-) ... [ ok ]
* root_v5.14.00b.source.tar.gz RMD160 ;-) ... [ ok ]
* root_v5.14.00b.source.tar.gz SHA1 ;-) ... [ ok ]
* root_v5.14.00b.source.tar.gz SHA256 ;-) ... [ ok ]
* root_v5.14.00b.source.tar.gz size ;-) ... [ ok ]
* Users_Guide_5_14.pdf MD5 ;-) ... [ ok ]
* Users_Guide_5_14.pdf RMD160 ;-) ... [ ok ]
* Users_Guide_5_14.pdf SHA1 ;-) ... [ ok ]
* Users_Guide_5_14.pdf SHA256 ;-) ... [ ok ]
* Users_Guide_5_14.pdf size ;-) ... [ ok ]
* checking ebuild checksums ;-) ... [ ok ]
* checking auxfile checksums ;-) ... [ ok ]
* checking miscfile checksums ;-) ... [ ok ]
* checking root_v5.14.00b.source.tar.gz ;-) ... [ ok ]
* checking Users_Guide_5_14.pdf ;-) ... [ ok ]
*
* You may want to build ROOT with these non Gentoo extra packages:
* AliEn, castor, Chirp, Globus, Monalisa, Oracle, peac,
* PYTHIA, PYTHIA6, SapDB, SRP, Venus
* You can use the EXTRA_CONF variable for this.
* Example, for PYTHIA, you would do:
* EXTRA_CONF="--enable-pythia --with-pythia-libdir=/usr/lib" emerge root
*
>>> Unpacking source...
>>> Unpacking root_v5.14.00b.source.tar.gz to /var/tmp/portage/root-5.14.00b/work
>>> Unpacking Users_Guide_5_14.pdf to /var/tmp/portage/root-5.14.00b/work
unpack Users_Guide_5_14.pdf: file format not recognized. Ignoring.
>>> Source unpacked.
>>> Compiling source in /var/tmp/portage/root-5.14.00b/work/root ...
Attempts at guessing your architecture failed.
Please specify the architecture as the first argument.
Do './configure --help' for a list of avaliable architectures.
!!! ERROR: sci-physics/root-5.14.00b failed.
Call stack:
ebuild.sh, line 1546: Called dyn_compile
ebuild.sh, line 937: Called src_compile
root-5.14.00b.ebuild, line 115: Called die
!!! configure failed
!!! If you need support, post the topmost build error, and the call stack if
relevant.
I took a quick look at how to change it for sparc, although I don't have a sparc box. It seems that xrootd was modified and now supports linux sparc, but the main configure has a lot of arches/compiler flags. Could you check whether the generic gcc target works? You can pass it with EXTRA_CONF="linux" emerge root.
(In reply to comment #5)
> I gave a quick look at how to change it for sparc, although I don't have a
> sparc box. It seems that xrootd was modified and now supports linux sparc. But
> the main configure has a lot of arches/compiler flags. Could you try to see if
> the generic gcc work? You can pass it with EXTRA_CONF="linux" emerge root.
>
At first glance, that does not seem sufficient. I'll keep playing with it,
though.
I'm attempting a build now. With this configure file, the architecture must
come first, so to get it to configure, in the ebuild I had to make the
configure statement start:
./configure \
${EXTRA_CONF} \
etc.
The real problem is in the configure file, though. It is still missing
a linux:sparc:*.*)
entry for autodetection.
Also, I don't know yet if arch=linux is correct for sparc, so I'm still
playing.
It looks as if root-5 should build on sparc; most of it does with arch=linux. There is at least one problem, and I'll try to look at it more closely: it wants to build netx, which appears to require xrootd, but it does not build xrootd, so netx fails. I'm going to try an explicit --disable-xrootd (which should disable netx) and an explicit --enable-xrootd to see what happens.
If I force --disable-xrootd in the ebuild, root-5.14.00b now appears to build
OK on sparc. (That's with EXTRA_CONF="linux" emerge root and with the ebuild
change mentioned in Comment #7.) I'll play with it some more on a more current
system.
(In reply to comment #9)
> If I force --disable-xrootd in the ebuild, root-5.14.00b now appears to build
> OK on sparc. (That's with EXTRA_CONF="linux" emerge root and with the ebuild
> change mentioned in Comment #7.) I'll play with it some more on a more current
> system.
>
I tried this on an SB1000-MP system. It appears that on sparc, for some reason, --enable-xrootd does not actually cause root to build xrootd. This might be a problem in the configure file; I'll investigate a little more, but not much unless it's something obvious.
Found it, I think. Not only does the configure file not like sparc linux, neither does root/xrootd/src/xrootd/config/ARCHS. The same fixes as in Attachment 54860 (well, the only attachment on this bug) are required for the root-5 configure & ARCHS files. Unfortunately, xrootd is distributed in the root source as a .tgz file, so applying the fix here seems tricky.
If you know anyone upstream, I suppose you could copy that person on this bug
and let upstream do what they like.
Bringing the summary in line with what we are talking about here. The much more relevant case (root-5) begins at Comment 3. It would be great if upstream would explain how they would add an architecture like this:
linux:sparc*:*) arch=linuxsparcgcc ;;
and get it to build everything. I know we have to add linuxsparcgcc to config/ARCHS and provide a config/Makefile.linuxsparcgcc, and at that point we have a version which can autodetect a sparc-linux system. I don't know whether that will also cause xrootd to start building, and I can't see anything at all that would lead me to believe so: nothing I can see tells me that just using arch=linux shouldn't already build xrootd. It looks to me as if, once you un-tar the xrootd tree, xrootd/Module.mk is going to go ahead and build it or die trying.
I have a fix for the automatic system determination and failure; it's a small
change to the ebuild and one-line patches to three files. However, some
curious behavior has come up.
On amd64, with MAKEOPTS='-j2' or MAKEOPTS='-j1', this package seems to build fine. With -j3, it seems to always fail with 'cannot find -lCore'. And indeed, at -j3 libCore does not get built.
On sparc SB1000 (1x750, 1x900 multiprocessor), at -j3 -lCore always gets built, but -lCintex never gets built in time; I think I can say that for -j<anything>. (The xrootd libraries do build fine, though. :)) This could be a problem with how I put together Makefile.linuxsparcgcc, but the only difference between it and Makefile.linux is "+OPTFLAGS = -O2 -mcpu=ultrasparc" (instead of '-O'). I don't know what that definition is used for; every time I look, the build is using mine anyway.
Net result: (1) I have a possible fix (very small) to the original problem; (2) some sparc systems consistently cannot build -lCintex; (3) amd64 sometimes cannot build -lCore, but that is not a consistent failure. I'll probably attach the patch which makes this problem seem to go away in a bit (it looks like a new ebuild, but it's just the original with a little src_unpack() added: we need to patch 6 of root's files, well, 5 and add a new one).
Something occurs to me. As my previous comment shows, I added a new arch=linuxsparcgcc, but linux would do as well for that. I only need to know what kind of system I am on in xrootd/Module.mk, and I should really check to see who cares about it. That's an easy test (`uname -m` works as well for me for testing). It is not a general fix because (apparently) of Windows, but on my systems I don't care at this point.
By the way, what I have done does not affect amd64; the ebuild change won't apply it to anything at all. amd64 fails nicely on its own. :)
Created an attachment (id=109728)
Fix for the "unknown architecture" bug reported by gustavoz in Comment 4
This one-line patch adds sparc to the configure script so that it can continue:
More concretely, it adds the definition:
linux:sparc*:*) arch=linux ;;
at the appropriate spot to root/configure. This is a complete fix.
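The autodetection that this one-line patch extends can be sketched as a shell `case` over a `system:machine:release` string. This is a simplified, hypothetical reconstruction: the function name `detect_arch` and every entry except the sparc line are illustrative assumptions, not copied from root/configure.

```shell
#!/bin/bash
# Hypothetical sketch of configure-style architecture autodetection.
# Only the sparc entry comes from the attachment; the rest is illustrative.
detect_arch() {
    # $1 = system, $2 = machine, $3 = release (as from uname -s, -m, -r)
    case "$1:$2:$3" in
        linux:i*86:*)   echo linux ;;    # illustrative entry
        linux:sparc*:*) echo linux ;;    # the one-line fix from the attachment
        *)              echo unknown ;;  # configure would give up here
    esac
}

# Probe the current machine the same way, lowercasing system and machine.
sys=$(uname -s | tr '[:upper:]' '[:lower:]')
mach=$(uname -m | tr '[:upper:]' '[:lower:]')
rel=$(uname -r)
echo "detected arch: $(detect_arch "$sys" "$mach" "$rel")"
```

Because `sparc*` matches both `sparc` and `sparc64`, either kernel flavour maps to the generic `linux` arch, which is why no Makefile.linuxsparcgcc is needed for this variant.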
A better fix would be 'arch=linuxsparcgcc', but that requires a corresponding Makefile.linuxsparcgcc, and I am not quite sure what would be best (it's easy to make one: take, say, Makefile.linux and play with it). Since arch=linux works just fine, I'd prefer to wait for feedback, if any, before tailoring a sparc-specific one.
This patch should be considered mandatory, and all it takes is:
(1) Put that little file in the root/files directory;
(2) Create a src_unpack function, thus:
src_unpack() {
	unpack ${A}
	cd "${WORKDIR}"
	epatch "${FILESDIR}"/<whatever-a-good-name-for-this-patch-is>
}
Created an attachment (id=109743)
An example patch file and ebuild replacement showing the requirements for a complete fix for the initial complaint.
This file contains an example ebuild replacement and a patch file containing three one-line patches. This is a complete fix.
However, I do not recommend using it, and if you read the ebuild, you will see why: essentially, it contains the Attachment 109728 fix (which is fine), then it does a manual unpack (i.e., tar) of a second source file bundled within the master source (the source for xrootd), applies a couple of one-line patches, then repackages it (i.e., tar) so that the root build process can use it. I am not sure, but I suspect that, say, ciaranm would recommend it as a question on a new-developer quiz: "How many things are wrong in this ebuild?". What it does show, however, is how the bug must be fixed, by whatever acceptable means, such as:
(1) Provide the xrootd part of the patch as a second source file and let src_unpack put it into the root/xrootd directory;
(2) Patch the root/xrootd/Module.mk file so that it applies the patch whenever it is asked to unpack the file. That would be at line 75: right after 'touch $$etag ;\' you would have
patch <appropriate-options-to-apply-the-patch> ; \
For verifying the patch I didn't need that, and what I am providing makes brutally clear what is going on here. Plus it took about 5 minutes. :)
Let's review the bidding:
(1) I don't have anything more to say on it, except that I am willing to work on a more proper fix than the one in Attachment 109743. I think Comment 16 explains a good solution (let the Module.mk fix its problems itself).
(2) The -lCore problem is a problem with parallel makes of root on systems which are already busy, when MAKEOPTS='-jxxx' specifies a value that can result in the make process using more CPUs than you have. That is what I was doing on the amd64, and I inadvertently reproduced it on sparc at home. I was building with -j4 on two systems using distcc. Now, some operations can't be distributed, so on occasion I'd have a -j4 build on a 2-processor system. That's OK, but it turns out that this ended up running at the same time as the daily cron runs (makewhatis, slocate, update-eix; you get the picture). In fact, I just saw it here on a sparc because the system was already building things.
Why -lCore? Because it's huge, I am sure.
(3) I don't know what the -lCintex problem is. All my systems here see it, but I can't reproduce it at home. E.g., I saw it here on a U60 distributing to a U2. I ran the same test at home (build on a U60, distribute to a U2) and that worked fine a couple of times. The third time I saw the -lCore problem, but I am sure that is a timing thing as described in point (2).
I will point out, however, that my systems here (U60+U20, SB1000) are all completely current as to installed software (or perhaps more so: I run a lot of ~sparc stuff). The ones at home are not current (there, when portage wants to update things, I pick and choose). I strongly suspect python (especially since Cintex seems to be building a lot of python stuff), but I have no evidence. The answer is probably in the build log, and I might chase through it sometime (it's huge). More likely, I'll try to figure out how to ask the Makefile to build one specific module, do that for Cintex, and see if there is any indication that it failed to build the library but apparently didn't notice.
Suggestions from people who actually know something about building root are welcome. :)
I'll provide a bit more information. These failures in points (2) and (3) above are the same. When it complains about a missing library, it looks as if that library is being built, and the make process is trying to use it before the file has been created.
Why do I think this? Because I went into the build directory from a failure,
and just entered 'make'. The first thing it did was finish building the
libraries. Then I could finesse an install using ebuild. At the same time, my
U2 distributing to the U60 built successfully. So the only dependable failure
is this system: SB1000(1x900,1x750) asymmetric multiprocessor. I just think
somehow the fact that the processors are running at different speeds is messing
with the long sequence of linking at the end. I have no clue how to attack
such a problem. (-j1 sounds good, but doesn't work because, I suppose, the
operating system can choose to reassign CPUs.)
Oh, yes, it seems to work just like it is supposed to.
Created an attachment (id=109883)
Replacement patch and ebuild
This version replaces an old-style test 'if use sparc; then' with a newer style
'if [[ ${ARCH} == sparc ]]; then' and so should work with all package manager
candidates (pkgcore dislikes the former, I haven't had a chance to try with
paludis).
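A minimal sketch of gating the sparc-only patching on `${ARCH}`, assuming a plain bash environment: in a real ebuild Portage exports ARCH, whereas here it is set per call, and `apply_sparc_fixes` and `maybe_patch` are hypothetical placeholders for the actual epatch calls in src_unpack.

```shell
#!/bin/bash
# Hypothetical placeholder for the sparc-specific epatch calls.
apply_sparc_fixes() {
    echo "applying sparc fixes"
}

# Test ${ARCH} directly instead of "use sparc".
maybe_patch() {
    if [[ ${ARCH} == sparc ]]; then
        apply_sparc_fixes
    else
        echo "no sparc fixes needed for ${ARCH}"
    fi
}

# A prefix assignment sets ARCH only for the duration of the call.
ARCH=sparc maybe_patch
ARCH=amd64 maybe_patch
```

On amd64 the conditional simply skips the patching, matching the earlier remark that the ebuild change does not touch amd64.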
I've rethought my position a bit. Ugly as it is, src_unpack really does have
to do just that and should not defer the required patching to a Module.mk file.
Anyone reading this, if you have time, please do try the build. It should run to completion and work. If it does not, you are seeing the race condition described in Comment 14 (an attempt to use a library before its build is complete). We need to know whether this is a Makefile problem or a system problem, and whether it is limited, as it is for me, to 2 out of 6 systems (with random failures on the others if the stars are right). Me, I can cause it on both sparc and amd64 any time I wish, and on my (asymmetric) SB1000MP, it is a hard failure.
If the attached patch (or equivalent) is applied, I am prepared to give this a
~sparc, but will not even consider a 'sparc' until I know the answer to the
above question. And, of course, if the Makefile is implicated, until it is
fixed. (However, upstream at CERN really should "bite the bullet" and add this
three-line patch themselves. Why they haven't is beyond me.)
(In reply to comment #19)
>
> Anyone reading this, if you have time, please do try the build. It should run
> to completion and work. If it does not, you are seeing the race condition
> described in Comment 14 (attempt to use a library before its build is
> complete).
That's Comment 18, of course. And if you do see the failure, please check the
log. It looks to me that not only hasn't the build for the missing library
completed, but also it hasn't started yet. That is, its attempted use sort of
sneaks in before the build (which should have been the next step).
This package cannot build reliably with a parallel make, and so unfortunately
it looks as if the 'emake \' needs to be 'emake -j1 \'. With
MAKEOPTS='-j1' this package builds just fine for me now, but on some systems
(e.g., SB1000) it will never build in parallel.
Created an attachment (id=110992)
Parallel make friendly patch and replacement ebuild
Parallel make seems to fail (if it fails at all) when it attempts to use -lCore
before -lCore gets built. Replacement ebuild uses the rather ugly construct
emake ... || emake -j1 ... || die ...
As best as I can tell, this should always allow a mostly parallel make (highly
desirable on my slower systems) but recover (if it needs to) and complete
successfully. Experience from others highly desirable.
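The fallback construct can be sketched as a small bash function. This is an illustration under stated assumptions, not the ebuild itself: `build_with_fallback` and `racy_make` are hypothetical names, and `racy_make` merely stands in for an emake run that loses the libCore race unless it is serial.

```shell
#!/bin/bash
# Sketch of: emake ... || emake -j1 ... || die ...
# make_cmd stands in for emake; the names here are illustrative only.
build_with_fallback() {
    local make_cmd=$1
    shift
    # First attempt: whatever parallelism MAKEOPTS would supply.
    if "${make_cmd}" "$@"; then
        return 0
    fi
    echo "parallel build failed; retrying with -j1" >&2
    # Second attempt: serial. Work finished by the first attempt is reused,
    # so this mostly completes the remaining library and link steps.
    "${make_cmd}" -j1 "$@"
}

# Hypothetical stand-in for a racy build: succeeds only when run serially.
racy_make() {
    [ "$1" = "-j1" ]
}

if build_with_fallback racy_make all; then
    echo "build succeeded"
else
    echo "build failed"   # the ebuild would call die here
fi
```

The trade-off is that a genuine compile error is attempted twice before die, but the retry is cheap because it picks up where the parallel run stopped, which fits the earlier observation that simply re-running make finishes the libraries.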
Fix spelling error in summary (sci-lphysics --> sci-physics).
> before -lCore gets built. Replacement ebuild uses the rather ugly construct
> emake ... || emake -j1 ... || die ...
I don't see any of this in the patch attached. Did you attach the right one?
I have no problem with -j2 on my single-cpu amd64.
Let me know if applying something like the following works for you:
emake root
emake -j1 lib/libCore.so
before the first emake command in the original ebuild?
Created an attachment (id=111019)
Corrected parallel make ebuild
No, I did not attach the correct file. This one (I am sure) is. I will verify
later, but I am pretty sure your suggestion will not work because libCore.so is
needed to build many of root's pieces.
And parallel make does work for me in some instances. For example, it works
fine (usually) on amd64(SMP) with MAKEOPTS='-j2', but never with
MAKEOPTS='-j3'. On SB1000(Asymmetric MP), parallel make never works for me.
And so on.
Sorry, I misread your suggestion, and while describing the new attachment I of course could not go back to look at it. I'll play with what you are suggesting later.
Actually, I mistyped as well. Try this bit instead:
emake OPTFLAGS="${CXXFLAGS}" rootcint compiledata
emake OPTFLAGS="${CXXFLAGS}" -j1 lib/libCore.so
emake OPTFLAGS="${CXXFLAGS}"
I will try -j3 too.
(In reply to comment #28)
> Actually, I mistyped as well. Try this bit instead:
>
> emake OPTFLAGS="${CXXFLAGS}" rootcint compiledata
> emake OPTFLAGS="${CXXFLAGS}" -j1 lib/libCore.so
> emake OPTFLAGS="${CXXFLAGS}"
>
With these, build fails on sparc and on amd64, thus:
cint/main/cint_tmp -K -w1 -zipc -ncint/lib/G__c_ipc.c -D__MAKECINT__
-DG__MAKECINT \
-c-2 -Z0 cint/lib/ipc/ipcif.h
Error: Symbol __BEGIN_DECLS#include is not defined in current scope
/usr/include/sys/types.h:35:
Error: Symbol bits is not defined in current scope
/usr/include/sys/types.h:35:
Error: Symbol types is not defined in current scope
/usr/include/sys/types.h:35:
Error: Failed to evaluate types.h
Error: operator '/' divided by zero /usr/include/sys/types.h:35:
Error: Symbol #ifdef__USE_BSD#ifndef__u_char_definedtypedef__u_charu_char is
not defined in current scope /usr/include/sys/types.h:35:
on sparc
Or cint/main/cint_tmp -K -w1 -zipc -ncint/lib/G__c_ipc.c -D__MAKECINT__
-DG__MAKECINT \
-c-2 -Z0 cint/lib/ipc/ipcif.h
Warning: Unknown type key_t in function argument cint/lib/ipc/ipcif.h:140:
Error: Symbol ushortsem_num is not defined in current scope
cint/lib/ipc/ipcif.h:172:
!!!Removing cint/lib/G__c_ipc.c cint/lib/G__c_ipc.h !!!
make: *** [cint/lib/G__c_ipc.c] Error 1
make: *** Waiting for unfinished jobs....
on amd64
With the original ebuild, this looks like (on both amd64 and sparc):
cint/main/cint_tmp -K -w1 -zipc -ncint/lib/G__c_ipc.c -D__MAKECINT__
-DG__MAKECINT \
-c-2 -Z0 cint/lib/ipc/ipcif.h
Note: Link requested for undefined class ipc_parm (ignore this message) :0:
Note: Link requested for undefined class ipc_perm (ignore this message) :0:
Note: Link requested for undefined class semid_ds (ignore this message) :0:
Note: Link requested for undefined class msqid_ds (ignore this message) :0:
So I don't think that alternative is an option.
The emake OPT... || emake -j1 OPT... || die
construction still seems to work, though. The problem is that on all my systems, if MAKEOPTS specifies too much parallelism, the first "emake OPT..." fails.
> I will try -j3 too.
>
No luck with -j3 on my box either.
The only target list that successfully worked is the following:
emake OPTFLAGS="${CXXFLAGS}" rootcint compiledata
emake OPTFLAGS="${CXXFLAGS}" -j1 rootlibs
emake OPTFLAGS="${CXXFLAGS}"
I put a slightly updated root with the sparc patch on the gentooscience overlay. You can try it with layman -S science. Let me know if it works.
(In reply to comment #30)
> No luck with -j3 as well on my box.
>
> The only target list that successfully worked is the following:
>
> emake OPTFLAGS="${CXXFLAGS}" rootcint compiledata
> emake OPTFLAGS="${CXXFLAGS}" -j1 rootlibs
> emake OPTFLAGS="${CXXFLAGS}"
>
> I put a slightly updated root with the sparc patch on gentooscience overlay.
> you can try it with layman -S science. Let me know if it works.
>
Testing now, but two observations:
1) The patch file needs to be called sparc-root-5.14.00c.patch
2) The './configure "${EXTRA_CONF}" \' is wrong. This causes it to try to
configure for architecture = "", which fails. Part of the patch adds
sparc/linux to the known architectures, and it is autodetected correctly (as
linux). So, the ${EXTRA_CONF} should stay at the end of the ./configure call
as in the original. (I.e., the ./configure part of the attachment is correct.)
With the changes mentioned in the "two observations" of Comment #31, root-5.14.00c seems good. I have a couple more systems to check it on, and will have a complete answer sometime on the 26th.
By the way, you would never want
./configure "${EXTRA_CONF}" \
Suppose we have
export EXTRA_CONF="linux --enable-pythia --with-pythia-libdir=/usr/lib"
to force build for linux, and to enable pythia (as in the setup example).
Then, with ./configure "${EXTRA_CONF}", the configure script would assign
ARCH="linux --enable-pythia --with-pythia-libdir=/usr/lib"
and the build would instantly fail.
I have verified that if you leave it as
./configure ${EXTRA_CONF} \
without the quotes, you probably get what we want (at least, configure no longer complains when EXTRA_CONF is not set to anything).
Locally, tests good with both pkgcore and portage if you fix the name of the
patch file to sparc-root-5.14.00c.patch and make it
./configure ${EXTRA_CONF} \
without the quote marks. I need to verify later on Monday on the SB1000, but I am pretty sure that this version (with corrections) can have its ~sparc. :)
Thanks for the support.
Hi
I committed the root-5.14.00c changes suggested here. Was the sparc test
successful? We should send our fixes upstream. Has anyone sent anything yet?
Sebastien
(In reply to comment #34)
> Hi
>
> I commited the root-5.14.00c changes suggested here. Was the sparc test
> successful? We should send our fixes upstream. Has anyone sent anything yet?
>
> Sebastien
>
Sparc seems fine with .00c. Thanks.
(In reply to comment #35)
> (In reply to comment #34)
> > Hi
> >
> > I commited the root-5.14.00c changes suggested here. Was the sparc test
> > successful? We should send our fixes upstream. Has anyone sent anything yet?
> >
> > Sebastien
> >
> Sparc seems fine with .00c. Thanks.
>
Oh, missed a question. I haven't sent anything upstream; I don't know to whom
to send it. That little patch file should be all they need. And so far as I
am concerned, this bug is fixed. But I'll leave it to you to make that
determination (e.g., does upstream need to apply the three line patch before we
can close this?).
Sent upstream both parallel building problem and sparc patch.
Closing this bug for now.
Thanks.