An ebuild for some genome sequence analysis tools provided at http://biocomp.dfci.harvard.edu/tgi/software/ (Harvard University Computational Biology and Functional Genomics Laboratory, The Gene Index Project)
Created attachment 103797 [details] sci-biology/tgi-tools-0.1.ebuild
The latest version can be found at http://www.gentoo-sunrise.org/sunrise/browser/reviewed/sci-biology/tgi-tools
I already watch this bug through the sci-biology alias.
Looking at this one. The ebuild is looking good.

This construct may raise some issues:

for i in seqclean/{seqclean,mdust,trimpoly} cdbfasta/cdbfasta tgicl/{psx,pvmsx,zmsort,tclust,sclust,nrcl,
    SRC_URI="${SRC_URI} ftp://occams.dfci.harvard.edu/pub/bio/tgi/software/${i}.tar.gz"
done

Normally, running functions or sed/awk invocations in global scope (which is executed multiple times over the course of ebuild processing) or mangling SRC_URI in a nontrivial way is discouraged. However, this is a basic loop over a fixed set producing a static value in SRC_URI, so it should be OK. Just as a note: the static nature of SRC_URI is also essential, since a dynamic SRC_URI will break the server-side cache (this is one reason why we cannot put use flags in SRC_URI, for example).

Checked the build, it went fine, so you can add ~amd64 to the KEYWORDS.

Looks OK in general, just a few remarks on install.

1. How typical of a skeletal sci package: only a few binaries and scarce, short README files. Good that they exist at all :).

2. find -iname readme -exec ls -l \{} \; in ${WORKDIR} shows 11 README files, but only two are installed. However, all the READMEs except these two contain only a short blurb and not much useful information. If not for the copyright information in them, I'd say it is indeed not necessary to include them. However, since they contain what amounts to copyright claims, it might be better to actually install them. Are the copyright claims identical in all of them and the same as in LICENSE (of which there are also 11, all identical)? If yes, then it is safe to omit those READMEs; otherwise it is better to install all of them, even the short ones.

George
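To make the "static value" point concrete, here is a minimal sketch in plain shell, run outside of any ebuild. The component list is shortened and written out in full because the brace list in the comment above is cut off, and BASE_URI is a name invented here for readability; neither is taken from the actual ebuild.

```shell
# Sketch only: a fixed list expanded into a static SRC_URI string.
# BASE_URI and the shortened component list are assumptions for illustration.
BASE_URI="ftp://occams.dfci.harvard.edu/pub/bio/tgi/software"
SRC_URI=""
for i in seqclean/seqclean seqclean/mdust cdbfasta/cdbfasta; do
    SRC_URI="${SRC_URI} ${BASE_URI}/${i}.tar.gz"
done
# Strip the leading space left by the first iteration.
SRC_URI="${SRC_URI# }"
# Deliberately unquoted so that each URI is printed on its own line.
printf '%s\n' ${SRC_URI}
```

Because the list is fixed and nothing depends on the environment, every evaluation of the global scope yields the same string, which is the property the comment above relies on.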
George, Yes, the copyright claims are identical in all of them, as far as I know. This package is not very useful anyway; it probably isn't needed in mainline.
I think this is a nice package if it really allows me to cluster EST sequences in parallel.
>>> Emerging (1 of 6) sci-biology/tgi-tools-0.1 * pvmsx.tar.gz RMD160 SHA1 SHA256 size ;-) ... [ ok ] * mdust.tar.gz RMD160 SHA1 SHA256 size ;-) ... [ ok ] Refetching... File renamed to '/usr/portage/distfiles/tgi_cpp_library.tar.gz._checksum_failure_.dhhvcl' >>> Downloading 'http://gentoo.mirror.web4u.cz/distfiles/tgi_cpp_library.tar.gz' --2008-12-19 14:00:24-- http://gentoo.mirror.web4u.cz/distfiles/tgi_cpp_library.tar.gz Resolving gentoo.mirror.web4u.cz... 81.91.81.13 Connecting to gentoo.mirror.web4u.cz|81.91.81.13|:80... connected. HTTP request sent, awaiting response... 404 Not Found 2008-12-19 14:00:24 ERROR 404: Not Found. >>> Downloading 'ftp://occams.dfci.harvard.edu/pub/bio/tgi/software/tgicl/tgi_cpp_library.tar.gz' --2008-12-19 14:00:24-- ftp://occams.dfci.harvard.edu/pub/bio/tgi/software/tgicl/tgi_cpp_library.tar.gz => `/usr/portage/distfiles/tgi_cpp_library.tar.gz' Resolving occams.dfci.harvard.edu... 155.52.47.33 Connecting to occams.dfci.harvard.edu|155.52.47.33|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD /pub/bio/tgi/software/tgicl ... done. ==> SIZE tgi_cpp_library.tar.gz ... 59669 ==> PASV ... done. ==> RETR tgi_cpp_library.tar.gz ... done. Length: 59669 (58K) 100%[===========================================================================================================================================================================================>] 59,669 112K/s in 0.5s 2008-12-19 14:00:27 (112 KB/s) - `/usr/portage/distfiles/tgi_cpp_library.tar.gz' saved [59669] ('Filesize does not match recorded size', 59669L, 58212) !!! Fetched file: tgi_cpp_library.tar.gz VERIFY FAILED! !!! Reason: Filesize does not match recorded size !!! Got: 59669 !!! Expected: 58212 Refetching... File renamed to '/usr/portage/distfiles/tgi_cpp_library.tar.gz._checksum_failure_.dhhvcl' !!! Couldn't download 'tgi_cpp_library.tar.gz'. Aborting. 
* Fetch failed for 'sci-biology/tgi-tools-0.1', Log file:
*  '/var/tmp/portage/sci-biology/tgi-tools-0.1/temp/build.log'

yafc occams.dfci.harvard.edu:/pub/bio/tgi/software/tgicl> ls -la
total 6226561
-rw-rw-r-- 1 178 1111   22683 Oct 17 2006 README
-rw-rw-r-- 1 178 1111   19815 Oct 17 2006 mgblast.tar.gz
-rw-rw-r-- 1 178 1111   14634 Nov 18 11:45 nrcl.tar.gz
-rw-rw-r-- 1 178 1111    7797 Oct 17 2006 psx.tar.gz
-rw-rw-r-- 1 178 1111   10963 Oct 17 2006 pvmsx.tar.gz
-rw-rw-r-- 1 178 1111   13457 Nov 18 11:45 sclust.tar.gz
-rw-rw-r-- 1 178 1111    9563 Nov 18 11:45 tclust.tar.gz
-rw-rw-r-- 1 178 1111   59669 Nov 18 11:45 tgi_cpp_library.tar.gz
-rw-rw-r-- 1 178 1111 2650874 Dec 17 11:00 tgicl_linux.tar.gz
-rw-rw-r-- 1 178 1111   22189 Dec 17 11:01 tgicl_scripts.tar.gz
-rw-rw-r-- 1 178 1111 3388797 Oct 17 2006 tgicl_sunos.tar.gz
-rw-rw-r-- 1 178 1111    6120 Nov 18 11:45 zmsort.tar.gz
yafc occams.dfci.harvard.edu:/pub/bio/tgi/software/tgicl>

Please update the checksums in sunrise.
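The failure reported here is a plain size comparison: the file on the FTP server has 59669 bytes while the Manifest recorded 58212. A rough sketch of such a check in plain shell follows; the function name check_size is made up for this example, and Portage's real verification (which also compares the RMD160, SHA1 and SHA256 hashes) is not reproduced.

```shell
# check_size FILE EXPECTED: succeed only if FILE has exactly EXPECTED bytes.
# Hypothetical helper for illustration, not part of Portage.
check_size() {
    file=$1
    expected=$2
    # wc -c counts bytes; arithmetic expansion strips any padding in its output.
    actual=$(( $(wc -c < "$file") ))
    if [ "$actual" -ne "$expected" ]; then
        echo "size mismatch for $file: got $actual, expected $expected" >&2
        return 1
    fi
}
```

It could be called as `check_size /usr/portage/distfiles/tgi_cpp_library.tar.gz 58212`, which would fail for the file currently on the server, given the 59669 bytes shown in the listing.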
Created attachment 176071 [details] tgi-tools-0.2.ebuild

Fixed ebuild. Just make sure you fix the psx.tar.gz archive first:

1) cd /usr/portage/distfiles; "download" psx.tar.gz
2) mkdir psx; cd psx
3) gzip -dc ../psx.tar.gz | tar xvf -
4) cd ..; tar cvf - psx | gzip -c > psx.tar.gz
5) cd /usr/local/portage/layman/sunrise/sci-biology/tgi-tools/
6) rm Manifest
7) ebuild tgi-tools-0.1.ebuild digest
8) emerge tgi-tools
Created attachment 177530 [details] tgi-tools-0.2.ebuild

Now everything except mgblast and clview can be compiled. The ebuild installs a few more important binaries which were initially overlooked in the mess, and also a few more docs. Tested on ~x86. Most parts get compiled from scratch, though. Somebody just needs to teach me how the output of `fox-config --cflags' can be used in the .ebuild. ;-) Fix it and commit, please.
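On the fox-config question, one common approach is to capture the output with command substitution and pass it to make on the command line. The sketch below uses a stand-in function so that it runs without FOX installed; the flags it prints are invented, and the Makefile variable names CXXFLAGS and LIBS are assumptions, since the upstream Makefiles may use different names.

```shell
# Stand-in so the sketch runs without FOX installed; the flags it prints are
# made up. A real ebuild would call the actual fox-config instead.
fox_config() {
    case $1 in
        --cflags) echo "-I/usr/include/fox-1.6" ;;
        --libs)   echo "-lFOX-1.6" ;;
    esac
}

# Capture the output once with command substitution ...
FOX_CFLAGS=$(fox_config --cflags)
FOX_LIBS=$(fox_config --libs)

# ... and hand it to make on the command line, where it overrides the values
# hard-coded in the Makefile. Printed here rather than executed.
echo make CXXFLAGS="${CXXFLAGS} ${FOX_CFLAGS}" LIBS="${FOX_LIBS}"
```

In an ebuild the same idea would probably look like `emake CXXFLAGS="${CXXFLAGS} $(fox-config --cflags)"` inside src_compile, provided the Makefile in question actually honours CXXFLAGS; if it hard-codes an include path under another variable name, that variable would have to be overridden instead.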
A modification of TGICL that uses MPI instead seems to be available: http://www.generationcp.org/sccv10/sccv10_upload/HPC_brochure.pdf
The Sunrise ebuild for tgi-tools relies on non-versioned distfiles which have changed since the Manifest was last updated. Additionally, I see that it is stored in science overlay too. Thus, I am masking it for removal in 30 days.
(In reply to comment #11)
> The Sunrise ebuild for tgi-tools relies on non-versioned distfiles which have
> changed since the Manifest was last updated. Additionally, I see that it is
> stored in science overlay too. Thus, I am masking it for removal in 30 days.
>

Thanks, yes, drop the 0.1 ebuild from sunrise. The 0.1 version from the science overlay does not compile either, because the files originally placed under a single gclib/ subdir were split into two gclib/ subdirs. We would have to patch two Makefiles in the ebuild to match the correct location for each set of files. Therefore I propose to drop the 0.1 ebuild from the science overlay as well and to focus on the 0.2 ebuild.

# find /var/tmp/portage/sci-biology/tgi-tools-0.1/work -name gclib | xargs ls -la
/var/tmp/portage/sci-biology/tgi-tools-0.1/work/cdbfasta/gclib:
total 388
drwxr-xr-x 2 root root  4096 Jul 6 19:58 .
drwxr-xr-x 3 root root  4096 Jul 6 19:58 ..
-rw-r--r-- 1 root root  7463 Dec 3 2009 GArgs.cpp
-rw-r--r-- 1 root root  2537 Dec 3 2009 GArgs.h
-rw-r--r-- 1 root root 19008 Jul 6 19:58 GArgs.o
-rw-r--r-- 1 root root 14900 Dec 3 2009 GBase.cpp
-rw-r--r-- 1 root root 10104 Dec 3 2009 GBase.h
-rw-r--r-- 1 root root 35736 Jul 6 19:58 GBase.o
-rw-r--r-- 1 root root 16813 Dec 3 2009 GHash.hh
-rw-r--r-- 1 root root 32387 Dec 3 2009 GList.hh
-rw-r--r-- 1 root root 33515 Dec 3 2009 GStr.cpp
-rw-r--r-- 1 root root  8517 Dec 3 2009 GStr.h
-rw-r--r-- 1 root root 79356 Jul 6 19:58 GStr.o
-rw-r--r-- 1 root root 23139 Dec 3 2009 gcdb.cpp
-rw-r--r-- 1 root root 12290 Dec 3 2009 gcdb.h
-rw-r--r-- 1 root root 56084 Jul 6 19:58 gcdb.o

/var/tmp/portage/sci-biology/tgi-tools-0.1/work/gclib:
total 140
drwxr-xr-x  2 root root  4096 Jul 6 19:58 .
drwx------ 18 root root  4096 Jul 6 19:58 ..
-rw-r--r--  1 root root 11632 Sep 17 2008 AceParser.cpp
-rw-r--r--  1 root root   906 Sep 14 2008 AceParser.h
-rw-r--r--  1 root root 11012 Jan 22 2009 GBase.cpp
-rw-r--r--  1 root root  9200 Dec 16 2008 GBase.h
-rw-r--r--  1 root root 28916 Jul 6 19:58 GBase.o
-rw-r--r--  1 root root 16813 Jul 29 2008 GHash.hh
-rw-r--r--  1 root root 16516 Sep 10 2008 GList.hh
-rw-r--r--  1 root root 11221 Jan 22 2009 LayoutParser.cpp
-rw-r--r--  1 root root  6246 Sep 14 2008 LayoutParser.h
#

Please help me with the 0.2 ebuild instead.
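Since the two gclib/ directories share several file names, it can help to see which of those same-named files actually differ in content. A small sketch in plain shell; the function name diverged_files is invented for this example and is not part of any ebuild.

```shell
# diverged_files DIR_A DIR_B: print the names of files that exist in both
# directories but whose contents differ. Hypothetical helper for illustration.
diverged_files() {
    for f in "$1"/*; do
        name=${f##*/}
        # Only consider regular files present under the same name in both dirs.
        [ -f "$f" ] && [ -f "$2/$name" ] || continue
        # cmp -s is silent and exits non-zero when the two files differ.
        cmp -s "$f" "$2/$name" || echo "$name"
    done
}
```

Run from the work directory shown in the listing, it would be invoked as `diverged_files cdbfasta/gclib gclib`. Files that exist in only one of the two directories, such as AceParser.cpp, are skipped.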
Created attachment 237789 [details] tgi-tools-0.2.ebuild

Updated the ebuild to reflect the current variable names in the many Makefiles and the split gclib/ contents. It does not compile against sci-biology/ncbi-tools-20090809-r2. We may need a version as old as 20060507. Can somebody test older versions?
Here is the ncbi-tools compile error:

g++ -L/usr/lib -o clview appmain.o mdichild.o clrutils.o ../gclib/LayoutParser.o ../gclib/AceParser.o mainwin.o FXClView.o ../gclib/GBase.o -lFOX-1.6 -lXext -lX11 -lXft -lXrender -lfontconfig -lfreetype -lX11 -lXcursor -lXrandr -ldl -lpthread -lrt -ljpeg -lpng -ltiff -lz -lbz2 -lm -lcups -lnsl -lGLU -lGL
gcc -o mgblast -O2 -I/usr/include/ncbi -L/usr/lib mgblast.c -lblastapi \
 -lblast -lncbitool -lblastcompadj -lncbiobj -lncbi -lm \

mgblast.c: In function ‘GetLambdaFast’:
mgblast.c:2317: error: too few arguments to function ‘Blast_ScoreBlkMatrixInit’
mgblast.c: In function ‘BLAST_FillOptions’:
mgblast.c:2391: error: too many arguments to function ‘BLAST_FillInitialWordOptions’
mgblast.c:2402: error: ‘BlastInitialWordOptions’ has no member named ‘ungapped_extension’
mgblast.c:2406: error: ‘BlastInitialWordOptions’ has no member named ‘ungapped_extension’
mgblast.c:2420: error: ‘BlastInitialWordOptions’ has no member named ‘ungapped_extension’
mgblast.c:2427: error: ‘BlastInitialWordOptions’ has no member named ‘ungapped_extension’
mgblast.c: In function ‘Main_new’:
mgblast.c:2587: warning: passing argument 2 of ‘BlastTabularFormatDataNew’ from incompatible pointer type
mgblast.c:2587: error: incompatible type for argument 3 of ‘BlastTabularFormatDataNew’
mgblast.c:2587: error: too few arguments to function ‘BlastTabularFormatDataNew’
mgblast.c:2614: warning: passing argument 2 of ‘Blast_DatabaseSearch’ from incompatible pointer type
mgblast.c:2614: warning: passing argument 3 of ‘Blast_DatabaseSearch’ from incompatible pointer type
mgblast.c:2614: warning: passing argument 4 of ‘Blast_DatabaseSearch’ from incompatible pointer type
mgblast.c:2614: warning: passing argument 5 of ‘Blast_DatabaseSearch’ from incompatible pointer type
mgblast.c:2614: warning: passing argument 6 of ‘Blast_DatabaseSearch’ from incompatible pointer type
mgblast.c:2614: warning: passing argument 7 of ‘Blast_DatabaseSearch’ from incompatible pointer type
mgblast.c:2614: warning: passing argument 8 of ‘Blast_DatabaseSearch’ from incompatible pointer type
mgblast.c:2614: error: too few arguments to function ‘Blast_DatabaseSearch’
distcc[16971] ERROR: compile mgblast.c on localhost failed
make: *** [mgblast] Error 1

Please note that the make process continues even if make failed in a subdirectory (so you still get ">>> Source compiled." from emerge). The Makefiles need a fix to prevent this leaky behavior.
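The "leaky" behavior usually comes from a loop over subdirectories that ignores the exit status of each sub-build. A minimal sketch of a loop that stops at the first failure, in plain shell; the function name build_all is invented here, and the actual upstream Makefiles may be structured differently.

```shell
# build_all CMD DIR...: run CMD in each DIR and stop at the first failure.
# Hypothetical helper for illustration; CMD is a single word such as "make".
build_all() {
    cmd=$1
    shift
    for dir in "$@"; do
        # The subshell keeps the cd from leaking into the caller.
        ( cd "$dir" && $cmd ) || {
            echo "build failed in $dir" >&2
            return 1
        }
    done
}
```

Called as `build_all make mgblast clview || exit 1`, it returns a non-zero status as soon as one directory fails, so the caller never reaches the point of reporting success. Inside a Makefile, the equivalent is to chain the sub-make with `&&` or to add `|| exit 1` inside the shell loop; in the ebuild itself, `emake || die` only helps once the Makefile propagates the error.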
I have emailed upstream about the incompatibility with the newer ncbi-tools API. Nevertheless, the ebuild installs all files, although clview and the fox libs are binaries from upstream. I think you could commit this into the science overlay for testing on x86 and on amd64 systems with 32-bit compatibility. It is a set of several tools, and it is handy even now, even for purely 64-bit users.
The upstream bug is at https://compbio.dfci.harvard.edu/jira/browse/TGI-248 . The author is working on the mgblast build issue, and for that reason the bug has already been closed. :( I hope updates will be announced somewhere.
Could you state whether you are going to commit 0.2 into Sunrise or science overlay?
Both waever@ and jlec@ have already offered to let me join the crew and get commit rights to the overlay. Unfortunately, I do not have the time to go through any kind of ebuild writing quiz, etc. Somebody else please commit the ebuild; I will try to help as long as I am CCed.

Yes, it installs the libFOX-1.0.so.0 and libFOX-1.0.so.0.0.3 binaries from upstream, which is ugly for sure. On my system I thus have fox-1.6 from the Gentoo ebuild and the two older files from this tgi-tools package:

# ls -la /usr/lib/libFOX*
lrwxrwxrwx 1 root root       19 Jul 6 21:45 /usr/lib/libFOX-1.0.so.0 -> libFOX-1.0.so.0.0.3
-rwxr-xr-x 1 root root  3078005 Jul 6 21:44 /usr/lib/libFOX-1.0.so.0.0.3
-rw-r--r-- 1 root root 18802060 May 28 00:18 /usr/lib/libFOX-1.6.a
-rw-r--r-- 1 root root     1273 May 28 00:18 /usr/lib/libFOX-1.6.la
lrwxrwxrwx 1 root root       20 May 28 00:19 /usr/lib/libFOX-1.6.so -> libFOX-1.6.so.0.0.37
lrwxrwxrwx 1 root root       20 May 28 00:19 /usr/lib/libFOX-1.6.so.0 -> libFOX-1.6.so.0.0.37
-rwxr-xr-x 1 root root 10678444 May 28 00:18 /usr/lib/libFOX-1.6.so.0.0.37
#

BTW, I have just emailed upstream and asked them to version the files released on their FTP site in future. I also commented on the many sed hacks needed to tweak the Makefiles and on the fact that gclib/ got split into two dirs of the same name. Hopefully they will improve the packaging at some point.
I've removed the package from the Sunrise overlay as stated before.
I spent some days figuring out how to split tgi-tools into the smallest possible pieces. Upstream developer Valentin Antonescu copied all files from the DFCI site to sourceforge. Most of the files were just copied, but he re-released the tgicl bundle as TGICL-2.1, which is actually just 2 perl scripts plus binaries of the many tools. He is going to omit the binaries from the package with version 3.0 (mgblast is first on the victim list; he hopes to slowly phase out tclust, sclust, cdbfasta, cdbyank, zmsort, psx, tgicl_asm.psx and tgicl_cluster.psx). He will start dropping the need for some of them over time. Therefore, after discussion with him, it seemed best to split tgi-tools into the smallest pieces, and in future we will just shorten the list of dependencies.

At the moment, I have all ebuilds working except those for mgblast (cannot compile, comment #16) and pvmsx, which requires a PVM environment not available on Gentoo (it needs pvm3.h for compilation). These two binaries are installed from the TGICL-2.1.tar.gz archive for now.

I moved the cap3 binary out as well, into the sci-biology/cap3-bin package. Only a binary is available; the sources were never released.

Please note that the perl scripts typically do not have the .pl extension, and some have .psx, etc.

I tried to make the package descriptions as detailed as I could. Hope repoman will allow me to commit the longer lines. ;-)

Several packages actually need the gcl (sometimes called gclib) implementation from the cdbfasta.tar.gz bundle; tgicl.tar.gz provides the same subdirectory with a few more files under the same names, but in a somewhat diverged version. Some tools need one implementation while others need the other. It is handled by SRC_URI automatically, of course.

Some details about how the tgicl scripts differ from the gicl scripts:

<quote>
>> Basically tgicl uses virtual parallel machines, while gicl is trying to
>> manage jobs with SGE/Condor and take advantage of their scheduling
>> capabilities. And yes, you still need the utilities related with
>> clustering/assembly.

> I am still a bit lost what tools are in extra in GICL and which are specific
> to TGICL, but maybe next time.
> For sure, we would prefer the SGE/Torque/PBS compatibility over some local
> deamons.

SGE does use local demons as well, condor the same. pvm and mpi are in some ways better, but they lack the scheduling capabilities SGE/condor has. However, one can always run pvm over condor, or mpi over SGE. Dunno which one is better. I plan on adding mpi to tgicl at some point, but I am not sure yet.
</quote>

In brief, I am ready to commit these new packages to the science overlay and drop tgi-tools:

cap3-bin cdbfasta clview nrcl psx sclust tclust tgicl zmsort

Any objections? I could commit mgblast and pvmsx in addition but, as I said, they cannot be compiled without some more effort, and sci-biology/tgicl provides 32-bit binaries.
(In reply to comment #20)
> Several packages actually need gcl (sometimes called gclib) implementation
> from cdbfasta.tar.gz bundle tgicl.tar.gz provides same subdirectory with a
------------------------------^^^ no, meant tgi_cpp_library.tar.gz
> bit more files, with same names, but somehow diverged version. Some tools one
> while other need the other implementation. It is handled by SRC_URI
> automatically, of course.