Bug 157827 - [science overlay] new package: sci-biology/tgi-tools (seqclean, cdbfasta and friends)
Summary: [science overlay] new package: sci-biology/tgi-tools (seqclean, cdbfasta and ...
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages
Hardware: All Linux
Importance: High enhancement
Assignee: Default Assignee for New Packages
URL: http://biocomp.dfci.harvard.edu/tgi/s...
Whiteboard: Science overlay
Keywords: EBUILD, InOverlay
Depends on: weaver
Blocks:
Reported: 2006-12-11 07:09 UTC by Andrey Kislyuk (RETIRED)
Modified: 2011-06-20 06:17 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
sci-biology/tgi-tools-0.1.ebuild (tgi-tools-0.1.ebuild,1.34 KB, text/plain)
2006-12-11 07:11 UTC, Andrey Kislyuk (RETIRED)
tgi-tools-0.2.ebuild (tgi-tools-0.1.ebuild,1.51 KB, text/plain)
2008-12-21 19:14 UTC, Martin Mokrejš
tgi-tools-0.2.ebuild (tgi-tools-0.2.ebuild,4.90 KB, text/plain)
2009-01-06 00:51 UTC, Martin Mokrejš
tgi-tools-0.2.ebuild (tgi-tools-0.2.ebuild,5.71 KB, text/plain)
2010-07-06 19:19 UTC, Martin Mokrejš

Description Andrey Kislyuk (RETIRED) gentoo-dev 2006-12-11 07:09:02 UTC
An ebuild for some genome sequence analysis tools provided at http://biocomp.dfci.harvard.edu/tgi/software/ (Harvard University Computational Biology and Functional Genomics Laboratory, The Gene Index Project)
Comment 1 Andrey Kislyuk (RETIRED) gentoo-dev 2006-12-11 07:11:19 UTC
Created attachment 103797
sci-biology/tgi-tools-0.1.ebuild
Comment 2 Andrey Kislyuk (RETIRED) gentoo-dev 2006-12-20 09:59:09 UTC
The latest version can be found at

http://www.gentoo-sunrise.org/sunrise/browser/reviewed/sci-biology/tgi-tools
Comment 3 Olivier Fisette (RETIRED) gentoo-dev 2007-03-09 02:20:53 UTC
I already watch this bug through the sci-biology alias.
Comment 4 George Shapovalov (RETIRED) gentoo-dev 2007-12-15 23:17:03 UTC
Looking at this one.
The ebuild is looking good. This construct may raise some issues:
for i in seqclean/{seqclean,mdust,trimpoly} cdbfasta/cdbfasta tgicl/{psx,pvmsx,zmsort,tclust,sclust,nrcl,
    SRC_URI="${SRC_URI} ftp://occams.dfci.harvard.edu/pub/bio/tgi/software/${i}.tar.gz"
done

Normally, running functions or sed/awk invocations in global scope (it is evaluated multiple times over the course of ebuild processing) or mangling SRC_URI in a nontrivial way is discouraged. However, this is a basic loop over a fixed set producing a static value in SRC_URI, so it should be OK. Just as a note: the static nature of SRC_URI is also essential, since a dynamic SRC_URI will break the server-side cache (this is the reason why we cannot put USE flags in SRC_URI, for example).
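A minimal, self-contained sketch of the loop discussed above. Only the items visible in the quoted line are used (the original line is cut off after nrcl, so this list is partial), and BASE_URI is a helper variable introduced here for readability, not something taken from the ebuild. Because the loop runs over a fixed literal list, every re-evaluation of global scope yields exactly the same SRC_URI string.

```shell
#!/usr/bin/env bash
# Partial list: only the items visible in the quoted ebuild line are used here.
# BASE_URI is a hypothetical helper variable, not part of the original ebuild.
BASE_URI="ftp://occams.dfci.harvard.edu/pub/bio/tgi/software"
SRC_URI=""

# Brace expansion runs over a fixed literal list, so the result is static.
for i in seqclean/{seqclean,mdust,trimpoly} cdbfasta/cdbfasta \
	tgicl/{psx,pvmsx,zmsort,tclust,sclust,nrcl}; do
	SRC_URI="${SRC_URI} ${BASE_URI}/${i}.tar.gz"
done

# Word-split on purpose to print one URI per line.
printf '%s\n' ${SRC_URI}
```

Brace expansion is a bash feature, which is fine for ebuilds since they are always sourced by bash.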

Checked the build; it went fine, so you can add ~amd64 to KEYWORDS. Looks OK in general, just a few remarks on the install.
1. How typical of a skeletal sci package: only a few binaries and scarce, short README files; good that even those exist :).
2. find -iname readme -exec ls -l \{} \; in ${WORKDIR} shows 11 README files, but only two are installed. However, all the READMEs except these two contain only a short blurb, not much useful information. If not for the copyright information in them, I'd say it is indeed not necessary to include them. However, since they contain what amounts to copyright claims, it might be better to actually install them.
Are the copyright claims identical in all of them and the same as in LICENSE (of which there are also 11, but identical ones)? If yes, then it is safe to omit those READMEs; otherwise it is better to install all of them, even the short ones.
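One way to answer that question mechanically is to count how many distinct contents exist among the LICENSE (or README) files under ${WORKDIR}; a result of 1 means all copies are identical. This is a hedged sketch: count_distinct is a hypothetical helper name, it relies on find's -iname (supported by GNU and BSD find, though not required by POSIX), and it compares cksum CRCs plus file sizes, which is adequate for spotting duplicate copies but is not a cryptographic comparison.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count distinct file contents among files under DIR whose
# names match PATTERN case-insensitively. Prints 1 if all copies are identical.
count_distinct() {
	local dir=$1 pattern=$2
	# cksum prints "CRC SIZE PATH"; keep CRC and SIZE so the path does not matter.
	find "$dir" -type f -iname "$pattern" -exec cksum {} + \
		| awk '{ print $1, $2 }' \
		| sort -u \
		| wc -l \
		| tr -d ' '
}

# Possible usage from the unpacked sources (paths are assumptions):
#   count_distinct "${WORKDIR}" license
#   count_distinct "${WORKDIR}" readme
```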

George
Comment 5 Andrey Kislyuk (RETIRED) gentoo-dev 2008-01-16 14:56:51 UTC
George,

Yes, the copyright claims are identical in all of them, as far as I know.

This package is not very useful anyway; it probably isn't needed in mainline.
Comment 6 Martin Mokrejš 2008-03-01 12:00:36 UTC
I think this is a nice package if it really allows me to cluster EST sequences in parallel.
Comment 7 Martin Mokrejš 2008-12-19 13:01:34 UTC
>>> Emerging (1 of 6) sci-biology/tgi-tools-0.1
 * pvmsx.tar.gz RMD160 SHA1 SHA256 size ;-) ... [ ok ]
 * mdust.tar.gz RMD160 SHA1 SHA256 size ;-) ... [ ok ]
Refetching... File renamed to '/usr/portage/distfiles/tgi_cpp_library.tar.gz._checksum_failure_.dhhvcl'

>>> Downloading 'http://gentoo.mirror.web4u.cz/distfiles/tgi_cpp_library.tar.gz'
--2008-12-19 14:00:24--  http://gentoo.mirror.web4u.cz/distfiles/tgi_cpp_library.tar.gz
Resolving gentoo.mirror.web4u.cz... 81.91.81.13
Connecting to gentoo.mirror.web4u.cz|81.91.81.13|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2008-12-19 14:00:24 ERROR 404: Not Found.

>>> Downloading 'ftp://occams.dfci.harvard.edu/pub/bio/tgi/software/tgicl/tgi_cpp_library.tar.gz'
--2008-12-19 14:00:24--  ftp://occams.dfci.harvard.edu/pub/bio/tgi/software/tgicl/tgi_cpp_library.tar.gz
           => `/usr/portage/distfiles/tgi_cpp_library.tar.gz'
Resolving occams.dfci.harvard.edu... 155.52.47.33
Connecting to occams.dfci.harvard.edu|155.52.47.33|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /pub/bio/tgi/software/tgicl ... done.
==> SIZE tgi_cpp_library.tar.gz ... 59669
==> PASV ... done.    ==> RETR tgi_cpp_library.tar.gz ... done.
Length: 59669 (58K)

100%[======================================>] 59,669      112K/s   in 0.5s

2008-12-19 14:00:27 (112 KB/s) - `/usr/portage/distfiles/tgi_cpp_library.tar.gz' saved [59669]

('Filesize does not match recorded size', 59669L, 58212)
!!! Fetched file: tgi_cpp_library.tar.gz VERIFY FAILED!
!!! Reason: Filesize does not match recorded size
!!! Got:      59669
!!! Expected: 58212
Refetching... File renamed to '/usr/portage/distfiles/tgi_cpp_library.tar.gz._checksum_failure_.dhhvcl'

!!! Couldn't download 'tgi_cpp_library.tar.gz'. Aborting.
 * Fetch failed for 'sci-biology/tgi-tools-0.1', Log file:
 *  '/var/tmp/portage/sci-biology/tgi-tools-0.1/temp/build.log'



yafc occams.dfci.harvard.edu:/pub/bio/tgi/software/tgicl> ls -la
total 6226561
-rw-rw-r--   1 178      1111        22683 Oct 17  2006 README
-rw-rw-r--   1 178      1111        19815 Oct 17  2006 mgblast.tar.gz
-rw-rw-r--   1 178      1111        14634 Nov 18 11:45 nrcl.tar.gz
-rw-rw-r--   1 178      1111         7797 Oct 17  2006 psx.tar.gz
-rw-rw-r--   1 178      1111        10963 Oct 17  2006 pvmsx.tar.gz
-rw-rw-r--   1 178      1111        13457 Nov 18 11:45 sclust.tar.gz
-rw-rw-r--   1 178      1111         9563 Nov 18 11:45 tclust.tar.gz
-rw-rw-r--   1 178      1111        59669 Nov 18 11:45 tgi_cpp_library.tar.gz
-rw-rw-r--   1 178      1111      2650874 Dec 17 11:00 tgicl_linux.tar.gz
-rw-rw-r--   1 178      1111        22189 Dec 17 11:01 tgicl_scripts.tar.gz
-rw-rw-r--   1 178      1111      3388797 Oct 17  2006 tgicl_sunos.tar.gz
-rw-rw-r--   1 178      1111         6120 Nov 18 11:45 zmsort.tar.gz
yafc occams.dfci.harvard.edu:/pub/bio/tgi/software/tgicl>


Please update the checksums in sunrise.
Comment 8 Martin Mokrejš 2008-12-21 19:14:16 UTC
Created attachment 176071
tgi-tools-0.2.ebuild

Fixed ebuild. Just make sure you fix the psx.tar.gz archive first:
1) cd /usr/portage/distfiles; "download" psx.tar.gz
2) mkdir psx; cd psx
3) gzip -dc ../psx.tar.gz | tar xvf -
4) cd ..; tar cvf - psx | gzip -c > psx.tar.gz
5) cd /usr/local/portage/layman/sunrise/sci-biology/tgi-tools/
6) rm Manifest
7) ebuild tgi-tools-0.1.ebuild digest
8) emerge tgi-tools
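Steps 2 to 4 can also be scripted. A minimal sketch, assuming (as the steps suggest) that the goal is to make the archive unpack into a top-level psx/ directory; repack_with_topdir is a hypothetical function name. It works in a temporary directory and only replaces the original archive once repacking has succeeded. The repacked tarball will no longer match upstream's checksums, which is why the Manifest has to be regenerated afterwards (steps 5 to 7).

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrap the contents of NAME.tar.gz under a top-level NAME/ directory.
repack_with_topdir() {
	local archive=$1 name tmp
	name=$(basename "$archive" .tar.gz)
	tmp=$(mktemp -d) || return 1
	mkdir "$tmp/$name" || { rm -rf "$tmp"; return 1; }
	# Unpack the original (flat) archive into NAME/ ...
	if ! gzip -dc "$archive" | tar -xf - -C "$tmp/$name"; then
		rm -rf "$tmp"; return 1
	fi
	# ... then pack NAME/ back up, writing to a temporary file first.
	if ! tar -cf - -C "$tmp" "$name" | gzip -c > "$archive.new"; then
		rm -f "$archive.new"; rm -rf "$tmp"; return 1
	fi
	mv "$archive.new" "$archive"
	rm -rf "$tmp"
}
```

Usage would be along the lines of: cd /usr/portage/distfiles && repack_with_topdir psx.tar.gz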
Comment 9 Martin Mokrejš 2009-01-06 00:51:14 UTC
Created attachment 177530
tgi-tools-0.2.ebuild

Now everything except mgblast and clview can be compiled. The ebuild installs a few more important binaries that were initially overlooked in the mess, and also a few more docs. Tested on ~x86. Most parts are compiled from scratch, though.

Somebody just needs to teach me how the output of `fox-config --cflags' can be used in the .ebuild. ;-) Please fix it and commit.
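A hedged sketch of one way to feed `fox-config --cflags' into the build: capture its output and pass it to make as a CXXFLAGS=... command-line assignment, which overrides a plain CXXFLAGS assignment inside the Makefile. Several things here are assumptions: that clview's Makefile actually reads CXXFLAGS, that it lives in a clview/ subdirectory, and the names fox_cxxflags_arg and FOX_CONFIG (the latter only exists so the helper can be exercised without FOX installed).

```shell
#!/usr/bin/env bash
# FOX_CONFIG is a hypothetical override hook; it defaults to the real fox-config.
FOX_CONFIG=${FOX_CONFIG:-fox-config}

# Print a single make argument of the form CXXFLAGS=..., appending the flags
# reported by `fox-config --cflags` to the user's own CXXFLAGS.
fox_cxxflags_arg() {
	local foxflags
	foxflags=$("$FOX_CONFIG" --cflags) || return 1
	printf 'CXXFLAGS=%s\n' "${CXXFLAGS:+${CXXFLAGS} }${foxflags}"
}

# Possible use inside src_compile() (directory and variable name are assumptions):
#   local fox_arg
#   fox_arg=$(fox_cxxflags_arg) || die "fox-config failed"
#   emake -C clview "${fox_arg}" || die "emake clview failed"
```

Keeping the user's CXXFLAGS first and appending the FOX include flags preserves the optimisation settings from make.conf while still letting the compiler find the FOX headers.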
Comment 10 Martin Mokrejš 2009-01-22 18:22:55 UTC
A modification of TGICL to use MPI instead of PVM seems to be available: http://www.generationcp.org/sccv10/sccv10_upload/HPC_brochure.pdf
Comment 11 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2010-07-06 13:32:45 UTC
The Sunrise ebuild for tgi-tools relies on non-versioned distfiles which have changed since the Manifest was last updated. Additionally, I see that it is stored in science overlay too. Thus, I am masking it for removal in 30 days.
Comment 12 Martin Mokrejš 2010-07-06 19:14:32 UTC
(In reply to comment #11)
> The Sunrise ebuild for tgi-tools relies on non-versioned distfiles which have
> changed since the Manifest was last updated. Additionally, I see that it is
> stored in science overlay too. Thus, I am masking it for removal in 30 days.
> 

Thanks, yes, drop the 0.1 ebuild from sunrise.

The 0.1 version in the science overlay does not compile either, because the files originally placed under the gclib/ subdirectory were split into two gclib/ subdirectories. We would have to hack two Makefiles in the ebuild so that each points to the correct location for its type of files. Therefore, I propose to drop the 0.1 ebuild from the science overlay as well and focus on the 0.2 ebuild.

# find /var/tmp/portage/sci-biology/tgi-tools-0.1/work -name gclib | xargs ls -la
/var/tmp/portage/sci-biology/tgi-tools-0.1/work/cdbfasta/gclib:
total 388
drwxr-xr-x 2 root root  4096 Jul  6 19:58 .
drwxr-xr-x 3 root root  4096 Jul  6 19:58 ..
-rw-r--r-- 1 root root  7463 Dec  3  2009 GArgs.cpp
-rw-r--r-- 1 root root  2537 Dec  3  2009 GArgs.h
-rw-r--r-- 1 root root 19008 Jul  6 19:58 GArgs.o
-rw-r--r-- 1 root root 14900 Dec  3  2009 GBase.cpp
-rw-r--r-- 1 root root 10104 Dec  3  2009 GBase.h
-rw-r--r-- 1 root root 35736 Jul  6 19:58 GBase.o
-rw-r--r-- 1 root root 16813 Dec  3  2009 GHash.hh
-rw-r--r-- 1 root root 32387 Dec  3  2009 GList.hh
-rw-r--r-- 1 root root 33515 Dec  3  2009 GStr.cpp
-rw-r--r-- 1 root root  8517 Dec  3  2009 GStr.h
-rw-r--r-- 1 root root 79356 Jul  6 19:58 GStr.o
-rw-r--r-- 1 root root 23139 Dec  3  2009 gcdb.cpp
-rw-r--r-- 1 root root 12290 Dec  3  2009 gcdb.h
-rw-r--r-- 1 root root 56084 Jul  6 19:58 gcdb.o

/var/tmp/portage/sci-biology/tgi-tools-0.1/work/gclib:
total 140
drwxr-xr-x  2 root root  4096 Jul  6 19:58 .
drwx------ 18 root root  4096 Jul  6 19:58 ..
-rw-r--r--  1 root root 11632 Sep 17  2008 AceParser.cpp
-rw-r--r--  1 root root   906 Sep 14  2008 AceParser.h
-rw-r--r--  1 root root 11012 Jan 22  2009 GBase.cpp
-rw-r--r--  1 root root  9200 Dec 16  2008 GBase.h
-rw-r--r--  1 root root 28916 Jul  6 19:58 GBase.o
-rw-r--r--  1 root root 16813 Jul 29  2008 GHash.hh
-rw-r--r--  1 root root 16516 Sep 10  2008 GList.hh
-rw-r--r--  1 root root 11221 Jan 22  2009 LayoutParser.cpp
-rw-r--r--  1 root root  6246 Sep 14  2008 LayoutParser.h
# 

Please help me with the 0.2 ebuild instead.
Comment 13 Martin Mokrejš 2010-07-06 19:19:59 UTC
Created attachment 237789
tgi-tools-0.2.ebuild

Updated the ebuild to reflect the current variable names in the many Makefiles and the split gclib/ contents. It does not compile against sci-biology/ncbi-tools-20090809-r2; we may need a version as old as 20060507. Can somebody test older versions?
Comment 14 Martin Mokrejš 2010-07-06 19:22:08 UTC
Here is the ncbi-tools compile error:

g++ -L/usr/lib -o clview appmain.o mdichild.o clrutils.o ../gclib/LayoutParser.o ../gclib/AceParser.o mainwin.o FXClView.o ../gclib/GBase.o -lFOX-1.6 -lXext -lX11 -lXft -lXrender -lfontconfig -lfreetype -lX11 -lXcursor -lXrandr -ldl -lpthread -lrt -ljpeg -lpng -ltiff -lz -lbz2 -lm -lcups -lnsl -lGLU -lGL
gcc -o mgblast -O2 -I/usr/include/ncbi  -L/usr/lib mgblast.c  -lblastapi \
                -lblast -lncbitool -lblastcompadj -lncbiobj -lncbi -lm \

mgblast.c: In function ‘GetLambdaFast’:
mgblast.c:2317: error: too few arguments to function ‘Blast_ScoreBlkMatrixInit’
mgblast.c: In function ‘BLAST_FillOptions’:
mgblast.c:2391: error: too many arguments to function ‘BLAST_FillInitialWordOptions’
mgblast.c:2402: error: ‘BlastInitialWordOptions’ has no member named ‘ungapped_extension’
mgblast.c:2406: error: ‘BlastInitialWordOptions’ has no member named ‘ungapped_extension’
mgblast.c:2420: error: ‘BlastInitialWordOptions’ has no member named ‘ungapped_extension’
mgblast.c:2427: error: ‘BlastInitialWordOptions’ has no member named ‘ungapped_extension’
mgblast.c: In function ‘Main_new’:
mgblast.c:2587: warning: passing argument 2 of ‘BlastTabularFormatDataNew’ from incompatible pointer type
mgblast.c:2587: error: incompatible type for argument 3 of ‘BlastTabularFormatDataNew’
mgblast.c:2587: error: too few arguments to function ‘BlastTabularFormatDataNew’
mgblast.c:2614: warning: passing argument 2 of ‘Blast_DatabaseSearch’ from incompatible pointer type
mgblast.c:2614: warning: passing argument 3 of ‘Blast_DatabaseSearch’ from incompatible pointer type
mgblast.c:2614: warning: passing argument 4 of ‘Blast_DatabaseSearch’ from incompatible pointer type
mgblast.c:2614: warning: passing argument 5 of ‘Blast_DatabaseSearch’ from incompatible pointer type
mgblast.c:2614: warning: passing argument 6 of ‘Blast_DatabaseSearch’ from incompatible pointer type
mgblast.c:2614: warning: passing argument 7 of ‘Blast_DatabaseSearch’ from incompatible pointer type
mgblast.c:2614: warning: passing argument 8 of ‘Blast_DatabaseSearch’ from incompatible pointer type
mgblast.c:2614: error: too few arguments to function ‘Blast_DatabaseSearch’
distcc[16971] ERROR: compile mgblast.c on localhost failed
make: *** [mgblast] Error 1


Please note that the make process continues even if make fails in a subdirectory (so you still get ">>> Source compiled." from emerge). The Makefiles need a fix to prevent this leaky behavior.
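A minimal sketch of the fail-fast behaviour that is missing: build each subdirectory in turn and stop at the first failure, so the overall result reflects the broken subdirectory instead of hiding it. build_subdirs and the MAKE_CMD hook are hypothetical names; in an ebuild the same idea is simply emake -C "${dir}" || die inside the loop.

```shell
#!/usr/bin/env bash
# Build each listed subdirectory and stop at the first failure.
# MAKE_CMD is a hypothetical hook (defaults to make) so the loop can be tested.
build_subdirs() {
	local dir
	for dir in "$@"; do
		if ! ${MAKE_CMD:-make} -C "$dir"; then
			printf 'build failed in %s\n' "$dir" >&2
			return 1
		fi
	done
}
```

Inside an upstream top-level Makefile, the usual equivalent is to end the recursive loop with || exit 1, e.g. for d in $(SUBDIRS); do $(MAKE) -C $$d || exit 1; done (SUBDIRS being whatever variable holds the directory list), so the shell running the recipe returns a non-zero status and make stops.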
Comment 15 Martin Mokrejš 2010-07-06 19:42:18 UTC
I have emailed upstream about the incompatibility with the newer ncbi-tools API. Nevertheless, the ebuild installs all files, although clview and the FOX libs are prebuilt binaries from upstream. I think you could commit this to the science overlay for testing on x86 and on 32-bit-compatible amd64 systems. It is a set of several tools and is handy even for pure 64-bit users right now.
Comment 16 Martin Mokrejš 2010-07-07 21:07:33 UTC
The upstream bug is at https://compbio.dfci.harvard.edu/jira/browse/TGI-248 . The author is working on the mgblast build issue, and for that reason the bug has already been closed. :( I hope some updates will be announced somewhere.
Comment 17 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2010-07-07 21:12:27 UTC
Could you state whether you are going to commit 0.2 into Sunrise or the science overlay?
Comment 18 Martin Mokrejš 2010-07-07 21:26:55 UTC
Both weaver@ and jlec@ have already offered to let me join the crew and get commit rights to the overlay. Unfortunately, I do not have the time to go through any kind of ebuild-writing quiz, etc. Could somebody else please commit the ebuild? I will try to help as long as I am CCed. Yes, it installs the libFOX-1.0.so.0 and libFOX-1.0.so.0.0.3 binaries from upstream, which is certainly ugly. Thus, on my system I have fox-1.6 from the Gentoo ebuild plus the two older FOX 1.0 files from this tgi-tools package:

# ls -la /usr/lib/libFOX*
lrwxrwxrwx 1 root root       19 Jul  6 21:45 /usr/lib/libFOX-1.0.so.0 -> libFOX-1.0.so.0.0.3
-rwxr-xr-x 1 root root  3078005 Jul  6 21:44 /usr/lib/libFOX-1.0.so.0.0.3
-rw-r--r-- 1 root root 18802060 May 28 00:18 /usr/lib/libFOX-1.6.a
-rw-r--r-- 1 root root     1273 May 28 00:18 /usr/lib/libFOX-1.6.la
lrwxrwxrwx 1 root root       20 May 28 00:19 /usr/lib/libFOX-1.6.so -> libFOX-1.6.so.0.0.37
lrwxrwxrwx 1 root root       20 May 28 00:19 /usr/lib/libFOX-1.6.so.0 -> libFOX-1.6.so.0.0.37
-rwxr-xr-x 1 root root 10678444 May 28 00:18 /usr/lib/libFOX-1.6.so.0.0.37
#

BTW, I have just emailed upstream asking for versioned file names for future releases on their FTP site, and also commented on the need for the many sed hacks to tweak the Makefiles and on gclib/ having been split into two directories of the same name. Hopefully they will improve the packaging at some point.
Comment 19 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2010-08-08 18:38:20 UTC
I've removed the package from the Sunrise overlay as stated before.
Comment 20 Martin Mokrejš 2010-12-04 00:42:47 UTC
I spent some days figuring out how to split tgi-tools into the smallest pieces. Upstream developer Valentin Antonescu copied all files from the DFCI site to SourceForge. Most of the files were simply copied, but he re-released the tgicl bundle as TGICL-2.1, which is actually just two Perl scripts plus binaries of the many tools. He is going to omit the binaries from the package starting with version 3.0 (mgblast is first on the victim list; he hopes to slowly phase out tclust, sclust, cdbfasta, cdbyank, zmsort, psx, tgicl_asm.psx and tgicl_cluster.psx). He will start dropping the need for some of them over time. Therefore, after discussing it with him, it seemed best to split tgi-tools into the smallest pieces; in the future we will just shorten the list of dependencies.

At the moment, I have all the ebuilds working except those for mgblast (cannot be compiled, see comment #16) and pvmsx, which requires a PVM environment not available on Gentoo (it needs pvm3.h for compilation). These two binaries are now installed from the TGICL-2.1.tar.gz archive.

I also moved the cap3 binary out into a separate sci-biology/cap3-bin package. Only a binary is available; the sources were never released.

Please note that the Perl scripts typically do not have the .pl extension; some have .psx, etc. I tried to make the package descriptions as detailed as I could. I hope repoman will let me commit the longer lines. ;-)

Several packages actually need the gcl (sometimes called gclib) implementation from the cdbfasta.tar.gz bundle. tgicl.tar.gz provides the same subdirectory with a few more files, with the same names but a somewhat diverged version. Some tools need one implementation, while others need the other. This is handled by SRC_URI automatically, of course.
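The listing in comment 12 illustrates the split: both gclib/ directories contain GBase.cpp and GList.hh but with different sizes, while GHash.hh has the same size in both. To see which same-named files have actually diverged, a small byte-for-byte comparison is enough. A sketch only; diverged_files is a hypothetical helper name, and the example paths in the comment are assumptions based on that listing.

```shell
#!/usr/bin/env bash
# List files present under the same name in both directories whose contents
# differ (byte-for-byte comparison with cmp). Files found in only one side are skipped.
diverged_files() {
	local a=$1 b=$2 path name
	for path in "$a"/*; do
		[ -f "$path" ] || continue
		name=${path##*/}
		[ -f "$b/$name" ] || continue
		cmp -s "$path" "$b/$name" || printf '%s\n' "$name"
	done
}

# Hypothetical usage against an unpacked work directory:
#   diverged_files work/cdbfasta/gclib work/gclib
```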


Some details about the difference between the tgicl scripts and the gicl scripts:

<quote>
>> Basically tgicl uses virtual parallel machines, while gicl is trying to 
>> manage jobs with SGE/Condor and take advantage of their scheduling  
>> capabilities. And yes, you still need the utilities related with 
>> clustering/assembly. 

> I am still a bit lost what tools are in extra in GICL and which are specific 
> to TGICL, but maybe next time.
> For sure, we would prefer the SGE/Torque/PBS compatibility over some local 
> daemons.

SGE does use local daemons as well, and so does Condor. PVM and MPI are in some ways better, but they lack the scheduling capabilities SGE/Condor have. However, one can always run PVM over Condor, or MPI over SGE. Dunno which one is better. I plan on adding MPI to tgicl at some point, but I am not sure yet.
</quote>



In brief, I am ready to commit these new packages to the science overlay and drop tgi-tools: cap3-bin cdbfasta clview nrcl psx sclust tclust tgicl zmsort
Any objections?

I could commit mgblast and pvmsx as well, but as I said, they cannot be compiled without some more effort, and sci-biology/tgicl already provides 32-bit binaries of them.
Comment 21 Martin Mokrejš 2010-12-04 00:57:37 UTC
(In reply to comment #20)

> Several packages actually need gcl (sometimes called gclib) implementation 
> from cdbfasta.tar.gz bundle tgicl.tar.gz provides same subdirectory with a
------------------------------^^^ no, meant tgi_cpp_library.tar.gz
> bit more files, with same names, but somehow diverged version. Some tools one
> while other need the other implementation. It is handled by SRC_URI 
> automatically, of course.