Bug 23249 - making app-arch/rpm2targz even more useful (faster)
Summary: making app-arch/rpm2targz even more useful (faster)
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages (show other bugs)
Hardware: All Linux
Importance: High normal
Assignee: Alastair Tse (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-06-21 16:16 UTC by Georgi Georgiev
Modified: 2003-06-25 17:10 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Attachments
rpm2targz-bzip2.diff (rpm2targz-bzip2.diff,568 bytes, patch)
2003-06-24 09:49 UTC, Georgi Georgiev

Description Georgi Georgiev 2003-06-21 16:16:08 UTC
This bugreport is a consequence of a comment to bug #23179 regarding the speed
of rpm2targz. The short summary is:

The rpm2targz package only includes a script that converts an rpm package to a
tar.gz file. This behavior may be useful in Slackware, but it is not really
useful in Gentoo, because what Gentoo folks need is files, not tar.gz files. The
rpm2targz script wastes resources on a "gzip -9" command which could take a
really long time and which is not really useful for us.

I suggest the rpm2targz ebuild includes an extra file or two:

# cat /usr/bin/rpm2cpiocat
#!/bin/sh
dd ibs=`rpmoffset < "$1"` skip=1 if="$1" 2> /dev/null | gzip -dc

and also

# cat /usr/bin/rpmextract
#!/bin/sh
dd ibs=`rpmoffset < "$1"` skip=1 if="$1" 2> /dev/null | gzip -dc | cpio \
    --extract --make-directories --unconditional

This is just a basic suggestion. Of course these scripts can be made much
prettier, but they would be good enough for ebuilds even like this, and I only
wanted to make a working suggestion, not give a complete solution.
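
One way the scripts above could be made a little safer is to check that rpmoffset actually printed a usable offset before handing it to dd. The following is only a sketch under that assumption: rpm_payload is a hypothetical helper name, not something shipped by rpm2targz, and it only emits the still-compressed payload.

```shell
#!/bin/sh
# Hypothetical helper (not part of rpm2targz): write the still-compressed
# payload of the RPM file given as $1 to stdout.
rpm_payload() {
    [ -r "$1" ] || { echo "cannot read: $1" >&2; return 1; }

    # rpmoffset prints the byte offset of the payload, or nothing on failure.
    offset=$(rpmoffset < "$1") || offset=

    # Refuse anything that is not a positive decimal number.
    case $offset in
        ''|*[!0-9]*) echo "no payload offset found in $1" >&2; return 1 ;;
    esac
    [ "$offset" -gt 0 ] || { echo "bad payload offset in $1" >&2; return 1; }

    # Skip one input block of $offset bytes, i.e. everything before the payload.
    dd ibs="$offset" skip=1 if="$1" 2>/dev/null
}
```

The output can then be piped into gzip -dc and cpio exactly as in the two scripts above; a non-zero return status tells the caller that no offset was found.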
Comment 1 Tavis Ormandy (RETIRED) gentoo-dev 2003-06-21 16:49:17 UTC
we've been doing this for a while (check out dev-lang/ccc), i know some developers have previously suggested an extension to unpack() that will recognise rpm format and use a method like this.
Comment 2 Georgi Georgiev 2003-06-21 16:59:48 UTC
I really like this unpack() suggestion. What happened? Should someone submit a new bugreport in the portage category? Is there already one? "rpm" finds nothing useful on bugzilla.

Regarding the ccc ebuild -- great! Shouldn't we suggest that the rest of the ebuilds that use rpm2cpio adopt the same idea for the moment? It seems those ebuilds are:

scim-chinese
mkl
sgi-oss-glu
icaclient

Maybe someone could warn the maintainers of those ebuilds to take a look at this bug.
Comment 3 Alastair Tse (RETIRED) gentoo-dev 2003-06-22 04:49:36 UTC
interesting.

regarding scim-chinese, there is a reason why i'm using rpm2cpio: rpmoffset plainly doesn't work on that rpm. if you can get it to work, please let me know. otherwise, even the README for rpm2targz states that it doesn't work for all rpms. the 9.0 version that i just committed even uses rpm2cpio if that is available, because it works much better than rpmoffset.
Comment 4 Garen 2003-06-24 07:11:28 UTC
What about creating a "virtual" (?) that's able to extract stuff from rpm files? Then people who've already got rpm installed can use that, or if not, require something more minimal etc.
 
Comment 5 Alastair Tse (RETIRED) gentoo-dev 2003-06-24 07:51:30 UTC
no that won't work. some rpms don't work with rpm2targz and need rpm explicitly to extract.
Comment 6 Garen 2003-06-24 08:44:44 UTC
The rpmoffset.c program has this line:
 
if (*p == '\037' && p[1] == '\213' && p[2] == '\010') 
 
Over here: http://www.rpm.org/max-rpm/s1-rpm-file-format-rpm-file-format.html it mentions that
a gzipped archive starts with 1F8B (hex), and 1F happens to be 37 octal and 8B is 213 octal.
No idea what the 10 octal is for...
 
Also, this is interesting: 
 
> ./rpmoffset < scim-chinese-0.2.2-1.i586.rpm 
> 
 
It never finds an offset.  however: 
> grep "BZ" scim-chinese-0.2.2-1.i586.rpm 
Binary file scim-chinese-0.2.2-1.i586.rpm matches 
> 
 
My guess would be that it's because the archive doesn't have a gzipped compressed file.  
Some error checking should be added in the rpm eclass to make sure it returns something...  
"BZ" is the magic string identifier for a bzip2 compressed file....  I can't find any info on the rpm 
format that lists it though so maybe it just appeared in there by chance.  Will check. 
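
For reference, the third byte tested in rpmoffset.c (octal 010, decimal 8) is the compression-method field of the gzip header, where 8 means deflate. A bzip2 stream starts with the three bytes "BZh" instead. The check can be reproduced from the shell with dd and od. This is a minimal sketch, and payload_type is a hypothetical helper name, not part of rpm2targz:

```shell
#!/bin/sh
# Hypothetical helper: print "gzip", "bzip2" or "unknown" depending on the
# three bytes found at byte offset $2 of file $1.
payload_type() {
    # od prints the bytes as hex (e.g. " 1f 8b 08"); tr removes the whitespace.
    magic=$(dd if="$1" bs=1 skip="$2" count=3 2>/dev/null |
            od -An -tx1 | tr -d ' \n')
    case $magic in
        1f8b08) echo gzip ;;     # \037 \213, then compression method 8 (deflate)
        425a68) echo bzip2 ;;    # the ASCII characters "BZh"
        *)      echo unknown ;;
    esac
}
```

Called with an RPM file and a candidate offset, it reports which decompressor the data at that position would need, which is also a cheap way to confirm that an offset is plausible.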
 
Comment 7 Georgi Georgiev 2003-06-24 09:48:50 UTC
Wow... looking at the rpm2cpio.c source reveals the following piece of code

    if (!strcmp(payload_compressor, "gzip"))
        t = stpcpy(t, ".gzdio");
    if (!strcmp(payload_compressor, "bzip2"))
        t = stpcpy(t, ".bzdio");
    }

Comment #6 tells us the obvious problem - rpmoffset does not check for bzip2 compression. I used the data from /usr/share/misc/file/magic to see that "BZh" is actually the magic for bzip2 and patching rpmoffset.c as follows:

(see the attachment I'll attach next as well)

--- rpmoffset.c 2003-06-21 21:25:14.000000000 +0900
+++ rpmoffset.c 2003-06-25 01:34:53.000000000 +0900
@@ -16,8 +16,13 @@
 {
         char *buff = malloc(RPMBUFSIZ),*eb,*p;
         for (p = buff, eb = buff + read(0,buff,RPMBUFSIZ); p < eb; p++)
+               {
                 if (*p == '\037' && p[1] == '\213' && p[2] == '\010')
                         printf("%d\n",p - buff),
                         exit(0);
+                               if (*p == 'B' && p[1] == 'Z' && p[2] == 'h' )
+                        printf("%d\n",p - buff),
+                        exit(0);
+               }
         exit(1);
 }


actually makes rpmoffset produce 4412 for "rpmoffset < scim-chinese*"

However, this also means that we'd need to do something like:

  cmd="dd if=$rpm bs=`./rpmoffset < $rpm` skip=1"
  case $cmd 2>/dev/null | file - in
  *gzip*)
  cat=gzcat
  ;;
  *bzip*)
  cat=bzcat
  ;;
  esac

in order to uncompress it nicely.

Mr. Tse, maybe you can use this in your rpm.eclass?
Comment 8 Georgi Georgiev 2003-06-24 09:49:43 UTC
Created attachment 13781 [details, diff]
rpm2targz-bzip2.diff

makes rpmoffset.c recognize bzip2 compressed data
Comment 9 Georgi Georgiev 2003-06-24 09:51:43 UTC
Ooops,

- case $cmd 2>/dev/null | file - in
+ case "`$cmd 2>/dev/null | file -`" in
Comment 10 Garen 2003-06-24 10:06:44 UTC
Well, looks like you beat me to the patch. :) 
 
> dd if=scim-chinese-0.2.2-1.i586.rpm bs=4412 skip=1 | file - 
503+1 records in 
503+1 records out 
/dev/stdin: bzip2 compressed data, block size = 900k 
 
Whaddya know..  Btw, anyone know why this damn thing keeps inserting newlines in my 
comments? 
 
it is 
kind of 
annoying. 
 
Comment 11 Georgi Georgiev 2003-06-24 10:29:25 UTC
[OT] Re: newlines in comments

It seems to be the fault of <textarea wrap="physical">. A nice comment on the subject can be found at http://www.utexas.edu/learn/forms/boxes.html . I am now trying to submit this with Opera, and it seems it will submit it nicely (no newlines on this line I hope). I was really annoyed when submitting bug #22722. If this goes in nicely, we'll know we'd better *not* use Mozilla with bugzillas :)

Also, I forgot to add

$cmd 2>/dev/null | $cat

to the end of the sample script in comment #7
Comment 12 Georgi Georgiev 2003-06-24 12:07:38 UTC
Another approach would be to patch rpmoffset to produce output like:

gzcat:12345
bzcat:12345

rpm2targz would have to be patched to understand this output as well, just in case there are people who use it.

We can then have the much simpler:

offset=`rpmoffset < $rpm`
dd if="$rpm" bs=${offset##*:} skip=1 | ${offset%%:*} | cpio ....

This would however break packages that already use rpmoffset. Maybe the patched rpmoffset can be included with a different name, because it would be much easier to use anyway.
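
If rpmoffset did print such a "decompressor:offset" line, a caller would probably want to validate both halves before running anything, since the first half ends up being executed as a command. A rough sketch of that parsing follows; the output format is only the proposal from this comment, and parse_offset_line is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: split a "decompressor:offset" line and validate it.
# Prints "decompressor offset" on success, returns 1 otherwise.
parse_offset_line() {
    line=$1
    tool=${line%%:*}       # everything before the first colon
    offset=${line##*:}     # everything after the last colon

    # Only accept the two decompressors discussed in this bug.
    case $tool in
        gzcat|bzcat) ;;
        *) echo "unknown decompressor: $tool" >&2; return 1 ;;
    esac

    # The offset must be a non-empty string of decimal digits.
    case $offset in
        ''|*[!0-9]*) echo "bad offset: $offset" >&2; return 1 ;;
    esac

    echo "$tool $offset"
}
```

A line without a colon fails the decompressor check, so output from an unpatched rpmoffset (a bare number) is rejected rather than misread.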
Comment 13 Alastair Tse (RETIRED) gentoo-dev 2003-06-24 17:27:08 UTC
hmm, the changes seem interesting enough to propagate back to slackware, since this is their code.

so the solutions we have are:
1. add bzip2 detection and change the output of rpmoffset to output both compression type and offset
2. add bzip2 detection and keep the same output format, add some further logic in the rpm.eclass to handle that case by running file against it.
3. totally rewrite rpmoffset (it's only a couple of lines of code) to do the decompression as well so it is more "rpm2cpio"-like.

i guess since george has already done (2), that is probably the easiest of them all. assigning to myself, i'll fix it within the next couple of days
Comment 14 Alastair Tse (RETIRED) gentoo-dev 2003-06-25 17:10:44 UTC
ok .. the changes have been committed. thanks for your help guys. scim-chinese can now use the rpmoffset and rpm eclass happily. as for the other packages, sgi-oss-glu is p.masked and will prob be removed in the near future.

mkl and icaclient are both fetch restricted packages. the URI for mkl is incorrect as well, so i can't verify that they work. i'll leave it to the maintainers of those packages to switch it over to the rpm eclass.