Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!

Bug 534206

Summary: sys-apps/portage: NEEDED.ELF.2 does not distinguish all possible ABI types, such as x32 for x86_64, and various MIPS multilib ABIs
Product: Portage Development Reporter: Zac Medico <zmedico>
Component: CoreAssignee: Portage team <dev-portage>
Status: RESOLVED FIXED    
Severity: normal CC: blueness, esigra, mips, multilib+disabled, nikoli
Priority: Normal Keywords: InVCS
Version: unspecified   
Hardware: All   
OS: All   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 240323, 282639, 484436    
Attachments: elf-multilib-id.py: reference implementation for generating a multilib identifier
elf-multilib-id.py: reference implementation for generating a multilib identifier
c utility to obtain the abstract abi name for gentoo

Description Zac Medico gentoo-dev 2015-01-01 14:13:27 UTC
In order for correct preserve-libs handling (among other things such as GLEP 64 and soname deps for binary packages), portage needs a unique identifier for each multilib ABI that may be installed on the same system. Currently, NEEDED.ELF.2 only contains the the ELF header's e_machine member for this purpose. This is not sufficient for distinguishing x32 for x86_64, or for various MIPS multilib ABIs.

For example, /var/db/pkg/sys-libs/zlib-1.2.8-r1/NEEDED.ELF.2 from stage3-x32-20140508.tar.bz2 contains the following:

	X86_64;/libx32/libz.so.1.2.8;libz.so.1;;libc.so.6
	X86_64;/lib64/libz.so.1.2.8;libz.so.1;;libc.so.6

The same file from stage3-mips64el_multilib-20140908.tar.bz2 contains the following:

	MIPS;/lib/libz.so.1.2.8;libz.so.1;;libc.so.6,ld.so.1
	MIPS;/lib32/libz.so.1.2.8;libz.so.1;;libc.so.6,ld.so.1

In order to properly distinguish all of the possible multilib ABIs present on a given system, it appears that we will also need to take into account the ELF header's e_ident[EI_CLASS] byte (ELFCLASS32 distinguishes x32), as well as e_flags member (distinguishes MIPS multilib ABIs).
Comment 1 Anthony Basile gentoo-dev 2015-01-01 18:14:51 UTC
Excellent!  Up until now we've been trusting a convention where LIBDIR both isolates and distinguishes abis.  For example, on amd64, lib = 32-bit, lib64 = 64-bit, libx32 = x32, etc or on mips lib = o32, lib32 = n32, lib64 = n64.  But this is a weak convention and can be broken as easily as changing LIBDIR_abi in a profile.

Do you think we'll need a new NEEDED.ELF.3 because it will change the format of NEEDED.ELF.2?
Comment 2 Zac Medico gentoo-dev 2015-01-01 19:28:33 UTC
(In reply to Anthony Basile from comment #1)
> Excellent!  Up until now we've been trusting a convention where LIBDIR both
> isolates and distinguishes abis.  For example, on amd64, lib = 32-bit, lib64
> = 64-bit, libx32 = x32, etc or on mips lib = o32, lib32 = n32, lib64 = n64. 
> But this is a weak convention and can be broken as easily as changing
> LIBDIR_abi in a profile.

We should also consider pre-built binaries installed in places like /opt, for which we need to be able to distinguish their ABI so that we can correctly resolve their library dependencies.

> Do you think we'll need a new NEEDED.ELF.3 because it will change the format
> of NEEDED.ELF.2?

We can simply add one or two additional fields to NEEDED.ELF.2. The new fields won't be necessary except on multilib systems with ABIs that are otherwise indistinguishable. So, regeneration of NEEDED.ELF.2 data can be skipped if the ABIs enabled by the current profile indicate that the new fields are not needed.

We'll only need one new field if we come up with a naming convention to uniquely identify all of the relevant ABIs. We can use the same naming convention in soname dependencies for binary packages, as discussed here:

  http://thread.gmane.org/gmane.linux.gentoo.devel/94145

It would be nice if there was an existing convention for multilib ABI identifiers that we could re-use. If no such convention already exists, then I guess we could use the convention established by Gentoo's abi_* USE flags, and simply omit the abi_ prefix. So the current list derived from profiles/desc/abi_* would be as follows:

  mips_n32 mips_n64 mips_o32 ppc_32 ppc_64 s390_32 s390_64 x86_32 x86_64 x86_x32
Comment 3 Zac Medico gentoo-dev 2015-01-02 18:02:15 UTC
My current plan is to have portage generate one additional NEEDED.ELF.2 field for the multilib ABI identifier. The identifier will be one of following, derived from profiles/desc/abi_*:

  mips_n32 mips_n64 mips_o32 ppc_32 ppc_64 s390_32 s390_64 x86_32 x86_64 x86_x32

If the ABI doesn't match one of the above, then the new field will be left blank.
Comment 4 Zac Medico gentoo-dev 2015-01-03 01:16:01 UTC
(In reply to Zac Medico from comment #3)
> If the ABI doesn't match one of the above, then the new field will be left
> blank.

Actually, it will be useful to have multilib ABI identifiers for all architectures, regardless of whether or not they currently have multilib support. For example, this will be useful if we introduce multilib for ARM at some point (a Linux patch for ARM64 ILP32 support was submitted last September).

For all of the identifiers listed in profiles/arch.list (excluding OS-specific values), it will suffice to use ${ARCH%64}_{32,64}, where the width is determined by the ELF header's e_ident[EI_CLASS] byte (ELFCLASS32 OR ELFCLASS64). So, the complete list of supported multilib ABI identifiers will be as follows:

	alpha_{32,64}
	arm_{32,64}
	hppa_{32,64}
	ia_{32,64}
	m68k_{32,64}
	mips_{n32,n64,o32}
	ppc_{32,64}
	s390_{32,64}
	sh_{32,64}
	sparc_{32,64}
	x86_{32,64,x64}

Some of these may not even exist (especially architectures that only have 32-bit variants), but that's okay. The important thing is that we have a naming convention that is as consistent as possible. Of course, we could omit the _32 suffix for 32-bit only architectures, but I don't think that would be worth the inconsistency.
Comment 5 Zac Medico gentoo-dev 2015-01-03 01:31:23 UTC
@mips: Are there any mips ABIs other that n32, n64, and o32 that are relevant to Gentoo? If so, then we should add them to the list shown in comment #4.
Comment 6 Zac Medico gentoo-dev 2015-01-03 08:00:24 UTC
Created attachment 393042 [details]
elf-multilib-id.py: reference implementation for generating a multilib identifier

Supported identifiers:

	alpha_{32,64}
	arm_{32,64}
	hppa_{32,64}
	ia_{32,64}
	m68k_{32,64}
	mips_{eabi32,eabi64,n32,n64,o32,o64}
	ppc_{32,64}
	s390_{32,64}
	sh_{32,64}
	sparc_{32,64}
	x86_{32,64,x64}

NOTES:

* The ABIs referenced by some of the above *_32 and *_64 identifiers
  may be imaginary, but they are listed anyway, since the goal is to
  establish a naming convention that is as consistent and uniform as
  possible.

* The Elf header's e_ident[EI_OSABI] byte is completely ignored,
  since OS-independence is one of the goals. The assumption is that,
  for given installation, we are only interested in tracking multilib
  ABIs for a single OS.
Comment 7 Zac Medico gentoo-dev 2015-01-03 08:45:44 UTC
Created attachment 393046 [details]
elf-multilib-id.py: reference implementation for generating a multilib identifier

This version fixes logic for x32.
Comment 8 Anthony Basile gentoo-dev 2015-01-03 14:48:17 UTC
(In reply to Zac Medico from comment #5)
> @mips: Are there any mips ABIs other that n32, n64, and o32 that are
> relevant to Gentoo? If so, then we should add them to the list shown in
> comment #4.

There is o64 but no one really uses it. See

https://gcc.gnu.org/projects/mipso64-abi.html
Comment 9 Anthony Basile gentoo-dev 2015-01-03 14:54:52 UTC
(In reply to Anthony Basile from comment #8)
> (In reply to Zac Medico from comment #5)
> > @mips: Are there any mips ABIs other that n32, n64, and o32 that are
> > relevant to Gentoo? If so, then we should add them to the list shown in
> > comment #4.
> 
> There is o64 but no one really uses it. See
> 
> https://gcc.gnu.org/projects/mipso64-abi.html

Sorry you do have that in the list.  To answer your question more directly, we are not supporting o64 in gentoo.
Comment 10 Anthony Basile gentoo-dev 2015-01-03 15:31:02 UTC
(In reply to Zac Medico from comment #7)
> Created attachment 393046 [details]
> elf-multilib-id.py: reference implementation for generating a multilib
> identifier
> 
> This version fixes logic for x32.

Two points:

1) shouldn't we write this in C for speed?  I'm thinking of the install-xattr issue.  It may not be an issue here because we won't be running it as much, only on ELFs.

We can add the code to scanelf or make it an independant utility in the pax-utils package.

I'll hack something up so we can test speeds.

2) Still thinking of overhead, can we avoid import collections and just return the tuple rather than naming it. This

    return ei_class, ei_data, e_machine, e_flags

rather than this

    return _elf_header(ei_class, ei_data, e_machine, e_flags)

The named tuple is nice, but might add overhead.
Comment 11 Zac Medico gentoo-dev 2015-01-03 17:59:51 UTC
(In reply to Anthony Basile from comment #10)
> (In reply to Zac Medico from comment #7)
> > Created attachment 393046 [details]
> > elf-multilib-id.py: reference implementation for generating a multilib
> > identifier
> > 
> > This version fixes logic for x32.
> 
> Two points:
> 
> 1) shouldn't we write this in C for speed?  I'm thinking of the
> install-xattr issue.  It may not be an issue here because we won't be
> running it as much, only on ELFs.
> 
> We can add the code to scanelf or make it an independant utility in the
> pax-utils package.
> 
> I'll hack something up so we can test speeds.

Sure. Keep in mind that this script is only a reference implementation. It's mainly provided as a reference for the prefix team and for maintainers of alternative package managers. If the prefix team supports multilib preserve-libs for executable formats other than ELF, then they will need to adapt portage's implementation for the other executable formats.

> 2) Still thinking of overhead, can we avoid import collections and just
> return the tuple rather than naming it.

If we amortize the cost of the import by scanning all of the binaries for a given package at the same time, then the import cost in negligible.

For the initial implementation in portage, I was planning to have the main python processes do post-processing of the NEEDED.ELF.2 file generated by scanelf. In this case, the main python process is relatively long-lived, and only needs to import the collections module once per emerge invocation.
Comment 12 Anthony Basile gentoo-dev 2015-01-04 23:29:05 UTC
(In reply to Zac Medico from comment #6)
> Supported identifiers:
> 
> 	alpha_{32,64}
> 	arm_{32,64}
> 	hppa_{32,64}
> 	ia_{32,64}
> 	m68k_{32,64}
> 	mips_{eabi32,eabi64,n32,n64,o32,o64}
> 	ppc_{32,64}
> 	s390_{32,64}
> 	sh_{32,64}
> 	sparc_{32,64}
> 	x86_{32,64,x64}
> 

I'm not sure this is going to work cleanly.  I've translated the python to C but right now I'm caught up in the quagmire of how to encode the information in NEEDED.ELF.2 fields in an non ambiguous way.  Right now I'm leaing towards five fields:

    arch : word size : abi name : endian : float

where

   arch = gentoo equivalent name for e_machine, eg amd64 for EM_X86_64

   word size = 32 or 64 bits

   abi = the name given to the abi either glibc or the kernel or binutils
   (i haven't decided yet, this is the tough one I'm exploring right now)

   endian = little/big

   float = soft, hard or na for where its not an issue
Comment 13 Zac Medico gentoo-dev 2015-01-05 03:01:05 UTC
(In reply to Anthony Basile from comment #12)
> I'm not sure this is going to work cleanly.  I've translated the python to C
> but right now I'm caught up in the quagmire of how to encode the information
> in NEEDED.ELF.2 fields in an non ambiguous way.

That's precisely why I have suggested that we include the abstract identifiers listed in comment #6. If we assume that there's no need for a particular system to mix little endian and big endian binaries, and that there's no need to mix hardfloat and softfloat binaries, then the abstract identifiers abstract away details that are irrelevant to the task of uniquely distinguishing the multilib ABIs present on a particular system. The same abstract identifiers can be applied regardless of whether the system happens to be little or big endian, hardfloat or softfloat. They can even be applied independently of the operating system, for all Unix-like operating systems supported by Gentoo and Gentoo Prefix.

>    arch = gentoo equivalent name for e_machine, eg amd64 for EM_X86_64
> 
>    word size = 32 or 64 bits
> 
>    abi = the name given to the abi either glibc or the kernel or binutils
>    (i haven't decided yet, this is the tough one I'm exploring right now)
> 
>    endian = little/big
> 
>    float = soft, hard or na for where its not an issue

In practice, how many of these are actually needed in order to uniquely distinguish all of the multilib ABIs that will be present on a particular system? I'm not necessarily opposed to including new fields for word size, abi, endian, and float, but we should question how useful they are in practice. If they're not very useful, then maybe we should only include the abstract identifiers. At least it saves us from having to come up with a concrete identifier for every abi in existence!
Comment 14 Anthony Basile gentoo-dev 2015-01-05 14:20:53 UTC
(In reply to Zac Medico from comment #13)
> In practice, how many of these are actually needed in order to uniquely
> distinguish all of the multilib ABIs that will be present on a particular
> system? I'm not necessarily opposed to including new fields for word size,
> abi, endian, and float, but we should question how useful they are in
> practice. If they're not very useful, then maybe we should only include the
> abstract identifiers. At least it saves us from having to come up with a
> concrete identifier for every abi in existence!

Well that's what I'm debating in my head.  My gut reaction was to make a tool that's useful beyond portage and gentoo and conforms to standards, but that's overkill.  There are bi-endian systems, you can have both hard+soft float via library isolation, there are dozens of abis not used in practice and not support in gentoo, etc.  So we need to draw a line, and simply leave ourselves open to adding more later if we have to.  If we ever had to add hard/soft we could do that later by just expanding NEEDED.ELF.2 and make sure we don't code portage so we can't expand it later.

BTW, minor typo.  Its x86_x32 not x86_x64.  Which despite the name x32 *is* a 64-bit abi :P
Comment 15 Zac Medico gentoo-dev 2015-01-05 17:02:45 UTC
(In reply to Anthony Basile from comment #14)
> Well that's what I'm debating in my head.  My gut reaction was to make a
> tool that's useful beyond portage and gentoo and conforms to standards, but
> that's overkill.  There are bi-endian systems, you can have both hard+soft
> float via library isolation, there are dozens of abis not used in practice
> and not support in gentoo, etc.  So we need to draw a line, and simply leave
> ourselves open to adding more later if we have to.  If we ever had to add
> hard/soft we could do that later by just expanding NEEDED.ELF.2 and make
> sure we don't code portage so we can't expand it later.

Yeah, my intention is for the supported set of abstract identifiers to be extensible. So, if somebody wants a bi-endian or hard+soft float system, we can easily add some new abstract identifiers so that all of their multilib ABIs are distinguishable. If they've built their system before patching portage, then they'll have to re-generate their NEEDED.ELF.2 files after patching portage, so that it contains the new identifiers.

> BTW, minor typo.  Its x86_x32 not x86_x64.  Which despite the name x32 *is*
> a 64-bit abi :P

Right, thanks.
Comment 16 Joshua Kinard gentoo-dev 2015-01-06 00:31:06 UTC
(In reply to Anthony Basile from comment #9)
> (In reply to Anthony Basile from comment #8)
> > (In reply to Zac Medico from comment #5)
> > > @mips: Are there any mips ABIs other that n32, n64, and o32 that are
> > > relevant to Gentoo? If so, then we should add them to the list shown in
> > > comment #4.
> > 
> > There is o64 but no one really uses it. See
> > 
> > https://gcc.gnu.org/projects/mipso64-abi.html
> 
> Sorry you do have that in the list.  To answer your question more directly,
> we are not supporting o64 in gentoo.

o64 doesn't really exist anyways.  GCC/binutils might contain some code for emitting insns in this ABI, but I doubt glibc has all of the needed userland bits for using it.

It's primarily used for kernel compiles of specific SGI systems (IP22, IP32), by inserting 64-bit code in a type of 32-bit wrapper to make the ARCS PROM happy when booting the kernel from disk or over the network.  Those systems are technically only capable of booting 32-bit code, because the ARCS PROM is 32-bits, but the hardware is fully 64-bit capable.
Comment 17 Anthony Basile gentoo-dev 2015-01-06 22:33:56 UTC
Created attachment 393378 [details]
c utility to obtain the abstract abi name for gentoo

This is roughly the c version of the python elf-multilib-id.py.  The only differences are: 1) I'm distinguishing between arm's oabi and eabi (which I'm not sure is necessary).  2) I assume the only abi's we're interested in are the one's in gentoo, so my tests are not strict, they just eliminate the "other" possibilities.

I tested these on almost all our tarballs.  I tested natively on amd64, ppc, ppc64, arm and mips (big endian) and mips64 (little endian) and I tested all our tarballs running the utility on amd64 but against a chroot of all the other arches.  read() gets the byte order right when you read a little endian file on a little endian system, but gets it wrong when you "cross", so you have to compensate for that.
Comment 18 Matt Turner gentoo-dev 2015-01-07 04:17:46 UTC
(In reply to Anthony Basile from comment #17)
> Created attachment 393378 [details]
> c utility to obtain the abstract abi name for gentoo
>
> /* mips: We support o32, n32 and n64.  The first is 32-bits and the
>  * latter two are 64-bit ABIs.
>  */    
> case EM_MIPS:
>       if (width == 64)
>               return "mips_n64";
>       else
>               if (e_flags & EF_MIPS_ABI2)
>                       return "mips_n32";
>               else
>                       return "mips_o32";

The comment doesn't match the code. I'm not sure if the code is incorrect, or if the comment is just confusing in the context.
Comment 19 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2015-01-07 04:51:36 UTC
Comment on attachment 393378 [details]
c utility to obtain the abstract abi name for gentoo

>int
>get_endian(uint8_t ei_data)
>{
>	switch (ei_data) {
>		case ELFDATA2LSB:
>			return 0;
>		case ELFDATA2MSB:
>			return 1;

Kinda sucky to store endianness as opaque 0/1. Use enum instead, or name the function like is_little/big_endian().

>		default:
>			errx(1, "Unknown endian.");
>	}
>}

>/* Sublte point about read():  If you run elf-abi from a little endian machine on an	*/
>/* Elf object on a big endian (eg. if you are cross compiling), then you get the wrong	*/
>/* byte order.  If howerver, you read it natively, you get it right.  We'll wrap read()	*/
>/* with our own version which reads one byte at a time and corrects this.		*/

It's not really about read(), is it? Because the comment makes me think there's some magic in read() when running ELF files :P.
Comment 20 Zac Medico gentoo-dev 2015-01-07 08:57:50 UTC
(In reply to Michał Górny from comment #19)
> It's not really about read(), is it? Because the comment makes me think
> there's some magic in read() when running ELF files :P.

Yeah, the magic wouldn't be in read(). It would be in the interpreted value of the uint64_t variable that it stored the data in. If the the value is encoded using the native endian, the uint64_t value is interpreted correctly, and otherwise it's not.
Comment 21 Anthony Basile gentoo-dev 2015-01-07 12:11:09 UTC
(In reply to Zac Medico from comment #20)
> (In reply to Michał Górny from comment #19)
> > It's not really about read(), is it? Because the comment makes me think
> > there's some magic in read() when running ELF files :P.
> 
> Yeah, the magic wouldn't be in read(). It would be in the interpreted value
> of the uint64_t variable that it stored the data in. If the the value is
> encoded using the native endian, the uint64_t value is interpreted
> correctly, and otherwise it's not.

read is doing the same thing in both cases, so maybe the better way to say it is as Zac does.  I'll change that comment.
Comment 22 Anthony Basile gentoo-dev 2015-01-07 12:17:51 UTC
(In reply to Matt Turner from comment #18)
> (In reply to Anthony Basile from comment #17)
> > Created attachment 393378 [details]
> > c utility to obtain the abstract abi name for gentoo
> >
> > /* mips: We support o32, n32 and n64.  The first is 32-bits and the
> >  * latter two are 64-bit ABIs.
> >  */    
> > case EM_MIPS:
> >       if (width == 64)
> >               return "mips_n64";
> >       else
> >               if (e_flags & EF_MIPS_ABI2)
> >                       return "mips_n32";
> >               else
> >                       return "mips_o32";
> 
> The comment doesn't match the code. I'm not sure if the code is incorrect,
> or if the comment is just confusing in the context.

The code is correct.  n32 has ELFCLASS32 but runs on a 64-bit harware.  I'll think about a better way to say this.

I did test the code natively on my lemote yeeloong (mips64el) on /lib{,32,64}/libc.so.6  and on my atheros ar7161 (mips32r2) on /lib/libc.so.6.  I also ran it on amd64 but on chroots for mips64 and mips64el all three abis.  It worked.
Comment 23 Anthony Basile gentoo-dev 2015-01-07 12:19:27 UTC
(In reply to Michał Górny from comment #19)
> Comment on attachment 393378 [details]
> c utility to obtain the abstract abi name for gentoo
> 
> >int
> >get_endian(uint8_t ei_data)
> >{
> >	switch (ei_data) {
> >		case ELFDATA2LSB:
> >			return 0;
> >		case ELFDATA2MSB:
> >			return 1;
> 

Well then just leave it as ELFDATA2{LSB,MSB}.  I don't know, minor point but I can clean that up.
Comment 24 Joshua Kinard gentoo-dev 2015-01-08 01:44:47 UTC
>> int
>> get_endian(uint8_t ei_data)
>> {
>> 	switch (ei_data) {
>> 		case ELFDATA2LSB:
>> 			return 0;
>> 		case ELFDATA2MSB:
>> 			return 1;
>
> Kinda sucky to store endianness as opaque 0/1. Use enum instead, or name the
> function like is_little/big_endian().

Yeah, there are platforms that are bi-endian capable, and even weird stuff like mid-endian.  We haven't run into it before, but if it takes just a few extra lines as an enum to future-proof the code, why not?
Comment 25 Arfrever Frehtes Taifersar Arahesis 2015-01-09 17:34:26 UTC
(In reply to Joshua Kinard from comment #24)
> >> int
> >> get_endian(uint8_t ei_data)
> >> {
> >> 	switch (ei_data) {
> >> 		case ELFDATA2LSB:
> >> 			return 0;
> >> 		case ELFDATA2MSB:
> >> 			return 1;
> >
> > Kinda sucky to store endianness as opaque 0/1. Use enum instead, or name the
> > function like is_little/big_endian().
> 
> Yeah, there are platforms that are bi-endian capable, and even weird stuff
> like mid-endian.  We haven't run into it before, but if it takes just a few
> extra lines as an enum to future-proof the code, why not?

=dev-lang/python-2* has a patch (23_all_arm_OABI.patch), which adds support for ARM mixed-endian architecture (bug #266703)...
Comment 26 Zac Medico gentoo-dev 2015-01-27 03:26:10 UTC
I've posted a patch for review that includes support for the discussed NEEDED.ELF.2 extension:

http://thread.gmane.org/gmane.linux.gentoo.portage.devel/5154
Comment 27 Anthony Basile gentoo-dev 2015-01-29 11:56:59 UTC
(In reply to Zac Medico from comment #26)
> I've posted a patch for review that includes support for the discussed
> NEEDED.ELF.2 extension:
> 
> http://thread.gmane.org/gmane.linux.gentoo.portage.devel/5154

I'll respond on the list.  So far it looks good, but I want to do some perf testing.  Its probably okay to commit to the repo at this point though.
Comment 28 Zac Medico gentoo-dev 2015-02-13 18:59:31 UTC
(In reply to Zac Medico from comment #26)
> I've posted a patch for review that includes support for the discussed
> NEEDED.ELF.2 extension:
> 
> http://thread.gmane.org/gmane.linux.gentoo.portage.devel/5154

This is in the master branch now:

https://github.com/gentoo/portage/commit/f1c1b8a77eebf7713b32e5f9945690f60f4f46de
Comment 29 SpanKY gentoo-dev 2015-06-02 18:02:53 UTC
Comment on attachment 393378 [details]
c utility to obtain the abstract abi name for gentoo

this looks like premature optimization to me.  it's also doesn't support most of the things this code must -- biendian and different bitsizes.  this is why pax-utils is written the way it is.

so unless you have data to show that the current python code is slow (and can't be improved), and that the C version is worth the effort, i'd say punt this.

the portage code really should be using pyelftools instead of re-implementing everything itself from scratch as well.
Comment 30 Anthony Basile gentoo-dev 2015-06-04 11:26:08 UTC
(In reply to SpanKY from comment #29)
> Comment on attachment 393378 [details]
> c utility to obtain the abstract abi name for gentoo
> 
> this looks like premature optimization to me.  

I did measure.  The c code is significantly faster than the python, but since the python is only called on a few files during the emerge process, it did not add up to a significant difference overall.  This makes it different than the install-xattr situation which is called on nearly every file belonging to the package and there the c over the python was critical.  

> it's also doesn't support
> most of the things this code must -- biendian and different bitsizes.  this
> is why pax-utils is written the way it is.

i gave up on it because of the above.  i know there are lots of assumption in it, like that we're not dealing with biendian systems.  (btw what do you mean by bitsize?). i just translated zac's python over to c.  can you take a look at that to make sure we're not missing anything?

> 
> so unless you have data to show that the current python code is slow (and
> can't be improved), and that the C version is worth the effort, i'd say punt
> this.
> 
> the portage code really should be using pyelftools instead of
> re-implementing everything itself from scratch as well.

i'm a fan of pyelftools and it has a lot of eyes looking over it.  but do we want to pull it in as a dependency for portage?
Comment 31 SpanKY gentoo-dev 2015-06-04 15:00:28 UTC
(In reply to Anthony Basile from comment #30)

i'm not talking about microbenchmarks ... those are largely irrelevant.  if this code is used 1% of the time, speeding it up 50% is a waste of time considering it's significantly harder to maintain.

wrt bitsize, i meant the comment in main is grossly misleading/wrong -- compare it to `man 5 elf` and its description of Ehdr.  but you seek to hardcoded offsets based on the size, so it does read the right values.  it's just all a bad idea.  if this is a straight port of the python, that worries me more ;).

wrt pyelftools, eventually we're going to do it.  portage already hard depends on pax-utils and that pulls in pyelftools when USE=python.  we might as well admit the reality and stop wasting time on NIH code :).
Comment 32 Zac Medico gentoo-dev 2015-06-04 17:35:39 UTC
This is fixed since portage-2.2.18.