Bug 371991 - The sse4 USE flag is broken in media-libs/freeverb3
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Sebastian Pipping
Reported: 2011-06-17 06:13 UTC by Mike Nerone
Modified: 2015-02-01 23:30 UTC
Description Mike Nerone 2011-06-17 06:13:04 UTC
The situation I just noticed is pretty hairy, but I'll try to summarize it. The problem stems from confusion about what "SSE4" means:

* The first iteration of "SSE4", called SSE4.1, was released by Intel as part of the "Core 2" CPU series.
* Intel added a few more instructions, called SSE4.2, in the "Core i3/i5/i7" series.
* In proper usage (i.e. Intel's definition), "SSE4" means *both* SSE4.1 and SSE4.2.
* AMD introduced "SSE4a", which is none of the above (a few instructions overlap, but they are not compatible sets).

Important observations:

* There are many processors (the 45 nm "Penryn" generation of the Core 2 line) that support SSE4.1 but not SSE4.2.
* Intel doesn't support SSE4a, and AMD doesn't support SSE4 (i.e. SSE4.1 & SSE4.2); therefore, no processor exists that supports both.

This is explained in more detail on Wikipedia [1].

Now on to the sse4 USE flag. It is used in (thankfully) only two packages right now: media-libs/freeverb3 and media-libs/spandsp (unstable on that one). Take a look at how they use it:

The freeverb3 ebuild turns on "--enable-sse4". If you run "./configure --help", you'll see it documented that their SSE4 support is explicitly SSE4.1. The upstream devs are adding to the confusion by using SSE4 as a synonym for SSE4.1, which it is not. Still, SSE4 is a superset of SSE4.1, so at least it wouldn't cause breakage. Core 2 owners are left in a confusing spot, though: since they do have SSE4.1, this would seem to indicate that they should turn on USE="sse4" even though they don't have full SSE4.

If freeverb3 were alone, it would just be confusing (because it's incorrect), but not particularly harmful. However, spandsp-0.0.6_pre12 is another story. With USE="sse4", the ebuild turns on "--enable-sse4a --enable-sse4-1 --enable-sse4-2". Not only are these assumptions unsafe, but indeed, no processor even exists that can satisfy them, as I explained.

------

Looking at the Changelog for spandsp, I just noticed that an spandsp-0.0.6_pre12-r1 was made after that, in which support for SSE4 and SSE5 was dropped. The log message says:

    Drop problematic sse4 & sse5 USE-flags, in GCC 4.5 no such options exist.
    Closes bug #356299 by Agostino...

A. I don't think SSE4 is "problematic" once one is aware of what "SSE4" means. Most likely (but I haven't confirmed), the bug is caused by the impossible ./configure parameters I described above.

B. gcc-4.5 most certainly still has SSE4 options [2]. So does gcc-4.6 [3].

-------

So here's my suggestion: Get rid of the SSE4 USE flag. It's ambiguous. Instead, create separate sse4_1, sse4_2, and sse4a USE flags. They really are three separate things. They also have the benefit of matching the feature flags displayed in /proc/cpuinfo, so it will be quite clear to users whether they should turn them on or not.
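
To illustrate the point about /proc/cpuinfo, here is a minimal sketch of how a user might check which of the three proposed flags apply to their CPU. The helper names has_flag and sse4_report are hypothetical and not part of any Gentoo tool; the sketch only assumes that the kernel lists sse4_1, sse4_2, and sse4a as separate words on the "flags" line.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Portage or any Gentoo tool.

# has_flag NAME FLAGS: succeed if NAME appears as a whole word in FLAGS.
has_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# sse4_report FLAGS: print one yes/no line per SSE4 variant.
# FLAGS is the space-separated content of a "flags" line from /proc/cpuinfo.
sse4_report() {
    for f in sse4_1 sse4_2 sse4a; do
        if has_flag "$f" "$1"; then
            echo "$f: yes"
        else
            echo "$f: no"
        fi
    done
}

# Usage on a live Linux system (reads the first "flags" line only):
# sse4_report "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```

A CPU with SSE4.1 but not SSE4.2 would report "yes" only on the first line, which maps directly onto the proposed USE flag names.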

Understanding that the freeverb3 devs are misusing "--enable-sse4", its ebuild should respect the *correct* sse4_1 USE flag.

spandsp should bring back SSE4 support, and honor each of the three USE flags independently, since the upstream source is capable (arguably, USE="sse4_2 -sse4_1" could be an error condition since SSE4.2 is a superset, but there's probably no compelling reason to enforce that).

[1] http://en.wikipedia.org/wiki/SSE4
[2] http://gcc.gnu.org/onlinedocs/gcc-4.5.3/gcc.pdf
[3] http://gcc.gnu.org/onlinedocs/gcc-4.6.0/gcc.pdf
Comment 1 Mike Nerone 2011-06-17 06:19:31 UTC
Whoops - a clarification: I said at the end that SSE4.2 is a superset of SSE4.1. That's not technically correct - SSE4.1 is the original set of instructions, while SSE4.2 is the set of instructions that were added later, so they are really disjoint sets. However, all CPUs that support SSE4.2 also support SSE4.1. I.e. SSE4.2 implies SSE4.1.
Comment 2 Sebastian Pipping gentoo-dev 2011-06-17 16:43:43 UTC
I'm not objecting to changes but do distinct flags really offer more?  If use flag sse4 turns on any variant of these instructions maybe that's enough?  Can you think of a case where several of these flags would be needed in parallel?  That's the core question to me as of now.
Comment 3 Mike Nerone 2011-06-17 19:05:26 UTC
(In reply to comment #2)
> I'm not objecting to changes but do distinct flags really offer more?  If use
> flag sse4 turns on any variant of these instructions maybe that's enough?  Can
> you think of a case where several of these flags would be needed in parallel? 
> That's the core question to me as of now.

With just the one USE flag, how would you handle a case like spandsp that supports all of the cases:

1. No SSE4 at all.
2. SSE4.1 only.
3. SSE4.1 & SSE4.2
4. SSE4a only.

I don't see how to distinguish those cases without distinct flags. Having the one "sse4" use flag turn on all the variants is exactly what the spandsp ebuild did before, which creates the impossible situation I described (no CPU exists that supports all three variants).
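
As a rough sketch of what distinct flags buy us, the function below maps three independent flags onto the three spandsp configure switches named in the description. It is plain shell rather than real ebuild code: sse4_configure_args is a hypothetical name, and its argument merely stands in for the user's USE setting.

```shell
#!/bin/sh
# Hypothetical sketch; a real ebuild would use Portage helpers instead.

# sse4_configure_args FLAGS: print one --enable/--disable switch per variant.
# FLAGS is a space-separated list of enabled flags (stand-in for USE).
sse4_configure_args() {
    args=""
    # Each pair is <flag name>:<configure option suffix>.
    for pair in sse4_1:sse4-1 sse4_2:sse4-2 sse4a:sse4a; do
        flag=${pair%%:*}
        opt=${pair##*:}
        case " $1 " in
            *" $flag "*) args="$args --enable-$opt" ;;
            *)           args="$args --disable-$opt" ;;
        esac
    done
    # Strip the leading space left by the first append.
    echo "${args# }"
}

# The four cases listed above:
sse4_configure_args ""               # 1. no SSE4 at all
sse4_configure_args "sse4_1"         # 2. SSE4.1 only
sse4_configure_args "sse4_1 sse4_2"  # 3. SSE4.1 & SSE4.2
sse4_configure_args "sse4a"          # 4. SSE4a only
```

Each case yields a different combination of switches, which a single sse4 flag cannot express.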
Comment 4 Sebastian Pipping gentoo-dev 2011-06-17 19:35:42 UTC
If that's the case, okay.

So for freeverb3 you want me to rename sse4 to sse4_1 ?
Comment 5 Chí-Thanh Christopher Nguyễn gentoo-dev 2011-06-17 19:54:34 UTC
Do the affected applications attempt to use sse4{,.1,.2,a} even on CPUs that don't support the instruction set, or do they have runtime detection à la mplayer?
Comment 6 Mike Nerone 2011-06-17 19:58:22 UTC
(In reply to comment #4)
> So for freeverb3 you want me to rename sse4 to sse4_1 ?

That's my suggestion, yes, although I'd like someone who knows more about this low-level stuff to chime in (like flameeyes).

(In reply to comment #5)
> Do the affected applications attempt to use sse4{,.1,.2,a} even on CPUs that
> don't support the instruction set, or do they have runtime detection à la
> mplayer?

I haven't checked, but it seems irrelevant to me, because we can't assume that all future apps would have runtime detection, anyway.
Comment 7 Matt Turner gentoo-dev 2013-02-06 19:05:46 UTC
The USE flags just turn on configure switches that add -msse* to your CFLAGS. This is bogus, we shouldn't have USE flags for this. The users' CFLAGS will turn on SSE if they have it.
Comment 8 Matt Turner gentoo-dev 2013-02-06 23:23:32 UTC
Dropped the MMX/SSE USE flags from spandsp.
Comment 9 Sebastian Pipping gentoo-dev 2013-02-07 20:30:13 UTC
(In reply to comment #7)
> The USE flags just turn on configure switches that add -msse* to your
> CFLAGS. This is bogus, we shouldn't have USE flags for this. The users'
> CFLAGS will turn on SSE if they have it.

The sse4 use flag does two things on freeverb3:

 1. Extend CFLAGS by "-mfpmath=sse -msse -msse2 -msse3 -msse4.1"

 2. Enable hand-written SSE4 code  (grep for ENABLE_SSE4)

So the use flag would still have a function, even if I patched the extension of CFLAGS away.  It's in the interest of upstream to keep extending CFLAGS this way.  I am unsure if patching that away is worth the trouble before we get actual bug reports with compile errors on those.  What do you think?
Comment 10 Sebastian Pipping gentoo-dev 2013-02-07 20:31:21 UTC
(In reply to comment #9)
>  2. Enable hand-written SSE4 code  (grep for ENABLE_SSE4)

- code
+ assembly code
Comment 11 Matt Turner gentoo-dev 2013-02-07 21:29:52 UTC
(In reply to comment #9)
> (In reply to comment #7)
> > The USE flags just turn on configure switches that add -msse* to your
> > CFLAGS. This is bogus, we shouldn't have USE flags for this. The users'
> > CFLAGS will turn on SSE if they have it.
> 
> The sse4 use flag does two things on freeverb3:
> 
>  1. Extend CFLAGS by "-mfpmath=sse -msse -msse2 -msse3 -msse4.1"
> 
>  2. Enable hand-written SSE4 code  (grep for ENABLE_SSE4)
> 
> So the use flag would still have a function, even if I patched the
> extension of CFLAGS away.  It's in the interest of upstream to keep
> extending CFLAGS this way.  I am unsure if patching that away is worth the
> trouble before we get actual bug reports with compile errors on those.  What
> do you think?

Yeah, if it turns on assembly code, we should keep the flag I think. Modifying the CFLAGS in that case is probably okay. I would ask that you name it sse41 (instead of sse4_1 like libvpx does). I'll try to get vpx to rename theirs -- some consistency across the tree would be nice.
Comment 12 Andrew Savchenko gentoo-dev 2013-03-06 18:33:34 UTC
spandsp-0.0.6_pre21 has separate configure flags for all SSE4 flavors: sse41, sse42, sse4a. These flags not only add -msse* to CFLAGS, but also control GCC intrinsics, thus they can't be replaced by CFLAGS alone.

I filed a separate bug 460570 for this.
Comment 13 Sebastian Pipping gentoo-dev 2015-02-01 23:30:38 UTC
With CPU_FLAGS_X86 and line

  $(use_enable cpu_flags_x86_sse4_1 sse4)

in tree now, I believe this is fixed.  Please re-open if needed.
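
For readers unfamiliar with the helper in that line: use_enable prints --enable-<name> when the given USE flag is set and --disable-<name> otherwise, and cpu_flags_x86_sse4_1 is the expanded form of sse4_1 listed in CPU_FLAGS_X86. The following is a rough standalone emulation for illustration only, not Portage's implementation; emul_use_enable is a hypothetical name and USE is set by hand here.

```shell
#!/bin/sh
# Rough emulation of use_enable for illustration; not Portage's actual code.

# emul_use_enable FLAG [NAME]: print --enable-NAME if FLAG is in $USE,
# otherwise --disable-NAME. NAME defaults to FLAG.
emul_use_enable() {
    name=${2:-$1}
    case " $USE " in
        *" $1 "*) echo "--enable-$name" ;;
        *)        echo "--disable-$name" ;;
    esac
}

USE="cpu_flags_x86_sse2 cpu_flags_x86_sse4_1"
emul_use_enable cpu_flags_x86_sse4_1 sse4    # prints --enable-sse4

USE="cpu_flags_x86_sse2"
emul_use_enable cpu_flags_x86_sse4_1 sse4    # prints --disable-sse4
```

So upstream's --enable-sse4 switch, which really means SSE4.1, is now driven by the unambiguous sse4_1 flag.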