Bug 538268 - sci-chemistry/gromacs: figure out avx* -> cpu_flags_x86
Summary: sci-chemistry/gromacs: figure out avx* -> cpu_flags_x86
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Christoph Junghans (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-01-30 17:51 UTC by Michał Górny
Modified: 2015-02-01 00:36 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Description Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2015-01-30 17:51:03 UTC
Any suggestions on what to do with:

	<flag name="avx128fma">Enable 128bit avx with fma (e.g. AMD BullDozer)</flag>
	<flag name="avx_128_fma">Enable 128bit avx with fma (e.g. AMD BullDozer)</flag>
	<flag name="avx256">Enable 256bit avx (e.g. Intel Sandy Bridge)</flag>
	<flag name="avx_256">Enable 256bit avx (e.g. Intel Sandy Bridge)</flag>
	<flag name="avx2_256">Enable 256bit avx2 (e.g. Intel Haswell)</flag>

considering that the common convention is to have just avx, avx2 and fma3/fma4?
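For reference, one plausible mapping of these old flags onto the cpu_flags_x86_* names, inferred from the ebuild code discussed in this bug (AVX_128_FMA needing both AVX and FMA4), can be sketched in bash. The associative array and the map_flag helper are hypothetical illustrations, not a Portage or GROMACS interface.

```shell
#!/usr/bin/env bash
# Hypothetical mapping of the old per-package gromacs USE flags to the
# cpu_flags_x86_* flags they would be expressed through. Inferred from the
# ebuild discussion in this bug; not official Portage data.
declare -A old_to_new=(
  [avx128fma]="cpu_flags_x86_avx cpu_flags_x86_fma4"
  [avx_128_fma]="cpu_flags_x86_avx cpu_flags_x86_fma4"
  [avx256]="cpu_flags_x86_avx"
  [avx_256]="cpu_flags_x86_avx"
  [avx2_256]="cpu_flags_x86_avx2"
)

# Print the replacement flag(s) for one old flag; prints an empty line
# for flags that are not in the table.
map_flag() {
  printf '%s\n' "${old_to_new[$1]:-}"
}
```

For example, map_flag avx_128_fma prints "cpu_flags_x86_avx cpu_flags_x86_fma4".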
Comment 1 Christoph Junghans (RETIRED) gentoo-dev 2015-01-31 20:24:30 UTC
+  31 Jan 2015; Christoph Junghans <ottxor@gentoo.org> gromacs-4.6.5.ebuild,
+  gromacs-4.6.6.ebuild, gromacs-4.6.7.ebuild, gromacs-5.0.1.ebuild,
+  gromacs-5.0.2-r1.ebuild, gromacs-5.0.2.ebuild, gromacs-5.0.4.ebuild,
+  gromacs-5.0.ebuild, metadata.xml:
+  Switch to CPU_FLAGS_X86 (bug #538268)
+

Science overlay:
+  31 Jan 2015; Christoph Junghans <ottxor@gentoo.org> gromacs-4.6.9999.ebuild,
+  gromacs-5.0.9999.ebuild, gromacs-9999.ebuild, metadata.xml:
+  Switch to CPU_FLAGS_X86 (bug #538268)
+
Comment 2 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2015-01-31 21:12:21 UTC
Thanks, but are you sure this is correct?

    use cpu_flags_x86_sse2 && acce="SSE2"
    use cpu_flags_x86_sse4_1 && acce="SSE4.1"
    use cpu_flags_x86_fma4 && acce="AVX_128_FMA"
    use cpu_flags_x86_avx && acce="AVX_256"
    use cpu_flags_x86_avx2 && acce="AVX2_256"

Unless I'm mistaken, AVX_128_FMA requires both avx & fma. In fact, the AVX_128_FMA method code even uses 256-bit AVX instructions for some other operations...

Moreover, the upstream detection code uses different logic; in pseudo-code:

  if (intel)
  {
    avx2 -> AVX2_256
    avx -> AVX_256
    sse4_1 -> SSE4.1
    sse2 -> SSE2
  }
  else if (amd)
  {
    avx -> AVX_128_FMA
    sse4_1 -> SSE4.1
    sse2 -> SSE2
  }
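The upstream pseudo-code above can be sketched as a small bash function. The name upstream_acce and the convention of passing the vendor and the CPU features as plain arguments are assumptions for illustration only; the real detection lives in gmx_cpuid.c and works on CPUID data.

```shell
#!/usr/bin/env bash
# Sketch of the upstream vendor-dependent selection (hypothetical helper).
# Usage: upstream_acce <intel|amd> <feature>...
# Prints the suggested acceleration, or returns 1 with no output when the
# pseudo-code does not cover the case (e.g. other vendors).
upstream_acce() {
  local vendor=$1
  shift
  local flags=" $* "   # pad with spaces so each feature matches as a whole word
  case $vendor in
    intel)
      [[ $flags == *" avx2 "* ]] && { echo AVX2_256; return 0; }
      [[ $flags == *" avx "* ]] && { echo AVX_256; return 0; }
      ;;
    amd)
      # on AMD, any AVX-capable CPU gets the 128-bit AVX+FMA kernels
      [[ $flags == *" avx "* ]] && { echo AVX_128_FMA; return 0; }
      ;;
    *)
      return 1   # other vendors (e.g. VIA) are not covered by the pseudo-code
      ;;
  esac
  [[ $flags == *" sse4_1 "* ]] && { echo SSE4.1; return 0; }
  [[ $flags == *" sse2 "* ]] && { echo SSE2; return 0; }
  return 1
}
```

For example, upstream_acce amd avx fma4 sse4_1 sse2 prints AVX_128_FMA, while upstream_acce intel avx sse4_1 sse2 prints AVX_256.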

I don't know the details, but this suggests that 128-bit AVX+FMA is faster on AMD, while pure AVX is faster on Intel. I'm not sure where AVX2 fits in. Since FMA4 is AMD-specific, I guess you could change your code to:

    use cpu_flags_x86_sse2 && acce="SSE2"
    use cpu_flags_x86_sse4_1 && acce="SSE4.1"
    if use cpu_flags_x86_avx; then
      acce="AVX_256"
      use cpu_flags_x86_fma4 && acce="AVX_128_FMA"
    fi
    use cpu_flags_x86_avx2 && acce="AVX2_256"
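This ordering can be exercised outside Portage with a stand-in for the use helper. ENABLED_USE and the stub use function are hypothetical test scaffolding; only the body of select_acce mirrors the proposed ebuild code.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for Portage's `use` helper, which only exists inside
# an ebuild: succeeds if the flag appears in the space-separated $ENABLED_USE.
ENABLED_USE=${ENABLED_USE:-}
use() { [[ " ${ENABLED_USE} " == *" $1 "* ]]; }

# Proposed selection: later checks override earlier ones, so the most
# capable enabled kernel wins.
select_acce() {
  local acce=""
  use cpu_flags_x86_sse2 && acce="SSE2"
  use cpu_flags_x86_sse4_1 && acce="SSE4.1"
  if use cpu_flags_x86_avx; then
    acce="AVX_256"
    # FMA4 (AMD-specific) switches to the 128-bit AVX+FMA kernels,
    # but only when AVX is enabled as well
    use cpu_flags_x86_fma4 && acce="AVX_128_FMA"
  fi
  use cpu_flags_x86_avx2 && acce="AVX2_256"
  echo "${acce}"
}
```

With ENABLED_USE="cpu_flags_x86_fma4" alone, select_acce prints an empty string rather than AVX_128_FMA, which is the point of nesting the fma4 check under avx.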
Comment 3 Christoph Junghans (RETIRED) gentoo-dev 2015-01-31 22:15:47 UTC
(In reply to Michał Górny from comment #2)
Looking at <https://github.com/gromacs/gromacs/blob/master/src/gromacs/gmxlib/gmx_cpuid.c#L1227>, it seems to me that pure AVX kernels will never be suggested for AMD CPUs. Isn't FMA4 a superset of AVX? I think we basically need to move the line use cpu_flags_x86_fma4 && acce="AVX_128_FMA" below the avx lines.
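One reading of this suggestion, sketched with a hypothetical use stub (USE_LIST is test scaffolding, not a Portage variable): the fma4 check moves to the end, so it overrides both AVX_256 and AVX2_256, matching upstream never picking the pure AVX kernels on AMD.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for Portage's `use` helper: succeeds if the flag
# appears in the space-separated $USE_LIST.
USE_LIST=${USE_LIST:-}
use() { [[ " ${USE_LIST} " == *" $1 "* ]]; }

# fma4 is checked last, so when enabled it wins over every AVX choice.
select_acce_fma4_last() {
  local acce=""
  use cpu_flags_x86_sse2 && acce="SSE2"
  use cpu_flags_x86_sse4_1 && acce="SSE4.1"
  use cpu_flags_x86_avx && acce="AVX_256"
  use cpu_flags_x86_avx2 && acce="AVX2_256"
  use cpu_flags_x86_fma4 && acce="AVX_128_FMA"
  echo "${acce}"
}
```

Note that in this ordering cpu_flags_x86_fma4 alone still selects AVX_128_FMA even without AVX, which is the concern behind the REQUIRED_USE suggestion in the following comments.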
Comment 4 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2015-01-31 23:24:40 UTC
To be honest, I don't know about the supersets. I wouldn't trust that, especially considering that we have VIA processors too. If you want to rely on something, add it to REQUIRED_USE :).
Comment 5 Christoph Junghans (RETIRED) gentoo-dev 2015-02-01 00:36:29 UTC
I am not sure if the kernels will work on VIA processors at all.

@alexxy: do you have access to any of these FMA4 cores to confirm or refute that the kernels work?

For now, I think we should have fma4 override the avx kernels and add REQUIRED_USE="fma4? ( avx )".
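In the ebuild, this constraint would be written with the full flag names, REQUIRED_USE="cpu_flags_x86_fma4? ( cpu_flags_x86_avx )", and Portage enforces it before the build starts. The bash sketch below only mimics that check for illustration; FLAGS, has_flag and check_fma4_needs_avx are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical emulation of the REQUIRED_USE constraint
#   cpu_flags_x86_fma4? ( cpu_flags_x86_avx )
# In a real ebuild Portage evaluates this declaratively; this only mimics it.
FLAGS=${FLAGS:-}
has_flag() { [[ " ${FLAGS} " == *" $1 "* ]]; }

# "a? ( b )" means: if a is enabled, b must be enabled too.
check_fma4_needs_avx() {
  if has_flag cpu_flags_x86_fma4 && ! has_flag cpu_flags_x86_avx; then
    echo "cpu_flags_x86_fma4 requires cpu_flags_x86_avx" >&2
    return 1
  fi
  return 0
}
```

With FLAGS="cpu_flags_x86_fma4" the check fails, so the fma4-overrides-avx ordering can never select AVX_128_FMA without AVX.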