Any suggestions on what to do with:

<flag name="avx128fma">Enable 128bit avx with fma (e.g. AMD BullDozer)</flag>
<flag name="avx_128_fma">Enable 128bit avx with fma (e.g. AMD BullDozer)</flag>
<flag name="avx256">Enable 256bit avx (e.g. Intel Sandy Bridge)</flag>
<flag name="avx_256">Enable 256bit avx (e.g. Intel Sandy Bridge)</flag>
<flag name="avx2_256">Enable 256bit avx2 (e.g. Intel Haswell)</flag>

considering the common thing is to have just (avx, avx2, fma[34])?
+  31 Jan 2015; Christoph Junghans <ottxor@gentoo.org> gromacs-4.6.5.ebuild,
+  gromacs-4.6.6.ebuild, gromacs-4.6.7.ebuild, gromacs-5.0.1.ebuild,
+  gromacs-5.0.2-r1.ebuild, gromacs-5.0.2.ebuild, gromacs-5.0.4.ebuild,
+  gromacs-5.0.ebuild, metadata.xml:
+  Switch to CPU_FLAGS_X86 (bug #538268)
+

Science overlay:

+  31 Jan 2015; Christoph Junghans <ottxor@gentoo.org> gromacs-4.6.9999.ebuild,
+  gromacs-5.0.9999.ebuild, gromacs-9999.ebuild, metadata.xml:
+  Switch to CPU_FLAGS_X86 (bug #538268)
+
Thanks, but are you sure that this is correct?

    use cpu_flags_x86_sse2 && acce="SSE2"
    use cpu_flags_x86_sse4_1 && acce="SSE4.1"
    use cpu_flags_x86_fma4 && acce="AVX_128_FMA"
    use cpu_flags_x86_avx && acce="AVX_256"
    use cpu_flags_x86_avx2 && acce="AVX2_256"

Unless I'm mistaken, AVX_128_FMA requires both avx & fma. The AVX_128_FMA method code even uses 256-bit AVX instructions for some other operations...

In fact, the upstream detection code uses different logic, in pseudo-code:

    if (intel)
    {
        avx2 -> AVX2_256
        avx -> AVX_256
        sse4_1 -> SSE4.1
        sse2 -> SSE2
    }
    else if (amd)
    {
        avx -> AVX_128_FMA
        sse4_1 -> SSE4.1
        sse2 -> SSE2
    }

I don't know the details, but this suggests 128-bit AVX+FMA is faster on AMD while pure AVX is faster on Intel. I have no clue about AVX2 there. Since FMA4 is AMD-specific, I guess you could change your code to:

    use cpu_flags_x86_sse2 && acce="SSE2"
    use cpu_flags_x86_sse4_1 && acce="SSE4.1"
    if use cpu_flags_x86_avx; then
        acce="AVX_256"
        use cpu_flags_x86_fma4 && acce="AVX_128_FMA"
    fi
    use cpu_flags_x86_avx2 && acce="AVX2_256"
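For anyone who wants to check that ordering outside of Portage, here is a minimal sketch of the same chain as a standalone shell script. The `use` function is only a stand-in for the real Portage helper, and `ENABLED_FLAGS`, `pick_acce` and the "None" default are assumptions made for illustration, not taken from the ebuild.

```shell
#!/bin/sh
# Stand-in for Portage's `use` helper (assumption: the real one is only
# available inside an ebuild). ENABLED_FLAGS is a hypothetical
# space-separated list of enabled flags.
use() {
    case " ${ENABLED_FLAGS} " in
        *" $1 "*) return 0 ;;
    esac
    return 1
}

# Later assignments override earlier ones, so the order encodes priority.
# AVX_128_FMA is only picked when both avx and fma4 are enabled.
pick_acce() {
    acce="None"    # assumed fallback when no flag matches
    use cpu_flags_x86_sse2 && acce="SSE2"
    use cpu_flags_x86_sse4_1 && acce="SSE4.1"
    if use cpu_flags_x86_avx; then
        acce="AVX_256"
        use cpu_flags_x86_fma4 && acce="AVX_128_FMA"
    fi
    use cpu_flags_x86_avx2 && acce="AVX2_256"
    printf '%s\n' "${acce}"
}

# Example: a hypothetical AMD-like flag set with avx and fma4 enabled.
ENABLED_FLAGS="cpu_flags_x86_sse2 cpu_flags_x86_sse4_1 cpu_flags_x86_avx cpu_flags_x86_fma4"
pick_acce
```

Note that with this ordering fma4 without avx falls back to the SSE kernels, and avx2 still wins over AVX_128_FMA.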
(In reply to Michał Górny from comment #2)
> Thanks but are you sure that this is correct?
>
>     use cpu_flags_x86_sse2 && acce="SSE2"
>     use cpu_flags_x86_sse4_1 && acce="SSE4.1"
>     use cpu_flags_x86_fma4 && acce="AVX_128_FMA"
>     use cpu_flags_x86_avx && acce="AVX_256"
>     use cpu_flags_x86_avx2 && acce="AVX2_256"
>
> Unless I'm mistaken, AVX_128_FMA requires both avx & fma. In fact, the
> AVX_128_FMA method code even uses 256-bit AVX instructions for some other
> operations...
>
> In fact, the upstream detection code uses another logic, in pseudo-code:
>
>     if (intel)
>     {
>         avx2 -> AVX2_256
>         avx -> AVX_256
>         sse4_1 -> SSE4.1
>         sse2 -> SSE2
>     }
>     else if (amd)
>     {
>         avx -> AVX_128_FMA
>         sse4_1 -> SSE4.1
>         sse2 -> SSE2
>     }
>
> I don't know the details but this suggests 128-bit AVX+FMA is faster on AMD
> while pure AVX is faster on Intel. I have no clue about AVX2 there. Since
> FMA4 is AMD-specific, I guess you could change your code to:
>
>     use cpu_flags_x86_sse2 && acce="SSE2"
>     use cpu_flags_x86_sse4_1 && acce="SSE4.1"
>     if use cpu_flags_x86_avx; then
>         acce="AVX_256"
>         use cpu_flags_x86_fma4 && acce="AVX_128_FMA"
>     fi
>     use cpu_flags_x86_avx2 && acce="AVX2_256"

Looking at <https://github.com/gromacs/gromacs/blob/master/src/gromacs/gmxlib/gmx_cpuid.c#L1227>, it seems to me that for AMD CPUs pure AVX kernels will never be suggested. Isn't FMA4 a superset of AVX?

I think we basically need to move the line

    use cpu_flags_x86_fma4 && acce="AVX_128_FMA"

below the avx lines.
To be honest, I don't know about the supersets. I wouldn't trust that, esp. considering that we have VIA processors too. If you want to rely on something, add it to REQUIRED_USE :).
I am not sure if the kernels will work on VIA processors at all.

@alexxy: do you have any of these fma4 cores to confirm (or disprove) that the kernels work?

Right now I think we should have fma4 override the avx kernels and add

    REQUIRED_USE="cpu_flags_x86_fma4? ( cpu_flags_x86_avx )"
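Roughly, that would look like the sketch below. It is untested on real fma4 hardware. The `use` stub, `ENABLED_FLAGS`, `pick_acce_fma4_last` and the "None" default are assumptions for illustration only, and putting the fma4 line after avx2 as well is just one reading of "below the avx lines".

```shell
#!/bin/sh
# Inside the ebuild this guarantees avx is enabled whenever fma4 is,
# so the fma4 line below never selects AVX kernels without avx support.
REQUIRED_USE="cpu_flags_x86_fma4? ( cpu_flags_x86_avx )"

# Stand-in for Portage's `use` helper so the sketch runs outside an ebuild.
# ENABLED_FLAGS is a hypothetical space-separated list of enabled flags.
use() {
    case " ${ENABLED_FLAGS} " in
        *" $1 "*) return 0 ;;
    esac
    return 1
}

# The fma4 line comes last, so it overrides both AVX_256 and AVX2_256.
pick_acce_fma4_last() {
    acce="None"    # assumed fallback when no flag matches
    use cpu_flags_x86_sse2 && acce="SSE2"
    use cpu_flags_x86_sse4_1 && acce="SSE4.1"
    use cpu_flags_x86_avx && acce="AVX_256"
    use cpu_flags_x86_avx2 && acce="AVX2_256"
    use cpu_flags_x86_fma4 && acce="AVX_128_FMA"
    printf '%s\n' "${acce}"
}

# Example: avx plus fma4, which satisfies the REQUIRED_USE constraint.
ENABLED_FLAGS="cpu_flags_x86_sse2 cpu_flags_x86_sse4_1 cpu_flags_x86_avx cpu_flags_x86_fma4"
pick_acce_fma4_last
```

The stub does not enforce REQUIRED_USE itself; in a real ebuild Portage rejects fma4 without avx before this code ever runs.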