Summary: | Stockfish ebuild interprets CPU_FLAGS_X86 wrongly. | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | chromatix99 |
Component: | Current packages | Assignee: | Gentoo Linux bug wranglers <bug-wranglers> |
Status: | RESOLVED NEEDINFO | ||
Severity: | normal | ||
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: | Draft patch for Stockfish-17 ebuild. |
Please update the bug summary with the full package category, name, and version.
Created attachment 907823: Draft patch for Stockfish-17 ebuild.

On AMD64 machines that are not new enough to support SSE4.1 and POPCNT, Stockfish-17 (and several previous versions) fails to build a runnable binary. This is because the ebuild incorrectly translates the information in CPU_FLAGS_X86 into a Stockfish build configuration ID. The current logic only works for newer machines.

In particular, an AMD Bobcat with the appropriate CPU_FLAGS_X86 (including the popcnt and sse flags) triggers the x86-64-modern build. This produces a binary that cannot run on the Bobcat core, since x86-64-modern is equivalent to x86-64-sse41-popcnt. The most appropriate build for this hardware would be x86-64-ssse3.

In addition, the x86-64-modern configuration is deprecated as of Stockfish-17, and triggers a warning message during the build.

Attached is my spitball attempt to improve matters, along the following lines:

- The x86-64-sse41-popcnt configuration is named explicitly, and triggered by an explicit check of both the sse4_1 and popcnt flags.
- The x86-64-ssse3 configuration is triggered by the ssse3 flag. It is about 50% faster than sse3-popcnt on Bobcat.
- The x86-64-sse3-popcnt configuration is triggered by the combination of the sse3 and popcnt flags. It is faster than a plain x86-64 build.

These are listed from highest to lowest priority, which is the reverse of the order in which they appear in the ebuild.

The x86-64-bmi2 configuration is not appropriate for all machines supporting AVX2, but there is no CPU_FLAGS_X86 flag specific to BMI2. I've therefore made the distinction between the x86-64-avx2 and x86-64-bmi2 configurations depend on a fast_bmi2 USE flag. On my Zen3 box, these two are nearly equivalent in speed, and both are significantly faster than any other configuration that will run at all. It would obviously be preferable to have a bmi2 flag in CPU_FLAGS_X86.
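For illustration, the priority order described above can be sketched as a standalone bash function. This is not the attached patch and not the ebuild's actual code: `has_flag` and `pick_stockfish_arch` are hypothetical names, `has_flag` merely stands in for Portage's `use cpu_flags_x86_<flag>` check so that the snippet runs outside an ebuild, and the assumption that the avx2 flag gates the avx2/bmi2 tier is inferred from the description. Any higher tiers the real ebuild may handle are omitted.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the proposed selection order; not the attached patch.

# has_flag FLAG LIST...: succeed if FLAG appears among the remaining arguments.
# Stand-in for `use cpu_flags_x86_<flag>` inside a real ebuild.
has_flag() {
    local want=$1 f
    shift
    for f in "$@"; do
        [[ $f == "$want" ]] && return 0
    done
    return 1
}

# pick_stockfish_arch "FLAGS" [yes|no]
#   FLAGS: space-separated CPU_FLAGS_X86 values
#   second argument: state of the proposed fast_bmi2 USE flag (default: no)
# Prints the Stockfish build configuration name, highest priority first.
pick_stockfish_arch() {
    local -a flags
    read -r -a flags <<< "$1"
    local fast_bmi2=${2:-no}

    if has_flag avx2 "${flags[@]}"; then
        # Assumption: avx2 gates this tier; fast_bmi2 picks between the two.
        if [[ $fast_bmi2 == yes ]]; then
            echo x86-64-bmi2
        else
            echo x86-64-avx2
        fi
    elif has_flag sse4_1 "${flags[@]}" && has_flag popcnt "${flags[@]}"; then
        echo x86-64-sse41-popcnt
    elif has_flag ssse3 "${flags[@]}"; then
        echo x86-64-ssse3
    elif has_flag sse3 "${flags[@]}" && has_flag popcnt "${flags[@]}"; then
        echo x86-64-sse3-popcnt
    else
        # Plain baseline build when none of the above applies.
        echo x86-64
    fi
}

# Illustrative Bobcat-like flag set (popcnt and ssse3, but no sse4_1).
pick_stockfish_arch "mmx mmxext popcnt sse sse2 sse3 sse4a ssse3"
# prints x86-64-ssse3
```

With this ordering, a flag set that has popcnt but lacks sse4_1 falls through to the ssse3 tier and never reaches an SSE4.1 build, which is the behaviour requested for Bobcat.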