Compile an pure c++ source with hipcc: ```c++ #include <iostream> #include <algorithm> #include <vector> int main() { const size_t count = 4; std::vector<_Float16> f16(count); std::vector<float> f32(count); for(size_t i = 0 ; i < count; ++i) f16[i] = static_cast<_Float16>(0.123) * static_cast<_Float16>(i); std::copy_n(f16.begin(), count, f32.begin()); for(size_t i = 0; i < count; ++i) { std::cout << "loop f16=" << static_cast<float>(f16[i]) << " f32=" << f32[i] << std::endl; } _Float16 one_f16 = 0.123; float one_f32 = one_f16; std::cout << "one f32=" << one_f32 << std::endl; return 0; } ``` and the program gives wrong result, if not specifying -march or -O3. specifying -march=native on Zen3 and Zen4 machine, or specify -O3, does not have this issue. And this issue can be solved by a fedora patch: https://src.fedoraproject.org/fork/tstellar/rpms/compiler-rt/raw/0459cbc5d9eb15f1ad51d74707b4988049183708/f/0001-compiler-rt-Fix-FLOAT16-feature-detection.patch According to https://github.com/ROCmSoftwarePlatform/rocFFT/issues/439#issuecomment-1714779352, it seems that llvm upstream does not consider merge fedora's patch. We have to apply it in Gentoo in order to resolve the bug. Reproducible: Always Steps to Reproduce: 1. emerge =dev-util/hip-5.7.1 2. hipcc float16.cpp -o float16 3. ./float16
Same issue also reported in: [1] https://github.com/ROCmSoftwarePlatform/rocBLAS/issues/1350 [2] https://github.com/gentoo/gentoo/pull/33400#issuecomment-1771559726
Created attachment 873194 [details, diff] patch used by fedora
The patch looks OK to pull in, but I'm wondering if tstellar tried to submit it somewhere or not. Could you ask him?
(In reply to Sam James from comment #3) > The patch looks OK to pull in, but I'm wondering if tstellar tried to submit > it somewhere or not. Could you ask him? https://github.com/ROCmSoftwarePlatform/rocFFT/issues/439#issuecomment-1714779352 About one months ago, I've asked, and he does not intend to upstream the patch. Though I does not understand why it's not an upstream bug, I either does not understand the details of the bug, so I believed his/her judge.
Ah, thanks. I suspect it wouldn't be rejected, but rather finding a reviewer is difficult. But it's not a blocker for us pulling it in IMO.
(In reply to Sam James from comment #5) > Ah, thanks. I suspect it wouldn't be rejected, but rather finding a reviewer > is difficult. But it's not a blocker for us pulling it in IMO. Maybe, I think it it should be raised, and let upstream decide if the patch should be merged. I have checked the patch. Basically it's an issue of whether compile llvm-project as a whole or compile each component separately. If compiled as a whole, the CMAKE_TRY_COMPILE_TARGET_TYPE is correctly set to STATIC and there's no issue. I'm not sure why upstream make CMAKE_TRY_COMPILE_TARGET_TYPE non STATIC for non-whole-llvm compilation, maybe there's a reason
https://github.com/llvm/llvm-project/pull/69842 is still in reviewing, while Mgorny already approved it, and it is currently clear that at least fix this issue in Gentoo. Shall we pull it into our patch set first? Currently >=compiler-rt-16 needs this.
(In reply to Yiyang Wu from comment #7) > https://github.com/llvm/llvm-project/pull/69842 is still in reviewing, while > Mgorny already approved it, and it is currently clear that at least fix this > issue in Gentoo. Shall we pull it into our patch set first? > > Currently >=compiler-rt-16 needs this. compiler-rt-16 tests fail with it which made us feel unsure about what to do with it even for 17. Please investigate.
Created attachment 875146 [details] compiler-rt-17.0.4 build and test
Created attachment 875147 [details] compiler-rt-16.0.6 build and test
(In reply to Sam James from comment #8) > > compiler-rt-16 tests fail with it which made us feel unsure about what to do > with it even for 17. Please investigate. On my system 16.0.6 failed three tests and 17.0.4 is fine with the patch: Failed Tests (2): Builtins-x86_64-linux :: extendhfsf2_test.c Builtins-x86_64-linux :: truncdfhf2_test.c Is this the same failure you encountered? I will take a look at this.
I pick up extendhfsf2_test to take a look: error in test__extendhfsf2(0x7e00) = 0.000000, expected nan while in compiler-rt-17.0.4, the test past with consistent value nan.
(In reply to Yiyang Wu from comment #11) > (In reply to Sam James from comment #8) > > > > compiler-rt-16 tests fail with it which made us feel unsure about what to do > > with it even for 17. Please investigate. > > On my system 16.0.6 failed three tests and 17.0.4 is fine with the patch: > > Failed Tests (2): > > Builtins-x86_64-linux :: extendhfsf2_test.c > > Builtins-x86_64-linux :: truncdfhf2_test.c > > Is this the same failure you encountered? I will take a look at this. Yeah, that's it. Thanks!
I figured out what cause the different behavior between compiler-rt-16 and 17: The executed commands to compile truncdfhf2_test.c are ``` /opt/gentoo/usr/lib/llvm/16/bin/clang -gline-tables-only -m64 -fno-builtin -I /run/user/18014/portage/sys-libs/compiler-rt-16.0.6-r1/work/compiler-rt/lib/builtins -nodefaultlibs /run/user/18014/portage/sys-libs/compiler-rt-16.0.6-r1/work/compiler-rt/test/builtins/Unit/extendhfsf2_test.c /run/user/18014/portage/sys-libs/compiler-rt-16.0.6-r1/work/compiler-rt-16.0.6_build/lib/linux/libclang_rt.builtins-x86_64.a -lc -lm -o /run/user/18014/portage/sys-libs/compiler-rt-16.0.6-r1/work/compiler-rt-16.0.6_build/test/builtins/Unit/X86_64LinuxConfig/Output/extendhfsf2_test.c.tmp.2 /opt/gentoo/usr/lib/llvm/17/bin/clang -gline-tables-only -m64 -DCOMPILER_RT_HAS_FLOAT16 -fno-builtin -I /run/user/18014/portage/sys-libs/compiler-rt-17.0.4-r1/work/compiler-rt/lib/builtins -nodefaultlibs /run/user/18014/portage/sys-libs/compiler-rt-17.0.4-r1/work/compiler-rt/test/builtins/Unit/truncdfhf2_test.c /run/user/18014/portage/sys-libs/compiler-rt-17.0.4-r1/work/compiler-rt-17.0.4_build/lib/linux/libclang_rt.builtins-x86_64.a -lc -lm -o /run/user/18014/portage/sys-libs/compiler-rt-17.0.4-r1/work/compiler-rt-17.0.4_build/test/builtins/Unit/X86_64LinuxConfig/Output/truncdfhf2_test.c.tmp ``` The error is caused by missing `-DCOMPILER_RT_HAS_FLOAT16` when testing compiler-rt-16. With the patch, and with `-DCOMPILER_RT_HAS_FLOAT16` manually added, the error disappears. = Interestingly, without that patch, but manually adding `-DCOMPILER_RT_HAS_FLOAT16`, the error comes out. And this is also true for compiler-rt-17. So for both 16 and 17: no patch + no -DCOMPILER_RT_HAS_FLOAT16 = pass patch + -DCOMPILER_RT_HAS_FLOAT16 = pass patch + no -DCOMPILER_RT_HAS_FLOAT16 = fail no patch + -DCOMPILER_RT_HAS_FLOAT16 = fail However the test suite automatically add `-DCOMPILER_RT_HAS_FLOAT16` if the patch is applied, and that's why it pass the test
(In reply to Yiyang Wu from comment #14) > > So for both 16 and 17: > > no patch + no -DCOMPILER_RT_HAS_FLOAT16 = pass > patch + -DCOMPILER_RT_HAS_FLOAT16 = pass > patch + no -DCOMPILER_RT_HAS_FLOAT16 = fail > no patch + -DCOMPILER_RT_HAS_FLOAT16 = fail > > However the test suite automatically add `-DCOMPILER_RT_HAS_FLOAT16` if the > patch is applied, and that's why it pass the test And I confirmed that https://github.com/llvm/llvm-project/commit/f2bb57c19455e77fd71be782f9ee9092f31bd2a9 fix the broken test of compiler-rt-16, by automatically append `-DCOMPILER_RT_HAS_FLOAT16` when building truncdfhf2_test.c, which makes 16 and 17 behaves the same. So my suggestion is, the patch is OK, and it's only one problem of testing in compiler-rt-16 which is fixed by https://github.com/llvm/llvm-project/commit/f2bb57c19455e77fd71be782f9ee9092f31bd2a9 in compiler-rt-17.
Thank you! Let's add it then.
The bug has been closed via the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=866069267e1b71a27b1dc1d718180c731696fd92 commit 866069267e1b71a27b1dc1d718180c731696fd92 Author: Michał Górny <mgorny@gentoo.org> AuthorDate: 2023-11-24 20:18:33 +0000 Commit: Michał Górny <mgorny@gentoo.org> CommitDate: 2023-11-24 20:19:14 +0000 sys-libs/compiler-rt: Add FP16 fixes Closes: https://bugs.gentoo.org/916069 Signed-off-by: Michał Górny <mgorny@gentoo.org> sys-libs/compiler-rt/Manifest | 2 + ...-17.0.5.ebuild => compiler-rt-16.0.6-r4.ebuild} | 1 + sys-libs/compiler-rt/compiler-rt-17.0.5-r1.ebuild | 178 +++++++++++++++++++++ 3 files changed, 181 insertions(+)