When building media-video/ffmpeg-5.1.2-r1 for 32 bit ABI on x64 (needed for wine) one of the gcc compile commands hangs with 100% CPU usage on one thread. These are the commands in the ps auxww output:
root 2820803 0.0 0.0 8532 2048 pts/7 S+ 16:01 0:00 x86_64-pc-linux-gnu-gcc -m32 -mfpmath=sse -I. -Isrc/ -D_ISOC99_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_POSIX_C_SOURCE=200112 -D_XOPEN_SOURCE=600 -DPIC -DZLIB_CONST -DHAVE_AV_CONFIG_H -DBUILDING_avcodec -march=native -O3 -pipe -march=znver1 -std=c11 -fPIC -pthread -I/usr/include/lilv-0 -I/usr/include/serd-0 -I/usr/include/sord-0 -I/usr/include/sratom-0 -I/usr/include/freetype2 -I/usr/include/harfbuzz -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/fribidi -I/usr/include/libxml2 -I/usr/include/freetype2 -I/usr/include/harfbuzz -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/bs2b -I/usr/include/libdrm -I/usr/include/freetype2 -I/usr/include/harfbuzz -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/freetype2 -I/usr/include/harfbuzz -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/fribidi -I/usr/include/openh264 -I/usr/include/openjpeg-2.5 -I/usr/include/opus -I/usr/include/opus -D_REENTRANT -I/usr/include/librsvg-2.0 -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/lib/libffi/include -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/libpng16 -pthread -I/usr/include/cairo -I/usr/include/freetype2 -I/usr/include/harfbuzz -I/usr/include/pixman-1 -I/usr/include/samba-4.0 -I/usr/include/srt -I/usr/include/leptonica -DX264_API_IMPORTS -I/usr/include/libxml2 -I/usr/include/libdrm -Wdeclaration-after-statement -Wall -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wwrite-strings -Wtype-limits -Wundef -Wmissing-prototypes -Wstrict-prototypes -Wempty-body -Wno-parentheses -Wno-switch -Wno-format-zero-length -Wno-pointer-sign -Wno-unused-const-variable -Wno-bool-operation -Wno-char-subscripts -march=native -O3 -pipe -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -Werror=format-security -Werror=implicit-function-declaration -Werror=missing-prototypes -Werror=return-type -Werror=vla -Wformat -Wno-maybe-uninitialized -I/usr/include/SDL2 -D_REENTRANT -MMD -MF libavcodec/h264_cabac.d -MT libavcodec/h264_cabac.o -c -o libavcodec/h264_cabac.o src/libavcodec/h264_cabac.c
root 2820805 99.7 0.0 107308 79600 pts/7 R+ 16:01 2:47 /usr/libexec/gcc/x86_64-pc-linux-gnu/12/cc1 -quiet -I . -I src/ -I /usr/include/lilv-0 -I /usr/include/serd-0 -I /usr/include/sord-0 -I /usr/include/sratom-0 -I /usr/include/freetype2 -I /usr/include/harfbuzz -I /usr/include/glib-2.0 -I /usr/lib/glib-2.0/include -I /usr/include/fribidi -I /usr/include/libxml2 -I /usr/include/freetype2 -I /usr/include/harfbuzz -I /usr/include/glib-2.0 -I /usr/lib/glib-2.0/include -I /usr/include/bs2b -I /usr/include/libdrm -I /usr/include/freetype2 -I /usr/include/harfbuzz -I /usr/include/glib-2.0 -I /usr/lib/glib-2.0/include -I /usr/include/freetype2 -I /usr/include/harfbuzz -I /usr/include/glib-2.0 -I /usr/lib/glib-2.0/include -I /usr/include/fribidi -I /usr/include/openh264 -I /usr/include/openjpeg-2.5 -I /usr/include/opus -I /usr/include/opus -I /usr/include/librsvg-2.0 -I /usr/include/glib-2.0 -I /usr/lib/glib-2.0/include -I /usr/lib/libffi/include -I /usr/include/libmount -I /usr/include/blkid -I /usr/include/gdk-pixbuf-2.0 -I /usr/include/libpng16 -I /usr/include/cairo -I /usr/include/freetype2 -I /usr/include/harfbuzz -I /usr/include/pixman-1 -I /usr/include/samba-4.0 -I /usr/include/srt -I /usr/include/leptonica -I /usr/include/libxml2 -I /usr/include/libdrm -I /usr/include/SDL2 -imultilib 32 -MMD libavcodec/h264_cabac.d -MF libavcodec/h264_cabac.d -MT libavcodec/h264_cabac.o -D_REENTRANT -D _ISOC99_SOURCE -D _FILE_OFFSET_BITS=64 -D _LARGEFILE_SOURCE -D _POSIX_C_SOURCE=200112 -D _XOPEN_SOURCE=600 -D PIC -D ZLIB_CONST -D HAVE_AV_CONFIG_H -D BUILDING_avcodec -D _REENTRANT -D X264_API_IMPORTS -D _REENTRANT src/libavcodec/h264_cabac.c -march=znver1 -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -msse4a -mno-fma4 -mno-xop -mfma -mno-avx512f -mbmi -mbmi2 -maes -mpclmul -mno-avx512vl -mno-avx512bw -mno-avx512dq -mno-avx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mno-avx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mclflushopt -mno-clwb -mclzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mno-hle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mmwaitx -mno-pconfig -mno-pku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mno-rdpid -mrdrnd -mrdseed -mno-rtm -mno-serialize -mno-sgx -msha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -mno-avx512fp16 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=512 -mtune=znver1 -quiet -dumpdir libavcodec/ -dumpbase h264_cabac.c -dumpbase-ext .c -m32 -mfpmath=sse -O3 -O3 -Wdeclaration-after-statement -Wall -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wwrite-strings -Wtype-limits -Wundef -Wmissing-prototypes -Wstrict-prototypes -Wempty-body -Wno-parentheses -Wno-switch -Wno-format-zero-length -Wno-pointer-sign -Wunused-const-variable=0 -Wno-bool-operation -Wno-char-subscripts -Werror=format-security -Werror=implicit-function-declaration -Werror=missing-prototypes -Werror=return-type -Werror=vla -Wformat=1 -Wno-maybe-uninitialized -std=c11 -fPIC -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -o -
Reproducible: Always
Steps to Reproduce:
Add the USE flag: media-video/ffmpeg abi_x86_32
Please include the full build.log up to the point it hangs and emerge --info.
Created attachment 857399 ffmpeg build log
Created attachment 857401 emerge --info
Created attachment 857403 emerge -pqv media-video/ffmpeg
Note that this is the build log after I kill the gcc process, since it's hanging.
Builds fine if I change -O3 to -O2 in CFLAGS.
Can confirm I also experience this hang with -O3 with gcc-12 and gcc-13 on ffmpeg version 4.4.3. Builds fine with -O2.
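For anyone who wants to keep -O3 globally and only drop ffmpeg to -O2 in the meantime, Portage's per-package environment files (/etc/portage/env plus /etc/portage/package.env) can do that. A minimal sketch, assuming the global flags are "-march=native -O3 -pipe" as in the original report; the helper name write_o2_override and the file name ffmpeg-O2.conf are made up for illustration, and the optional directory argument is only there so it can be tried outside /etc/portage first:

```shell
#!/bin/sh
# Sketch: per-package -O2 override for media-video/ffmpeg.
# write_o2_override and ffmpeg-O2.conf are illustrative names (assumptions).
write_o2_override() {
    root=${1:-/etc/portage}
    mkdir -p "$root/env" || return 1

    # Same flags as in the report, with -O3 swapped for -O2 (assumed values).
    cat > "$root/env/ffmpeg-O2.conf" <<'EOF'
CFLAGS="-march=native -O2 -pipe"
CXXFLAGS="-march=native -O2 -pipe"
EOF

    # package.env can be a single file or a directory of files.
    if [ -d "$root/package.env" ]; then
        target="$root/package.env/ffmpeg"
    else
        target="$root/package.env"
    fi
    echo "media-video/ffmpeg ffmpeg-O2.conf" >> "$target"
}
```

Run it as root with no argument to write under /etc/portage, then re-emerge media-video/ffmpeg. Adjust the flags to match your own make.conf.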
My march=native for znver1 expanded out:
# for t in param target; do cmd="gcc -Q -O2 --help=$t"; diff -U0 <(LANG=C $cmd) <(LANG=C $cmd -march=native); done
--- /dev/fd/63 2023-03-12 18:07:58.606179015 +0000
+++ /dev/fd/62 2023-03-12 18:07:58.607179028 +0000
@@ -21 +21 @@
- --param=avoid-fma-max-bits=<0,512> 0
+ --param=avoid-fma-max-bits=<0,512> 128
@@ -234 +234 @@
- --param=simultaneous-prefetches= 6
+ --param=simultaneous-prefetches= 100
--- /dev/fd/63 2023-03-12 18:07:58.613179102 +0000
+++ /dev/fd/62 2023-03-12 18:07:58.613179102 +0000
@@ -12 +12 @@
- -mabm [disabled]
+ -mabm [enabled]
@@ -15,2 +15,2 @@
- -madx [disabled]
- -maes [disabled]
+ -madx [enabled]
+ -maes [enabled]
@@ -27 +27 @@
- -march= x86-64
+ -march= znver1
@@ -29,2 +29,2 @@
- -mavx [disabled]
- -mavx2 [disabled]
+ -mavx [enabled]
+ -mavx2 [enabled]
@@ -32 +32 @@
- -mavx256-split-unaligned-store [disabled]
+ -mavx256-split-unaligned-store [enabled]
@@ -53,2 +53,2 @@
- -mbmi [disabled]
- -mbmi2 [disabled]
+ -mbmi [enabled]
+ -mbmi2 [enabled]
@@ -60 +60 @@
- -mclflushopt [disabled]
+ -mclflushopt [enabled]
@@ -62 +62 @@
- -mclzero [disabled]
+ -mclzero [enabled]
@@ -65,2 +65,2 @@
- -mcrc32 [disabled]
- -mcx16 [disabled]
+ -mcrc32 [enabled]
+ -mcx16 [enabled]
@@ -71 +71 @@
- -mf16c [disabled]
+ -mf16c [enabled]
@@ -76 +76 @@
- -mfma [disabled]
+ -mfma [enabled]
@@ -82 +82 @@
- -mfsgsbase [disabled]
+ -mfsgsbase [enabled]
@@ -109 +109 @@
- -mlzcnt [disabled]
+ -mlzcnt [enabled]
@@ -115 +115 @@
- -mmovbe [disabled]
+ -mmovbe [enabled]
@@ -122,2 +122,2 @@
- -mmwait [disabled]
- -mmwaitx [disabled]
+ -mmwait [enabled]
+ -mmwaitx [enabled]
@@ -130 +130 @@
- -mno-sse4 [enabled]
+ -mno-sse4 [disabled]
@@ -136 +136 @@
- -mpclmul [disabled]
+ -mpclmul [enabled]
@@ -140 +140 @@
- -mpopcnt [disabled]
+ -mpopcnt [enabled]
@@ -142 +142 @@
- -mprefer-vector-width= none
+ -mprefer-vector-width= 128
@@ -145 +145 @@
- -mprfchw [disabled]
+ -mprfchw [enabled]
@@ -149,2 +149,2 @@
- -mrdrnd [disabled]
- -mrdseed [disabled]
+ -mrdrnd [enabled]
+ -mrdseed [enabled]
@@ -160 +160 @@
- -msahf [disabled]
+ -msahf [enabled]
@@ -163 +163 @@
- -msha [disabled]
+ -msha [enabled]
@@ -170,5 +170,5 @@
- -msse3 [disabled]
- -msse4 [disabled]
- -msse4.1 [disabled]
- -msse4.2 [disabled]
- -msse4a [disabled]
+ -msse3 [enabled]
+ -msse4 [enabled]
+ -msse4.1 [enabled]
+ -msse4.2 [enabled]
+ -msse4a [enabled]
@@ -177 +177 @@
- -mssse3 [disabled]
+ -mssse3 [enabled]
@@ -192 +192 @@
- -mtune= generic
+ -mtune= znver1
@@ -205,4 +205,4 @@
- -mxsave [disabled]
- -mxsavec [disabled]
- -mxsaveopt [disabled]
- -mxsaves [disabled]
+ -mxsave [enabled]
+ -mxsavec [enabled]
+ -mxsaveopt [enabled]
+ -mxsaves [enabled]
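If you only want the list of target options that -march=native switches on, without the diff noise, the `gcc -Q --help=target` output can be filtered directly. A small sketch; enabled_flags is a made-up helper name, and it reads from stdin so it also works on saved output:

```shell
#!/bin/sh
# enabled_flags (hypothetical helper): print the option names marked
# [enabled] in `gcc -Q --help=target` style output read from stdin.
enabled_flags() {
    # Field 1 is the option name, field 2 its state or value.
    awk '$2 == "[enabled]" { print $1 }'
}

# Usage on a machine with gcc installed:
#   LANG=C gcc -Q -O2 --help=target -march=native | enabled_flags
```

Lines such as "-march= znver1" carry a value rather than an [enabled]/[disabled] state, so this filter skips them; check those separately.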
Someone hit this on the forums at https://forums.gentoo.org/viewtopic-t-1162301.html too.
Reproduced with CFLAGS="-O3 -march=znver1" CXXFLAGS="-O3 -march=znver1" USE="-* X abi_x86_32 abi_x86_64 alsa amd64 amr amrenc bluray bs2b bzip2 cdio chromium codec2 cpu_flags_x86_aes cpu_flags_x86_avx cpu_flags_x86_avx2 cpu_flags_x86_fma3 cpu_flags_x86_mmx cpu_flags_x86_mmxext cpu_flags_x86_sse cpu_flags_x86_sse2 cpu_flags_x86_sse3 cpu_flags_x86_sse4_1 cpu_flags_x86_sse4_2 cpu_flags_x86_ssse3 dav1d elibc_glibc encode fdk fftools_aviocat fftools_cws2fws fftools_ffescape fftools_ffeval fftools_ffhash fftools_fourcc2pixfmt fftools_graph2dot fftools_ismindex fftools_pktdumper fftools_qt-faststart fftools_sidxindex fftools_trasher flite fontconfig frei0r fribidi gme gmp gnutls gpl gsm hardcoded-tables iconv iec61883 ieee1394 jack jpeg2k kernel_linux kvazaar ladspa lcms libaom libaribb24 libass libcaca libdrm libilbc librtmp libsoxr libtesseract libv4l libxml2 lv2 lzma modplug mp3 network openal opencl opengl openh264 opus oss postproc pulseaudio rubberband samba sdl snappy speex srt ssh svg theora threads truetype twolame userland_GNU v4l vaapi vdpau vidstab vorbis vpx vulkan webp x264 x265 xvid zeromq zimg zlib zvbi" ebuild ffmpeg-5.1.2-r1.ebuild clean compile
(In reply to Sam James from comment #10)
> Reproduced with CFLAGS="-O3 -march=znver1" CXXFLAGS="-O3 -march=znver1"
> USE="-* X abi_x86_32 abi_x86_64 alsa amd64 amr amrenc bluray bs2b bzip2 cdio
> chromium codec2 cpu_flags_x86_aes cpu_flags_x86_avx cpu_flags_x86_avx2
> cpu_flags_x86_fma3 cpu_flags_x86_mmx cpu_flags_x86_mmxext cpu_flags_x86_sse
> cpu_flags_x86_sse2 cpu_flags_x86_sse3 cpu_flags_x86_sse4_1
> cpu_flags_x86_sse4_2 cpu_flags_x86_ssse3 dav1d elibc_glibc encode fdk
> fftools_aviocat fftools_cws2fws fftools_ffescape fftools_ffeval
> fftools_ffhash fftools_fourcc2pixfmt fftools_graph2dot fftools_ismindex
> fftools_pktdumper fftools_qt-faststart fftools_sidxindex fftools_trasher
> flite fontconfig frei0r fribidi gme gmp gnutls gpl gsm hardcoded-tables
> iconv iec61883 ieee1394 jack jpeg2k kernel_linux kvazaar ladspa lcms libaom
> libaribb24 libass libcaca libdrm libilbc librtmp libsoxr libtesseract libv4l
> libxml2 lv2 lzma modplug mp3 network openal opencl opengl openh264 opus oss
> postproc pulseaudio rubberband samba sdl snappy speex srt ssh svg theora
> threads truetype twolame userland_GNU v4l vaapi vdpau vidstab vorbis vpx
> vulkan webp x264 x265 xvid zeromq zimg zlib zvbi" ebuild
> ffmpeg-5.1.2-r1.ebuild clean compile
USE="encode libxml2 opus jpeg2k samba svg srt libtesseract opengl sdl x264 x265 openh264" CFLAGS="-O3 -march=znver1" CXXFLAGS="-O3 -march=znver1" isn't enough to reproduce tho.
Just to add, I do not get a hang during the build with sys-devel/gcc version 12.2.1_p20230121-r1.
Same problem here with CFLAGS="-march=native -O3 -pipe -Wno-narrowing -fno-stack-check" CXXFLAGS="-fpermissive ${CFLAGS}" USE="X alsa amr amrenc bluray bs2b bzip2 cdio chromaprint chromium codec2 cpudetection dav1d doc encode fdk fontconfig frei0r fribidi gme gmp gnutls gpl gsm hardcoded-tables iconv iec61883 ieee1394 jpeg2k kvazaar ladspa libaom libaribb24 libass libcaca libdrm libilbc librtmp libsoxr libtesseract libv4l libxml2 lv2 lzma modplug mp3 network openal opencl opengl openh264 openssl opus postproc pulseaudio rav1e rubberband samba sdl snappy speex ssh svg theora threads truetype twolame v4l vaapi vdpau vidstab vorbis vpx vulkan webp x264 x265 xvid zeromq zimg zlib zvbi -amf (-appkit) -cuda -debug -flite -gcrypt -jack (-mipsdspr1) (-mipsdspr2) (-mipsfpu) (-mmal) -nvenc -oss -pic -sndio -srt -static-libs -svt-av1 -test -verify-sig -vmaf" CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext sse sse2 sse3 sse4_1 sse4_2 ssse3 -3dnow -3dnowext -fma4 -xop" FFTOOLS="aviocat cws2fws ffescape ffeval ffhash fourcc2pixfmt graph2dot ismindex pktdumper qt-faststart sidxindex trasher"
Minimal flags to reproduce build hang:
CPU_FLAGS_X86="" ABI_X86="32 64" USE="-*" ebuild ffmpeg-4.4.3.ebuild clean compile
Using ABI_X86="64" builds successfully.
More experimentation has found that:
CFLAGS="-march=x86-64 -mtune=znver1 -O3 -pipe" CXXFLAGS="${CFLAGS}" CPU_FLAGS_X86="" ABI_X86="32" USE="-*" ebuild ffmpeg-4.4.3.ebuild clean compile
does NOT hang during the build, whereas:
CFLAGS="-march=znver1 -O3 -pipe" CXXFLAGS="${CFLAGS}" CPU_FLAGS_X86="" ABI_X86="32" USE="-*" ebuild ffmpeg-4.4.3.ebuild clean compile
DOES hang. So it takes the combination of -march=znver1 and -O3; -mtune=znver1 on its own does not trigger it.
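Trying flag combinations like this goes faster if the hanging compile is run on its own under a time limit instead of going through a full ebuild each time. A rough sketch, assuming GNU coreutils timeout is available; probe_flags is a made-up helper name, and the source file you pass in (for example a preprocessed h264_cabac.c) is up to you:

```shell
#!/bin/sh
# probe_flags (hypothetical helper): compile one file with a time limit.
# Usage: probe_flags SECONDS COMPILER SOURCE [FLAGS...]
# Prints "ok", "hang" (time limit exceeded) or "error" (compiler failed).
probe_flags() {
    limit=$1
    cc=$2
    src=$3
    shift 3

    status=0
    timeout "$limit" "$cc" "$@" -c -o /dev/null "$src" || status=$?

    case $status in
        0)   echo ok ;;
        124) echo hang ;;    # GNU timeout exits with 124 when the limit is hit
        *)   echo error ;;
    esac
}
```

Something like `probe_flags 120 gcc h264_cabac.i -m32 -O3 -march=znver1` followed by the same call with `-march=x86-64 -mtune=znver1` would then show the difference in a couple of minutes. The 120 second limit is arbitrary; pick one comfortably above the normal compile time for that file.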
Thanks! (fwiw, re my USE=-*, I forgot to restore abi_x86_32, that's why I couldn't hit it when experimenting.) Bisecting now.
Started with:
git bisect good 3ae7a822456bc538d4cefaa3a22fe56d43640a04 # 12.2.1_p20230121-r1
git bisect bad aa1f923af4d8cb5f5735d39f667f61aa7c900b5e # 12.2.1_p20230304
Result:
```
489c81db7d4f75894e9d34aa90fe7224cfafb53a is the first bad commit
commit 489c81db7d4f75894e9d34aa90fe7224cfafb53a
Author: Jan Hubicka <jh@suse.cz>
Date: Thu Dec 22 10:55:46 2022 +0100

    Zen4 tuning part 2

    Adds tunes needed for zen4 microarchitecture. I added two new knobs. TARGET_AVX512_SPLIT_REGS which is used to specify that internally 512 vectors are split to 256 vectors. This affects vectorization costs and reassociation width. It probably should also affect RTX costs however I doubt it is very useful since RTL optimizers are usually not judging between 256 and 512 vectors.

    I also added X86_TUNE_AVOID_256FMA_CHAINS. Since fma has improved in zen4 this flag may not be a win except for very specific benchmarks. I am still doing some more detailed testing here.

    Oherwise I disabled gathers on zen4 for 2 parts nad 4 parts. We can open code them and since the latencies has only increased since zen3 opencoding is better than actual instrucction. This shows at 4 tsvc benchmarks.

    I ended up setting AVX256_OPTIMAL. This is a compromise. There are some tsvc benchmarks that increase noticeably (up to 250%) however there are also few regressions. Most of these can be solved by incrasing vec_perm cost in the vectorizer. However this does not cure about 14% regression on x264 that is quite important. Here we produce vectorized loops for avx512 that probably would be faster if the loops in question had high enough iteration count. We hit this problem with avx256 too: since the loop iterates few times, only prologues/epilogues are used. Adding another round of prologue/epilogue code does not make it better.

    Finally I enabled avx stores for constnat sized memcpy and memset. I am not sure why this is an opt-in feature. I think for most hardware this is a win.

    gcc/ChangeLog:

    2022-12-22 Jan Hubicka <hubicka@ucw.cz>

            * config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): Add TARGET_AVX512_SPLIT_REGS
            * config/i386/i386-options.cc (ix86_option_override_internal): Honor x86_TONE_AVOID_256FMA_CHAINS.
            * config/i386/i386.cc (ix86_vec_cost): Honor TARGET_AVX512_SPLIT_REGS.
            (ix86_reassociation_width): Likewise.
            * config/i386/i386.h (TARGET_AVX512_SPLIT_REGS): New tune.
            * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable for znver4.
            (X86_TUNE_USE_GATHER_4PARTS): Likewise.
            (X86_TUNE_AVOID_256FMA_CHAINS): Set for znver4.
            (X86_TUNE_AVOID_512FMA_CHAINS): New utne; set for znver4.
            (X86_TUNE_AVX256_OPTIMAL): Add znver4.
            (X86_TUNE_AVX512_SPLIT_REGS): New tune.
            (X86_TUNE_AVX256_MOVE_BY_PIECES): Add znver1-3.
            (X86_TUNE_AVX256_STORE_BY_PIECES): Add znver1-3.
            (X86_TUNE_AVX512_MOVE_BY_PIECES): Add znver4.
            (X86_TUNE_AVX512_STORE_BY_PIECES): Add znver4.

    (cherry picked from commit eef81eefcdc2a58111e50eb2162ea1f5becc8004)

 gcc/config/i386/i386-expand.cc | 2 ++
 gcc/config/i386/i386-options.cc | 2 ++
 gcc/config/i386/i386.cc | 11 ++++++++---
 gcc/config/i386/i386.h | 2 ++
 gcc/config/i386/x86-tune.def | 23 +++++++++++++++--------
 5 files changed, 29 insertions(+), 11 deletions(-)
bisect found first bad commit
```
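For anyone repeating a bisect like this, `git bisect run` can drive it unattended as long as the test script turns "compile hung" into a "bad" verdict. This is only a sketch of that mapping, not necessarily how the bisect above was done; bisect_verdict is a made-up helper name and the build step and cc1 path in the usage comment are placeholders:

```shell
#!/bin/sh
# bisect_verdict (hypothetical helper): map the exit status of a
# time-limited test compile to the exit codes `git bisect run` expects:
#   0 = good, 1 = bad, 125 = skip this commit.
bisect_verdict() {
    case $1 in
        0)   return 0 ;;     # compile finished: good
        124) return 1 ;;     # GNU timeout hit the limit: hang, so bad
        *)   return 125 ;;   # anything else (e.g. broken build): skip
    esac
}

# Intended use inside the script given to `git bisect run`
# (./build-gcc.sh and the cc1 invocation are placeholders, not real files):
#   ./build-gcc.sh || exit 125
#   status=0
#   timeout 120 ./gcc/cc1 ... h264_cabac.i || status=$?
#   bisect_verdict "$status"
#   exit $?
```

Treating every non-timeout failure as "skip" keeps an unrelated build breakage in the middle of the range from being blamed for the hang.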
Reported upstream at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109137.
The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=60b38d402d8674ea08c9b69cf3147e0b92ab87c2
commit 60b38d402d8674ea08c9b69cf3147e0b92ab87c2
Author: Sam James <sam@gentoo.org>
AuthorDate: 2023-03-15 02:13:59 +0000
Commit: Sam James <sam@gentoo.org>
CommitDate: 2023-03-15 02:14:48 +0000

    media-video/ffmpeg: fix build on register-starved x86

    Newer compilers may optimise such that < 7 registers are free on 32-bit x86 and then we get an "invalid asm" error. This is https://bugs.gentoo.org/901099 and https://trac.ffmpeg.org/ticket/8903.

    Making matters worse, GCC sometimes hangs on invalid asm, so this also mitigates a hang with e.g. -O3 -march=znver1. See https://bugs.gentoo.org/900937 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109137.

    In future, we may want to adjust the definition of HAVE_7REGS to just exclude 32-bit x86, but that's a big sledgehammer, so let's avoid it for now until we have a reply on the upstream ffmpeg bug.

    Thanks to Ninpo.

    Bug: https://trac.ffmpeg.org/ticket/8903
    Bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109137
    Bug: https://bugs.gentoo.org/900937
    Closes: https://bugs.gentoo.org/901099
    Signed-off-by: Sam James <sam@gentoo.org>

 media-video/ffmpeg/ffmpeg-4.4.3.ebuild | 3 ++-
 media-video/ffmpeg/ffmpeg-5.1.2-r1.ebuild | 3 ++-
 media-video/ffmpeg/ffmpeg-6.0.ebuild | 3 ++-
 .../ffmpeg-4.4.3-get_cabac_inline_x86-32-bit.patch | 24 +++++++++++++++++++++
 .../ffmpeg-5.1.2-get_cabac_inline_x86-32-bit.patch | 25 ++++++++++++++++++++++
 5 files changed, 55 insertions(+), 3 deletions(-)
Fixed in https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=81d762cbec9685c2f2571da21d48f42c42eff33b for 13 (landed in sys-devel/gcc-13.0.1_pre20230326). Not yet in 12. Let's call this fixed though, as we've worked around it in ffmpeg anyway, and it'll get backported in due course.