Summary: | media-video/ffmpeg-0.5_p20373 fails with asm errors when building as PIC on x86 due to register pressure | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Erik Zeek <zeekec> |
Component: | New packages | Assignee: | Gentoo Media-video project <media-video> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | 1i5t5.duncan, alanh, anthoine.bourgeois, basic, bugs, david+gentoo.org, galtgendo, gentoo, help, hrabe, ikelos, john_r_graham, kanelxake, kevinlyles, kogorman, neurolabs.de, paluszak, rebecca.menessec, Reimar.Doeffinger, rjm40, rose, russell, smoothhound, suertreus, tb, tech31842, toolchain, truedfx, vdr, whuwxl, zima |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | x86 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
build log
reduced test case
another one
Walkaround patch to compile media-video/ffmpeg-0.5_p22846 on x86 with pic
Possible modification of transpose4x4
ffmpeg compile patch - TEXTREL free version
patch against ebuild to always set pic for x86 ad allow mmx o amd64 |
Description
Erik Zeek
2009-10-27 13:13:30 UTC
Created attachment 208424 [details]
build log
Build log.
I can confirm this one. I guess the cause is the pic flag; building with USE="-pic" should work.

Same error here. PS: IMHO, turning some flags on by default (3dnow, 3dnowext, ssse3) is also not a good idea.

Created attachment 208443 [details]
reduced test case
(In reply to comment #4)
> Created an attachment (id=208443) [details]
> reduced test case

# gcc -c casts.i
casts.i: In function 't':
casts.i:4: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'
casts.i:4: error: 'asm' operand has impossible constraints
# gcc --version
gcc (Gentoo 4.4.2 p1.0) 4.4.2
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

It seems gcc doesn't like the casts in the asm constraints. If I remove one cast, this testcase builds. If I compile with -fPIC, it fails in the same way. If I remove two casts, it builds even with -fPIC. My conclusion is that gcc needs (for no particular reason) one register to cast each variable. CC'ing toolchain; they may have a different opinion or may submit this to gcc upstream.

(In reply to comment #4)
> Created an attachment (id=208443) [details]
> reduced test case

This is too far reduced: it fails with -O0 only, and succeeds with -O1 or higher, with or without -fPIC.

Created attachment 208465 [details]
another one
(In reply to comment #6)
> (In reply to comment #4)
> > Created an attachment (id=208443) [details] [details]
> > reduced test case
>
> This is too far reduced: it fails with -O0 only, and succeeds with -O1 or
> higher, with or without -fPIC.

Yeah, I suppose that is because gcc is clever enough to understand that it can use the same register for the same variables; nevertheless, I consider "m" constraints needing a register a bit weird. Here is another one that fails at -O1 and -O2 but succeeds at -O0, all with -fPIC.

(In reply to comment #2)
> I can confirm this one.
> I guess the reason is the pic flag, with using USE="-pic" it should work.

Yes, I can confirm that disabling the pic USE flag helps.

Not a gcc bug really, but broken asm or something along those lines.

Well, it's more like throwing the ball to each other. Check: http://roundup.ffmpeg.org/roundup/ffmpeg/issue231 and http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11203

Disabling fpic did it for me too; is there anyone for whom it doesn't? Maybe we just filter it as a workaround, if it works for everyone...
Cristi

(In reply to comment #10)
> Not a gcc bug really, but broken asm or something along those lines.

Then explain why and what is broken. We'll get nowhere with such vague comments. If this is a dead end, mask the pic useflag on x86 and be done.

(In reply to comment #13)
> Then explain why and what is broken. We'll get nowhere with such vague
> comments.
> If this is a dead end, mask the pic useflag on x86 and be done.

Well, another approach would be making the pic useflag force '--disable-mmx' on x86. That works too, though it is a bit extreme.

Unfortunately it is incredibly hard to create a nice test case; this issue seems to be caused by having multiple asm blocks. However, I claim it is a gcc bug. As evidence, I have this patch that fixes the issue in dsputil_mmx and only changes an inline to attribute((__noinline__)). I assume we all agree that code does not become invalid just due to inlining.
Patch:
Index: libavcodec/x86/dsputil_mmx.c
===================================================================
--- libavcodec/x86/dsputil_mmx.c (revision 20575)
+++ libavcodec/x86/dsputil_mmx.c (working copy)
@@ -723,7 +723,13 @@
 }
 }
-static inline void transpose4x4(uint8_t *dst, uint8_t *src, int dst_stride, int src_stride){
+// HACK gcc won't compile this otherwise
+#if ARCH_X86_32 && defined(PIC)
+static av_noinline
+#else
+static inline
+#endif
+void transpose4x4(uint8_t *dst, uint8_t *src, int dst_stride, int src_stride){
 __asm__ volatile( //FIXME could save 1 instruction if done as 8x4 ...
 "movd %4, %%mm0 \n\t"
 "movd %5, %%mm1 \n\t"

*** Bug 300427 has been marked as a duplicate of this bug. ***

I'm running a stable P4 system and I get exactly the same error; disabling the pic USE flag seems to fix the problem.

Portage 2.1.6.13 (default/linux/x86/10.0/desktop, gcc-4.3.4, glibc-2.9_p20081201-r2, 2.6.31-gentoo-r6 i686)
=================================================================
System uname: Linux-2.6.31-gentoo-r6-i686-Intel-R-_Pentium-R-_M_processor_1.73GHz-with-gentoo-1.12.13
Timestamp of tree: Mon, 11 Jan 2010 06:15:02 +0000
ccache version 2.4 [disabled]
app-shells/bash: 4.0_p35
dev-java/java-config: 2.1.9-r2
dev-lang/python: 2.6.4
dev-util/ccache: 2.4-r7
dev-util/cmake: 2.6.4-r3
sys-apps/baselayout: 1.12.13
sys-apps/sandbox: 1.6-r2
sys-devel/autoconf: 2.13, 2.63-r1
sys-devel/automake: 1.9.6-r2, 1.10.2
sys-devel/binutils: 2.18-r3
sys-devel/gcc-config: 1.4.1
sys-devel/libtool: 2.2.6b
virtual/os-headers: 2.6.27-r2
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-Os -march=native -fomit-frame-pointer -fno-ident -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb /usr/share/config /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-Os -march=native -fomit-frame-pointer -fno-ident -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks fixpackages metadata-transfer parallel-fetch protect-owned sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://gentoo.prz.rzeszow.pl"
LANG="en_GB.UTF-8"
LDFLAGS="-Wl,-O1"
LINGUAS="en en_GB en_US zh zh_CN zh_TW da da_DK pl pl_PL fr fr_FR de de_DE he he_IL vi vi_VN ru ru_RU ru_UA uk uk_UA"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage/layman/sunrise /usr/local/portage/layman/roslin /usr/local/portage/local"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X a52 aac acpi alsa archive async bash-completion berkdb bidi branding bzip2 cairo cdda cddax cddb cdr cjk cleartype cli consolekit cracklib crypt cups cxx dbus dirac djvu dri dts dvd dvdr eds emboss emerald encode evo exchange exif fam ffmpeg firefox firefox3 flac fortran ftp fuse gdbm gif gnome gphoto2 gstreamer gtk hal hddtemp iconv idn imap immqt-bc inotify iproute2 ipv6 jpeg kdehiddenvisibility laptop libnotify live mad matroska mikmod mmap mmx mmxext mng modules mp3 mp4 mpeg mudflap musepack nautilus ncurses networks nls nptl nptlonly ogg opengl openmp pam pcre pdf perl pic pidgin png pop ppds pppd python qt3support qt4 quicktime readline reflection samba schroedinger sdl session sound speex spell spl sqlite sse sse2 ssh ssl startup-notification svg sysfs tcpd theora thunar tiff truetype udev unicode usb vorbis webkit win32codecs wma x264 x86 xattr xcb xml xmp xorg xulrunner xv xvid zlib"
ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias"
CAMERAS="canon ptp2"
ELIBC="glibc"
INPUT_DEVICES="evdev keyboard mouse synaptics"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
LINGUAS="en en_GB en_US zh zh_CN zh_TW da da_DK pl pl_PL fr fr_FR de de_DE he he_IL vi vi_VN ru ru_RU ru_UA uk uk_UA"
RUBY_TARGETS="ruby18"
USERLAND="GNU"
VIDEO_CARDS="radeon"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LC_ALL, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

However, I also have a hardened x86 with P4 and ffmpeg compiles fine there.

Portage 2.1.6.13 (hardened/linux/x86/10.0, gcc-4.3.4, glibc-2.9_p20081201-r2, 2.6.28-hardened-r9 i686)
=================================================================
System uname: Linux-2.6.28-hardened-r9-i686-Intel-R-_Pentium-R-_4_CPU_1.80GHz-with-gentoo-1.12.13
Timestamp of tree: Mon, 11 Jan 2010 04:45:01 +0000
ccache version 2.4 [enabled]
app-shells/bash: 4.0_p35
dev-lang/python: 2.6.4, 3.1.1-r1
dev-util/ccache: 2.4-r7
dev-util/cmake: 2.6.4-r3
sys-apps/baselayout: 1.12.13
sys-apps/sandbox: 1.6-r2
sys-devel/autoconf: 2.63-r1
sys-devel/automake: 1.8.5-r3, 1.9.6-r2, 1.10.2
sys-devel/binutils: 2.18-r3
sys-devel/gcc-config: 1.4.1
sys-devel/libtool: 2.2.6b
virtual/os-headers: 2.6.27-r2
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=native -pipe -fomit-frame-pointer -fno-ident"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/lib/fax /var/spool/fax/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-O2 -march=native -pipe -fomit-frame-pointer -fno-ident"
DISTDIR="/mnt/lap4/distfiles"
FEATURES="ccache distlocks fixpackages metadata-transfer parallel-fetch protect-owned sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1"
LINGUAS="pl pl_PL"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage/layman/hanno /usr/local/portage/layman/berkano /usr/local/portage/layman/mpd /usr/local/portage/layman/sunrise /usr/local/local-overlay"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="a52 aac acpi alsa amr amrnb amrwb anonres async asyncns avahi bash-completion berkdb bluetooth bzip2 cdda cdparanoia cdr cjk cli cracklib crypt cups cxx dbus directfb dri dts dvd dvdr encode fam fbcon flac ftp fuse gdbm glibc-omitfp hal hardened hinotify iconv id3 idn iproute2 ipv6 lame lastfmradio live lm_sensors logrotate lzo magic matroska mmx mmxext modules mp2 mp3 mudflap musepack ncurses network nfs nfsexport nptl nptlonly ogg openmp optimisememory pam parport pcre perl pic ppds pppd pulseaudio python quicktime readline reflection replaygain rtsp samba sane scanner serial session speex spl sse sse2 ssl stream sysfs tcpd udev unicode urandom usb vhosts vorbis win32codecs x264 x86 xattr xorg zeroconf zlib"
ALSA_CARDS="intel8x0"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias proxy proxy_ajp proxy_balancer proxy_connect proxy_ftp proxy_http"
CAMERAS="canon ptp2"
ELIBC="glibc"
INPUT_DEVICES="keyboard mouse evdev"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
LINGUAS="pl pl_PL"
RUBY_TARGETS="ruby18"
USERLAND="GNU"
VIDEO_CARDS="intel i830"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LC_ALL, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

*** Bug 300131 has been marked as a duplicate of this bug. ***

(In reply to comment #2)
> I can confirm this one.
> I guess the reason is the pic flag, with using USE="-pic" it should work.

flagedit describes "pic" as
--- Local Flag: Force shared libraries to be built as PIC (this is slower) (media-video/ffmpeg)
I can confirm that -pic works, but I localize it to /etc/portage/package.use with
# See bug 290741
media-video/ffmpeg -pic

*** Bug 304985 has been marked as a duplicate of this bug. ***
*** Bug 311389 has been marked as a duplicate of this bug. ***
*** Bug 313249 has been marked as a duplicate of this bug. ***
*** Bug 319005 has been marked as a duplicate of this bug. ***

Same here, same problem with media-video/ffmpeg-0.5_p20373. -pic worked here too. For information, I also had to first disable trusted path execution to get it built ( echo 0 >/proc/sys/kernel/grsecurity/tpe ), or it fails with:
Unable to create and execute files in /var/tmp/portage/media-video/ffmpeg-0.5_p20373/temp.
Set the TMPDIR environment variable to another directory and make sure that it is not mounted noexec.
Sanity test failed.

*** Bug 319005 has been marked as a duplicate of this bug. ***

Created attachment 233045 [details, diff]
Walkaround patch to compile media-video/ffmpeg-0.5_p22846 on x86 with pic
Attached patch allows compilation of media-video/ffmpeg-0.5_p22846 on x86 with pic enabled.
Apply it in the /var/tmp/portage/media-video/ffmpeg-0.5_p22846/work/ffmpeg-0.5_p22846/ directory with: patch -p0 < ffmpeg-pic-compile.patch
While we already have some strict aliasing warnings, doesn't the patch from comment 26 add more of them?

> - "movq %5, %%mm6 \n\t"\
> + "movq "MANGLE(ff_pw_5) ", %%mm6\n\t"\
The point of USE=pic is to avoid the text relocations, and you're reintroducing them. If you don't care about that, just unset the pic flag.
> The point of USE=pic is to avoid the text relocations, and you're reintroducing
> them. If you don't care about that, just unset the pic flag.
There is still a difference between avoiding relocations in generic (and usually not speed-critical) code and avoiding them everywhere. If the point is to reduce the number of non-shared pages, this kind of change should not be an issue.
The transpose4x4 change, however, will have a serious performance impact and certainly isn't a good idea.
Passing dst, dst_stride, src and src_stride and calculating the addresses manually might be possible without a speed loss, but that needs a closer look to confirm.
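To make the suggestion concrete, here is a minimal plain C sketch of the address arithmetic only. It is not the attached patch and not FFmpeg code; the name transpose4x4_ref is hypothetical. The existing asm refers to each row through its own memory operand (the %4 and %5 visible in the patch context), whereas every row address can be derived from just the two base pointers and the two strides. An asm version along these lines would presumably pass those four values as register operands and use addressing modes, which is the part that still needs checking and benchmarking.

```c
#include <stdint.h>

/* Hypothetical reference version (illustration only, not FFmpeg code).
 * Every row address is computed from dst, src and the two strides, so
 * only four values have to be live instead of one operand per row. */
static void transpose4x4_ref(uint8_t *dst, const uint8_t *src,
                             int dst_stride, int src_stride)
{
    for (int y = 0; y < 4; y++) {
        /* source row y starts at src + y * src_stride */
        const uint8_t *src_row = src + y * src_stride;
        for (int x = 0; x < 4; x++) {
            /* element (y, x) of the source becomes element (x, y) */
            dst[x * dst_stride + y] = src_row[x];
        }
    }
}
```

Such a reference function is also handy for checking the output of any rewritten MMX version against known-good results.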
Created attachment 233055 [details, diff]
Possible modification of transpose4x4
This may work without costing performance, however I haven't even tested it, and this requires benchmarking and looking at the generated assembler code.
USE=pic should not produce textrels. FIN. Hardened systems are often configured to even prevent loading of shared code that has textrels in it.

> USE=pic should not produce textrels. FIN.
Then you'd have to also add --disable-asm on (32-bit) x86.
I doubt this would generally be considered any better than just disabling the pic USE flag on x86 though.
Nobody is going to rewrite a huge amount of x86 code to be textrel-free. If anyone cares so much about this, it's not an unreasonable expectation that they use an architecture whose design works properly with PIC, e.g. x86_64.
Hardened users would accept disabling assembly in ffmpeg to get textrel-free PIC; we already do this with a few packages like gzip.

Created attachment 233283 [details, diff]
ffmpeg compile patch - TEXTREL free version
Here is a patch that allows compilation with pic enabled but doesn't create TEXTRELs by itself.
However, there are still a lot of them remaining in libavcodec.
This will needlessly slow down non-PIC and x86_64, so it has no chance upstream. In addition, as said in bug 319005, the qpel8 functions are already "fixed" upstream by using MANGLE.
Unless the Gentoo developers seriously wish to carry their own PIC patches for FFmpeg, I strongly suggest discussing this on the upstream ffmpeg-devel mailing list.
And for any Gentoo developer (or anyone else) who really cares about this, I again strongly suggest adding a Gentoo hardened system to FFmpeg's automated testing: http://fate.multimedia.cx/

Created attachment 242865 [details, diff]
patch against ebuild to always set pic for x86 ad allow mmx o amd64
This is for ffmpeg-0.6:
On amd64 there is no problem whatsoever with having --enable-mmx* as long as it goes hand in hand with --enable-pic; otherwise runtime errors (picked up by the test suite) follow. It is, however, about four times as fast on my machine, so I really cannot understand why it was disabled for all uses of PIE to begin with.
So this patch against the ebuild does the following:
Removes USE="pic" and enforces --enable-pic for everyone using PIE (i.e. hardened, since why use PIE/hardened if you do not want pic, and is anyone non-hardened really setting USE="pic"?).
Adds --disable-asm for x86, and only x86, since without it (with both USE="pic" and USE="-pic") I get textrels that fail to run at all on my hardened x86 environments. On x86-64, ffmpeg runs fine (and around 4 times faster on my machine) with asm as long as --enable-pic is chosen, so there is no need at all to filter it.
(In reply to comment #36)
> Created attachment 242865 [details, diff] [details, diff]
> patch against ebuild to always set pic for x86 ad allow mmx o amd64
>
> This is for ffmpeg-0.6:
>
> On an amd64 there is no problem whatsoever to have --enable-mmx* as long
> as it goes hand in hand with --enable-pic, else runtime errors (picked up by
> the test suite) follow. It is however about four times as fast on my machine
> so I really cannot understand why it was disabled for all uses of PIE to
> begin with.
>
> So this patch against the ebuild does the following:
> Removes USE="pic" and enforces --enable-pic for all people using PIE (i.e.
> hardened since why use PIE/hardened if you do not want pic and is really
> someone non-hardened having USE="pic"?).
> Adds --disable-asm for x86 and only x86 since without it (both
> USE="pic/-pic") I get textrels that on my hardened x86 environments fail to
> run at all. On x86-64 ffmpeg runs fine (and around 4 times faster on my
> machine) with asm as long as --enable-pic is chosen, so no need at all to
> filter it.

The same logic is applied for the pic useflag nowadays; closing. |