Go to:
Gentoo Home
Documentation
Forums
Lists
Bugs
Planet
Store
Wiki
Get Gentoo!
Gentoo's Bugzilla – Attachment 121413 Details for
Bug 68282
Request for a GCC Optimization Guide
Home
|
New
–
[Ex]
|
Browse
|
Search
|
Privacy Policy
|
[?]
|
Reports
|
Requests
|
Help
|
New Account
|
Log In
[x]
|
Forgot Password
Login:
[x]
co-guide.xml
co-guide.xml (text/plain), 20.13 KB, created by
nm (RETIRED)
on 2007-06-07 15:18:08 UTC
(
hide
)
Description:
co-guide.xml
Filename:
MIME Type:
Creator:
nm (RETIRED)
Created:
2007-06-07 15:18:08 UTC
Size:
20.13 KB
patch
obsolete
><?xml version='1.0' encoding='UTF-8'?> > ><!-- $Header: $ --> > ><!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> > ><guide link="/doc/en/co-guide.xml"> > ><title>Compilation Optimization Guide</title> > ><author title="Author"> > <mail link="nightmorph@gentoo.org">Joshua Saddler</mail> ></author> > ><abstract> >This guide provides an introduction to optimizing compiled code using safe, sane >CFLAGS and CXXFLAGS. It also as describes the theory behind optimizing in >general. ></abstract> > ><!-- The content of this document is licensed under the CC-BY-SA license --> ><!-- See http://creativecommons.org/licenses/by-sa/2.5 --> ><license/> > ><version>1.0</version> ><date>2007-06-03</date> > ><chapter> ><title>Introduction</title> ><section> ><title>What are CFLAGS and CXXFLAGS?</title> ><body> > ><p> >CFLAGS and CXXFLAGS are environment variables that are used to tell the GNU >Compiler Collection, <c>gcc</c>, what kinds of switches to use when compiling >source code. CFLAGS are for code written in C, while CXXFLAGS are for code >written in C++. ></p> > ><p> >They can be used to decrease the amount of debug messages for a program, >increase error warning levels, and, of course, to optimize the code produced. >The <uri >link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Invoking-GCC.html#Invoking-GCC">GNU >gcc handbook</uri> maintains a complete list of available options and their >purposes. ></p> > ></body> ></section> ><section> ><title>How are they used?</title> ><body> > ><p> >CFLAGS and CXXFLAGS can be used in two ways. First, they can be used >per-program, by directly invoking <c>gcc</c> and then some bit of code you wish >to compile. ></p> > ><pre caption="Compiling a program directly"> >$ <i>CFLAGS="-march=i686" gcc file.c</i> ></pre> > ><p> >However, this should not be done when installing packages found in the Portage >tree. Instead, set your CFLAGS and CXXFLAGS in <path>/etc/make.conf</path>. This >way all packages will be compiled using the options you specify. ></p> > ><pre caption="CFLAGS in /etc/make.conf"> >CFLAGS="-march=athlon64 -O2 -pipe" >CXXFLAGS="${CFLAGS}" ></pre> > ><p> >As you can see, CXXFLAGS is set to use all the options present in CFLAGS. This >is what you'll want almost without fail. You shouldn't ever need to specify >additional options in CXXFLAGS. ></p> > ><impo> >Portage cannot use CFLAGS on a per-package basis, nor is there any supported >method of forcing it to do so. The flags you set in <path>/etc/make.conf</path> >will be used for <e>all</e> packages you install. ></impo> > ></body> ></section> ><section> ><title>Misconceptions</title> ><body> > ><p> >While CFLAGS and CXXFLAGS can be very effective means of getting source code to >produce smaller and/or faster binaries, they can also impair the function of >your code, bloat its size, slow down its execution time, or even cause >compilation failures! ></p> > ><p> >CFLAGS are not a magic bullet; they will not automatically make your system run >any faster or your binaries to take up less space on disk. Adding more and more >flags in an attempt to optimize (or "rice") your system is a sure recipe for >failure. There is a point at which you will reach diminishing returns. ></p> > ><p> >Despite the bragging you'll find on the internet, aggressive CFLAGS and CXXFLAGS >are far more likely to harm your programs than do them any good. Keep in mind >that the reason the flags exist in the first place is because they are designed >to be used at specific places for specific purposes. Just because one particular >CFLAG is good for one bit of code doesn't mean that it is suited to compiling >everything you will ever install on your machine! ></p> > ></body> ></section> ><section> ><title>Ready?</title> ><body> > ><p> >Now that you're aware of some of the risks involved, let's take a look at some >sane, safe optimizations for your computer. These will hold you in good stead >and will endear you to developers the next time you report a problem on <uri >link="http://bugs.gentoo.org">Bugzilla</uri>. (Developers will usually request >that you recompile a package with minimal CFLAGS to see if the problem persists. >Remember, aggressive flags can ruin code.) ></p> > ></body> ></section> ></chapter> > ><chapter> ><title>Optimizing</title> ><section> ><title>The basics</title> ><body> > ><p> >The goal behind using CFLAGS and CXXFLAGS is to create code tailor-made to your >system; it should function perfectly while being lean and fast, if possible. >Sometimes these conditions are mutually exclusive, so we'll stick with >combinations known to work well. Ideally, they are the best available for any >CPU architecture. We'll mention the aggressive flags later so you know what to >look out for. We won't discuss every option listed on the <c>gcc</c> manual >(there are hundreds), but we'll cover the basic, most common flags. ></p> > ><note> >Whenever you're not sure what a flag actually does, refer to the relevant >chapter of the <uri >link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Optimize-Options.html#Optimize-Options">gcc >manual</uri>. If you're still stumped, try Google, or check out the <c>gcc</c> ><uri link="http://gcc.gnu.org/lists.html">mailing lists</uri>. ></note> > ></body> ></section> ><section> ><title>-march</title> ><body> > ><p> >The first and most important option is <c>-march</c>. This tells the compiler >what code it should produce for your processor <uri >link="http://en.wikipedia.org/wiki/Microarchitecture">architecture</uri> (or ><e>arch</e>); it says that it should produce code for a certain kind of CPU. >Different CPUs have different capabilities, support different instruction sets, >and have different ways of executing code. The <c>-march</c> flag will instruct >the compiler to produce code specifically for your CPU, with all its >capabilities, features, instruction sets, quirks, and so on. ></p> > ><p> >Even though the CHOST variable in <path>/etc/make.conf</path> specifies the >general architecture used, <c>-march</c> should still be used so that programs >can be optimized for your specific processor. ></p> > ><p> >What kind of CPU do you have? To find out, run the following command: ></p> > ><pre caption="Examining CPU information"> >$ <i>cat /proc/cpuinfo</i> ></pre> > ><p> >Now let's see <c>-march</c> in action. This example is for an older Pentium III >chip: ></p> > ><pre caption="/etc/make.conf: Pentium III"> >CFLAGS="-march=pentium3" >CXXFLAGS="${CFLAGS}" ></pre> > ><p> >Here's another one for a 64-bit Sparc CPU: ></p> > ><pre caption="/etc/make.conf: Sparc"> >CFLAGS="-march=ultrasparc" >CXXFLAGS="${CFLAGS}" ></pre> > > ><p> >Also available are the <c>-mcpu</c> and <c>-mtune</c> flags. Either of these >should <e>only</e> be used when there is no available <c>-march</c> option. >What's the difference between them? <c>-march</c> is much more specific about >which processor features will be used when compiling code; it is a better >choice. <c>-mcpu</c> will produce much more generic code less optimized for your >machine. <c>-mtune</c> is even more generic than <c>-mcpu</c>. Whenever >possible, use <c>-march</c>. For some less common architectures such as PowerPC >and Alpha, <c>-mcpu</c> must be used. ></p> > ><note> >For more suggested <c>-march</c> settings, please read chapter 5 of the >appropriate <uri link="/doc/en/handbook/index.xml">Gentoo Installation >Handbook</uri> for your arch. Also, read the <c>gcc</c> manual's list of <uri >link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Submodel-Options.html#Submodel-Options">architecture-specific >options</uri>, as well as more detailed explanations about the differences >between <c>-march</c>, <c>-mcpu</c>, and <c>-mtune</c>. This is quite helpful >for determining which <c>-march</c> setting you should use, especially since on >some architectures, such as x86, <c>-mcpu</c> is deprecated and <c>-mtune</c> >should be used instead. ></note> > ></body> ></section> ><section> ><title>-O</title> ><body> > ><p> >Next up is the <c>-O</c> variable. This controls the overall level of >optimization. This makes the code compilation take somewhat more time, and can >take up much more memory, especially as you increase the level of optimization. ></p> > ><p> >There are five <c>-O</c> settings: <c>-O0</c>, <c>-O1</c>, <c>-O2</c>, ><c>-O3</c>, and <c>-Os</c>. You should use only one of them in ><path>/etc/make.conf</path>. ></p> > ><p> >The with the exception of <c>-O0</c>, the <c>-O</c> settings each activate >several additional flags, so be sure to read the gcc manual's chapter on <uri >link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Optimize-Options.html#Optimize-Options">optimization >options</uri> to learn which flags are activated at each <c>-O</c> level, as >well as some explanations as to what they do. ></p> > ><p> >Let's examine each optimization level: ></p> > ><ul> > <li> > <c>-O0</c>: This level (that's the letter "O" followed by a zero) turns off > optimization entirely and is the default if no <c>-O</c> level is specified > in CFLAGS or CXXFLAGS. Your code will not be optimized; it's not normally > desired. > </li> > <li> > <c>-O1</c>: This is the most basic optimization level. The compiler will try > to produce faster, smaller code without taking much compilation time. > It's pretty basic, but it should get the job done all the time. > </li> > <li> > <c>-O2</c>: A step up from <c>-O1</c>. This is the <e>recommended</e> level > of optimization unless you have special needs (such as <c>-Os</c>, as will > be explained shortly). <c>-O2</c> will activate a few more flags in addition > to the ones activated by <c>-O1</c>. With <c>-O2</c>, the compiler will > attempt to increase code performance without compromising on size, and > without taking too much compilation time. > </li> > <li> > <c>-O3</c>: This is the highest level of optimization possible, and also the > riskiest. It will take a longer time to compile your code with this option, > and in fact it <e>should not be used system-wide with <c>gcc</c> 4.x</e>. > The behavior of <c>gcc</c> has changed significantly since version 3.x. In > 3.x, <c>-O3</c> has been shown to lead to marginally faster execution times > over <c>-O2</c>, but this is no longer the case with <c>gcc</c> 4.x. > Compiling all your packages with <c>-O3</c> <e>will</e> result in larger > binaries that require more memory, and will significantly increase the odds > of compilation failure or unexpected program behavior (including errors). > The downsides outweigh the benefits; remember the principle of diminishing > returns. <b>Using <c>-O3</c> is not recommended for <c>gcc</c> 4.x.</b> > </li> > <li> > <c>-Os</c>: This level will optimize your code for size. It activates all > <c>-O2</c> options that don't increase the size of the generated code. It's > useful for machines that have extremely limited disk storage space and/or > have CPUs with small cache sizes. > </li> ></ul> > ><p> >As previously mentioned, <c>-O2</c> is the recommended optimization level. If >package compilations error out, check to make sure that you aren't using ><c>-O3</c>. As a fallback option, try setting your CFLAGS and CXXFLAGS to a >lower optimization level, such as <c>-O1</c> or <c>-Os</c> and recompile the >package. ></p> > ></body> ></section> ><section> ><title>-pipe</title> ><body> > ><p> >A fun, safe flag to use is <c>-pipe</c>. This flag actually has no effect on the >generated code, but it makes the compilation process faster. It tells the >compiler to use pipes instead of temporary files during the different stages of >compilation. ></p> > ></body> ></section> ><section> ><title>-fomit-frame-pointer</title> ><body> > ><p> >This is a very common flag designed to reduce generated code size. It is turned >on at all levels of <c>-O</c> (except <c>-O0</c>) on architectures where doing >so does not interfere with debugging, but you may need to activate it yourself >by adding it to your flags. Though the GNU <c>gcc</c> manual fails to specify >all architectures it is turned on by using <c>-O</c>, you will need to explicity >activate it on x86 and x86-64. However, using this flag will make debugging hard >to impossible. ></p> > ><p> >In particular, it makes troubleshooting applications written in Java much >harder, though Java is not the only code affected by using this flag. So while >the flag can help, it can also make debugging harder. If you don't plan to do >much debugging and haven't added any other debugging-related CFLAGS such as ><c>-ggdb</c> (and you aren't installing packages with the <c>debug</c> USE >flag), then try using <c>-fomit-frame-pointer</c>. ></p> > ><impo> >Do <e>not</e> combine <c>-fomit-frame-pointer</c> with the similar flag ><c>-momit-leaf-frame-pointer</c>. Using the latter flag is discouraged, as ><c>-fomit-frame-pointer</c> already does the job properly. Furthermore, ><c>-momit-frame-pointer</c> has been shown to <uri >link="http://www.coyotegulch.com/products/acovea/aco5p4gcc40.html">negatively >impact</uri> code performance. ></impo> > ></body> ></section> ><section> ><title>-msse, -msse2, -msse3, -mmmx, -m3dnow</title> ><body> > ><p> >These flags enable the <uri >link="http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</uri>, <uri >link="http://en.wikipedia.org/wiki/SSE2">SSE2</uri>, <uri >link="http://en.wikipedia.org/wiki/SSSE3">SSE3</uri>, <uri >link="http://en.wikipedia.org/wiki/MMX">MMX</uri>, and <uri >link="http://en.wikipedia.org/wiki/3dnow">3DNow!</uri> instruction sets for x86 >and x86-64 architectures. These are useful primarily in multimedia, gaming, and >other floating point-intensive computing tasks, though they also contain several >other mathematical enhancements. These instruction sets are found in more modern >CPUs. ></p> > ><impo> >Be sure to check if your CPU supports these by running <c>cat /proc/cpuinfo</c>. >The output will include any supported additional instruction sets. Note that ><b>pni</b> is just a different name for SSE3. ></impo> > ><p> >You normally don't need to add any of these flags to <path>/etc/make.conf</path> >as long as you are using the correct <c>-march</c> (for example, ><c>-march=nocona</c> implies <c>-msse3</c>). Some notable exceptions are newer >VIA and AMD64 CPUs that support instructions not implied by <c>-march</c> (such >as SSE3). For CPUs like these you'll need to enable additional flags where >appropriate after checking the output of <c>cat /proc/cpuinfo</c>. ></p> > ><note> >You should check the <uri >link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options">list</uri> >of x86 and x86-64-specific flags to see which of these instruction sets are >activated by the proper CPU type flag. If an instruction is listed, then you >don't need to specify it; it will be turned on by using the proper <c>-march</c> >setting. ></note> > ></body> ></section> ></chapter> > ><chapter> ><title>Optimization FAQs</title> ><section> ><title>But I get better performance with -funroll-loops -fomg-optimize!</title> ><body> > ><p> >No, you only <e>think</e> you do because someone has convinced you that more >flags are better. Aggressive flags will only hurt your applications when used >system-wide. Even the <c>gcc</c> <uri >link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Optimize-Options.html#Optimize-Options">manual</uri> >says that using <c>-funroll-loops</c> and <c>-funroll-all-loops</c> makes code >larger and run more slowly. Yet for some reason, these two flags, along with ><c>-ffast-math</c>, <c>-fforce-mem</c>, <c>-fforce-addr</c>, and similar flags, >continue to be very popular among ricers who want the biggest bragging rights. ></p> > ><p> >The truth of the matter is that they are dangerously aggressive flags. Take a >good look around the <uri link="http://forums.gentoo.org">Gentoo Forums</uri> >and <uri link="http://bugs.gentoo.org">Bugzilla</uri> to see what those flags >do: nothing good! ></p> > ><p> >You don't need to use those flags globally in CFLAGS or CXXFLAGS. They will only >hurt performance. They may make you sound like you have a high-performance >system running on the bleeding edge, but they don't do anything but bloat your >code and get your bugs marked INVALID or WONTFIX. ></p> > ><p> >You don't need dangerous flags like these. <b>Don't use them</b>. Stick to the >basics: <c>-march</c>, <c>-O</c>, and <c>-pipe</c>. ></p> > ></body> ></section> ><section> ><title>What about -O levels higher than 3?</title> ><body> > ><p> >Some users boast about even better performance obtained by using <c>-O4</c>, ><c>-O9</c>, and so on, but the reality is that <c>-O</c> levels higher than 3 >have no effect. The compiler may accept CFLAGS like <c>-O4</c>, but it actually >doesn't do anything with them. It only performs the optimizations for ><c>-O3</c>, nothing more. ></p> > ><p> >Need more proof? Examine the <c>gcc</c> <uri >link="http://gcc.gnu.org/viewcvs/trunk/gcc/opts.c?revision=124622&view=markup">source >code</uri>: ></p> > ><pre caption="-O source code"> >if (optimize >= 3) > { > flag_inline_functions = 1; > flag_unswitch_loops = 1; > flag_gcse_after_reload = 1; > /* Allow even more virtual operators. */ > set_param_value ("max-aliased-vops", 1000); > set_param_value ("avg-aliased-vops", 3); > } ></pre> > ><p> >As you can see, any value higher than 3 is treated as just <c>-O3</c>. ></p> > ></body> ></section> ><section> ><title>What about redundant flags?</title> ><body> > ><p> >Oftentimes CFLAGS and CXXFLAGS that are turned on at various <c>-O</c> levels >are specified redundantly in <path>/etc/make.conf</path>. Sometimes this is done >out of ignorance, but it is also done to avoid flag filtering or flag replacing. ></p> > ><p> >Flag filtering/replacing is done in many of the ebuilds in the Portage tree. It >is usually done because packages fail to compile at certain <c>-O</c> levels, or >when the source code is too sensitive for any additional flags to be used. The >ebuild will either filter out some or all CFLAGS and CXXFLAGS, or it may replace ><c>-O</c> with a different level. ></p> > ><p> >The <uri >link="http://devmanual.gentoo.org/ebuild-writing/functions/src_compile/build-environment/index.html">Gentoo >Developer Manual</uri> outlines where and how flag filtering/replacing works. ></p> > ><p> >It's possible to circumvent <c>-O</c> filtering by redundantly listing the flags >for a certain level, such as <c>-O3</c>, by doing things like: ></p> > ><pre caption="Specifying redundant CFLAGS"> >CFLAGS="-O3 -finline-functions -funswitch-loops" ></pre> > ><p> >However, <brite>this is not a smart thing to do</brite>. CFLAGS are filtered for >a reason! When flags are filtered, it means that it is unsafe to build a package >with those flags. Clearly, it is <e>not</e> safe to compile your whole system >with <c>-O3</c> if some of the flags turned on by that level will cause problems >with certain packages. Therefore, you shouldn't try to "outsmart" the developers >who maintain those packages. <e>Trust the developers</e>. Flag filtering and >replacing is done for your benefit! If an ebuild specifies alternative flags, >then don't try to get around it. ></p> > ><p> >You will most likely continue to run into problems when you build a package with >unacceptable flags. When you report your troubles on Bugzilla, the flags you use >in <path>/etc/make.conf</path> will be readily visible and you will be told to >recompile without those flags. Save yourself the trouble of recompiling by not >using redundant flags in the first place! Don't just automatically assume that >you know better than the developers. ></p> > ></body> ></section> ><section> ><title>What about LDFLAGS?</title> ><body> > ><p> >Don't use them. You may have heard that they can speed up application load times >or reduce binary size, but in reality, LDFLAGS are more likely to make your >applications stop working. They are not supported, and you can expect to have >your bugs closed and marked INVALID if you report errors with packages while >using LDFLAGS. At the very least you will have to recompile all affected >packages without setting LDFLAGS. ></p> > ></body> ></section> ></chapter> > ><chapter> ><title>Resources</title> ><section> ><body> > ><p> >The following resources are of some help in further understanding optimization: ></p> > ><ul> > <li> > The <uri link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/">GNU gcc > manual</uri> > </li> > <li> > Chapter 5 of the <uri link="/doc/en/handbook/">Gentoo Installation > Handbooks</uri> > </li> > <li><c>man make.conf</c></li> > <li><uri link="http://en.wikipedia.org">Wikipedia</uri></li> > <li> > <uri link="http://www.coyotegulch.com/products/acovea/">Acovea</uri>, a > benchmarking and test suite that is useful for determining how different > compiler flags affect generated code. It is available in Portage: <c>emerge > acovea</c>. > </li> > <li>The <uri link="http://forums.gentoo.org">Gentoo Forums</uri></li> ></ul> > ></body> ></section> ></chapter> ></guide>
You cannot view the attachment while viewing its details because your browser does not support IFRAMEs.
View the attachment on a separate page
.
View Attachment As Raw
Actions:
View
Attachments on
bug 68282
:
121094
|
121413
|
122366