Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 90090 - [binutils] gprof reports too much time
Summary: [binutils] gprof reports too much time
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Development (show other bugs)
Hardware: All Linux
: High normal (vote)
Assignee: Gentoo Toolchain Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-04-22 18:41 UTC by euclid80
Modified: 2009-04-20 22:05 UTC (History)
4 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
simple program to test gprof (test.c,94 bytes, text/plain)
2005-04-22 18:42 UTC, euclid80
Details
raw profile data (gmon.out,423 bytes, application/octet-stream)
2005-04-22 18:44 UTC, euclid80
Details
text output of gprof (gprof.txt,5.16 KB, text/plain)
2005-04-22 18:44 UTC, euclid80
Details
Patch to return failure on interval < 2 usec or unknown failure. (hertz-return-failure.patch,275 bytes, patch)
2005-07-17 19:22 UTC, Tres 'RiverRat' Melton
Details | Diff
This small program will illustrate the real problem here. (tst.c,182 bytes, text/plain)
2005-07-17 19:30 UTC, Tres 'RiverRat' Melton
Details
The new gprof output file (gmon.out,342 bytes, application/octet-stream)
2007-11-29 05:04 UTC, euclid80
Details

Note You need to log in before you can comment on or make changes to this bug.
Description euclid80 2005-04-22 18:41:16 UTC
When using the performance analyzer gprof, I found that it was giving values for the time spent in functions (and total time) that exceeded the true running time of the program.  In the example I will provide, gprof over-reports the execution time by a factor of 10.

Reproducible: Always
Steps to Reproduce:
1. gcc -pg test.c
2. time ./a.out
3. gprof

Actual Results:  
The program took 19.9 seconds to run on my Pentium III, but gprof reported that
the running time was 196.19 seconds.


Gentoo Base System version 1.4.16
Portage 2.0.51.19 (default-linux/x86/2005.0, gcc-3.3.5-20050130,
glibc-2.3.4.20041102-r1,uclibc-0.9.26-r7,uclibc-0.9.27-r0, 2.6.10-gentoo-r6 i686)
=================================================================
System uname: 2.6.10-gentoo-r6 i686 Pentium III (Coppermine)
Python:              dev-lang/python-2.3.4-r1 [2.3.4 (#1, Feb  8 2005, 14:06:28)]
distcc 2.16 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.3 [enabled]
dev-lang/python:     2.3.4-r1
sys-devel/autoconf:  2.59-r6, 2.13
sys-devel/automake:  1.7.9-r1, 1.8.5-r3, 1.5, 1.4_p6, 1.6.3, 1.9.4
sys-devel/binutils:  2.15.92.0.2-r7
sys-devel/libtool:   1.5.14
virtual/os-headers:  2.6.8.1-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-O3 -march=pentium3 -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config
/usr/lib/X11/xkb /usr/lib/mozilla/defaults/pref /usr/share/config
/usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/
/usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/
/usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -march=pentium3 -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs autoconfig ccache distlocks sandbox sfperms strict userpriv
usersandbox"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 X alsa apm avi bitmap-fonts cdparanoia cdr crypt curl dv dvd dvdread
emboss encode esd foomaticdb fortran gif gphoto2 gtk gtk2 imagemagick imlib java
jpeg junit libg++ libwww mbox mikmod mmx mozdevelop mozilla moznomail mp3 mpeg
ncurses network nodrm ogg oggvorbis pam pdflib perl png python qt quicktime
readline real rtc samba scanner sdl spell sse ssl tcltk tiff truetype
truetype-fonts type1-fonts usb v4l v4l2 vorbis xmms xv zlib"
Unset:  ASFLAGS, CBUILD, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTDIR_OVERLAY
Comment 1 euclid80 2005-04-22 18:42:35 UTC
Created attachment 56961 [details]
simple program to test gprof

Compile with "gcc -pg test.c"
Comment 2 euclid80 2005-04-22 18:44:17 UTC
Created attachment 56962 [details]
raw profile data
Comment 3 euclid80 2005-04-22 18:44:46 UTC
Created attachment 56963 [details]
text output of gprof
Comment 4 SpanKY gentoo-dev 2005-04-24 09:49:04 UTC
try with binutils-2.15.97 ?
Comment 5 euclid80 2005-04-25 10:15:06 UTC
OK, I tried binutils-2.15.97, but I got the same results.  I installed it, changed the links in /usr/i686-pc-linux-gnu/{bin,lib}, recompiled the test program, ran it, and got approximately the same numbers as before.

Are you implying by your question that it works for you with 2.15.97?
Comment 6 SpanKY gentoo-dev 2005-05-07 18:15:43 UTC
no, i'm not implying anything

ive never used gprof
Comment 7 euclid80 2005-05-25 17:38:04 UTC
I noticed this problem when using kernel 2.6.10.  Today I ran the test program
under kernel 2.4.28, and gprof gave the correct result.  When I booted back into
2.6.10, and ran gprof on the already generated gmon.out, it still gave the
correct result.  So, I speculate that the problem may actually be in the
profiling code generated by gcc.  Perhaps the sampling frequency is different in
the 2.6 kernels.
Comment 8 SpanKY gentoo-dev 2005-05-25 17:50:09 UTC
the HZ is diff between 2.4 and 2.6 ... iirc, it changed by a factor of 10 from
100 in 2.4 to 1000 in 2.6
Comment 9 Tres 'RiverRat' Melton 2005-07-16 15:31:42 UTC
Adding to SpanKY's comment #8, that isn't entirely correct.  The timer is
different depending upon the arch.  To see what it is set to type:

grep HZ /usr/src/linux/include/asm-*/param.h

and you can see that the i386 is still 100Hz, whereas the x86-64 is 1000Hz, and
the alpha and ia64 have it set at 1024Hz for the most part.  Sometimes it sets
itself differently (like simulating an ia64 is 32Hz) though.  Is this value
exported in /proc somewhere?
Comment 10 SpanKY gentoo-dev 2005-07-16 22:30:54 UTC
yes, HZ is arch-specific, but considering the reporter is running on x86 and the
x86 HZ changed by a factor of 10 from 1000 to 100 between 2.4 to 2.6, i dont see
why other archs matter
Comment 11 Tres 'RiverRat' Melton 2005-07-17 19:22:20 UTC
Created attachment 63661 [details, diff]
Patch to return failure on interval < 2 usec or unknown failure.

This code branch (in bin-utils' gprof/hertz.c) will probably never get executed
in the near future but when the interval drops < 2usec (the value currently in
the code) it will at least return failure until a nanosecond field is added to
setitimer.  This same error is also returned in the same scenario in glibc 's
prof-freq.c.
Comment 12 Tres 'RiverRat' Melton 2005-07-17 19:30:52 UTC
Created attachment 63662 [details]
This small program will illustrate the real problem here.

The real problem is that gprof and gcc (& the -p option) generated code rely on
checking the ticks/second at runtime via the sysconf(_SC_CLK_TCK); call.  That
returns the wrong value.  I followed this issue for a couple of hours through
many different functions/packages in an attempt to generate a patch for spanky
but I have run out of time.  (Sorry Mike)  I would like to point out though
that the original post didn't specify the problem as being x86 specific and
from this info it isn't!  My x86-64 returns a 100MHz frequency and it should be
1KHz so at the risk of sounding rude *I* do think that the arch matters.
Comment 13 Tres 'RiverRat' Melton 2005-07-17 19:32:44 UTC
The man page for setitimer says "... dependent  on  the  system  timer 
resolution  (currently 10ms)" and should also get corrected
Comment 14 Duncan 2005-07-18 04:45:01 UTC
OK, to my knowledge I've never run gprof, just happened across this, but FWIW, 
kernel ticks are changing again in 2.6.13.  I'm running the -rcs, and there's 
now a config option for Hz; it'll be a kernel settable option, currently with 
three choices, 100/250/1000 Hz, 250 being the recommended default for a 
well-rounded desktop system. 
 
I compiled and ran tst.c, here, and I get correct readings, but that may be 
simply because they coincide with expectations, 100 Hz (which I chose because 
ticks are per-CPU interrupts, and I'm running SMP dual Opteron).  I'll have to 
recompile the kernel for 250, reboot, and see if it's still accurately 
reported. 
Comment 15 Tres 'RiverRat' Melton 2005-07-18 05:58:00 UTC
Some of the devs/ops in #gentoo (moreon & reisio) confirmed 100Hz clock on i386
with 2.6.x kernels using the test code.  That is what needs fixed.
Comment 16 SpanKY gentoo-dev 2005-07-18 17:37:11 UTC
if you look at sysdeps/unix/sysv/linux/getclktck.c in glibc, you can see that
glibc first checks to see if the ELF aux vec AT_CLKTCK is set and if it isnt, it
falls back to a hardcoded '100' value

the aux vec is populated by the kernel on the stack before executing an ELF as
can be seen in linux/fs/binfmt_elf.c ... it exports the value as CLOCKS_PER_SEC
which is macrod to USER_HZ in asm/param.h which on x86_64 and i386 are both set
to 100 even though the kernel HZ define is set to 1000

last i checked, torvalds is big on the 'userspace should not care about kernel
HZ' so he doesnt export that information ... as can be seen here, we get the
shaft with the wrong kernel hz value ...

we can double check this with the provided test.c, or you can run an app through
gdb and analyze the stack before the dynamic linker gets a hold of it (thereby
cutting out glibc as being the problem) ... you should find the AT_CLKTCK aux
vec exported by the kernel on the stack with the value set to 100 (on my amd64
it is at least)
Comment 17 SpanKY gentoo-dev 2005-07-18 17:40:28 UTC
so we can say the kernel sucks because they purposefully give us the wrong value
... however, that probably wont get us anywhere because i dont think torvalds
will change his mind ;)

procps has some indepth code to grab the correct jiffies/hertz settings ...
review the old_Hertz_hack() function in proc/sysinfo.c
Comment 18 Tres 'RiverRat' Melton 2005-07-18 23:06:53 UTC
More Research:
I created a guess at a patch but the research below indicates that waiting a bit
for Linux to work this out itself is the best solution.

My system is currently crunching a set of data and I don't expect it to finish
for 5-6 more days so I haven't tested this but I 'think' the solution is to
patch /usr/src/linux-2.6.12-gentoo-r4/fs/binfmt_elf.c as follows:

@@ -193,7 +193,7 @@
 #endif
        NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP);
        NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE);
-       NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
+       NEW_AUX_ENT(AT_CLKTCK, HZ);
        NEW_AUX_ENT(AT_PHDR, load_addr + exec->e_phoff);
        NEW_AUX_ENT(AT_PHENT, sizeof (struct elf_phdr));
        NEW_AUX_ENT(AT_PHNUM, exec->e_phnum);

If I knew that it worked I would have attached it as a patch, but I don't.  In
adding to Duncan's comment the kernel thread of relevence is here:

http://lkml.org/lkml/2005/5/16/228

and, if after 2.6.13 comes out, the issue is still there then I will persue it
in earnest.  Thanks for the pointer to binfmt_elf.c Mike, it helped, but the
procps stuff is a work around.  If this issue is still present in 2.6.13 and the
above works then perhaps we can get Gentoo's kernel guys to add a patch to
Linux.  As it stands now there is a patch
(http://lkml.org/lkml/diff/2005/5/16/308/1) outstanding, that Linus has read and
commented on, and that has been "Signed-off-by:" two kernel-devs, that would
rework the values that the above patch fiddles with so either way we should wait
 for a new kernel version that has the above mentioned patch by Christoph Lameter.
Comment 19 Tres 'RiverRat' Melton 2005-07-19 14:03:30 UTC
The patch to Linux in my previous comment works!  My clock is now reported as
1000 Hz.
Comment 20 SpanKY gentoo-dev 2005-07-19 19:16:17 UTC
that doesnt really address the fact that USER_HZ would still be exported as 100
... the only thing that changes is that the factor that gprof is off by is
different ;)

adding a patch to our kernel sources isnt really a viable option seeing as how
so many people dont use kernel ebuilds
Comment 21 Tres 'RiverRat' Melton 2005-07-19 23:52:29 UTC
Okay, I checked Linux-2.6.13-rc3 (many new config options) and the CONFIG_HZ
option did make it into the kernel but the relevent section of code:

-       NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
+       NEW_AUX_ENT(AT_CLKTCK, HZ);

has not been corrected as I had hoped.  I'm pretty sure that Duncan's system
will still report 100 Hz regardless of what he sets his CONFIG_HZ to.  Anyway
using James' test code I verified that my kernel patch does cause gprof to
produce the correct output.  So gprof is now off by a factor of 1.  Or, stated
another way, it is now correct.  James, please confirm this on your i386.

Mike and I still differ in the correct solution.  I intend to try to get the
issue fixed in the kernel and Mike has informed me that I have a better chance
of selling snow cones in the arctic circle.  He's probably right.  :-}
Comment 22 Tres 'RiverRat' Melton 2005-07-20 07:07:41 UTC
To quickly determine if your computer is off you can enter, as a user at the
shell prompt, "LD_SHOW_AUXV=1 getconf CLK_TCK" and look for AT_CLKTCK.  If it
says 100 and you are on a 2.6.x kernel then it is most likely wrong.  For sure
on x86-64 and i386 systems.

Comment 23 Tres 'RiverRat' Melton 2005-07-20 07:37:53 UTC
Redhat Bug 151763 is a duplicate of this bug.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151763
Comment 24 Tres 'RiverRat' Melton 2005-07-20 08:04:26 UTC
Affected hardware update:

AT_CLKTCK       Kernel                            Arch

100             2.6.11.11                         athlon-xp
100             2.6.12-gentoo-r1                  pentium-m
100             2.6.10-mipscvs-20050115-ip32r5k   mips64
100             ???                               r5000
100             2.6.12.2 ppc64                    4-way power5
100             2.6.12 vanilla                    athalonXP   
Comment 25 John Mylchreest (RETIRED) gentoo-dev 2005-07-20 08:52:59 UTC
Just to clarify the reason that USER_HZ is exported as 100Hz is pretty simple.
I breaks userspace changing it to 1000. so many things incorrectly hard-code 100
as the hz value and because things like (lets say uptime) will show a 2hr uptime
as 20hrs.
I am currently actually investigating into the gprof stuff itself, as I dont
think the funedemental problem has anything to do with kernel-space.

It just so happens that what you see matches conveniently.
Comment 26 Duncan 2005-07-21 02:02:22 UTC
(In reply to comment #22) 
> If it says 100 and you are on a 2.6.x kernel then it is most likely wrong. 
> For sure on x86-64 and i386 systems. 
 
Well... "for sure" unless you are running a non-mainstream kernel, or > 2.6.12.  
I'm running mainstream 2.6.13-rc3, and 100 is correct, for my system, an amd64, 
so /not/ for sure wrong any more! 
 
I did check it compiled for 250 Hz, and as I was afraid after reading comment 
#16 and later #25, it still reports 100 Hz. 
 
Well... look at it this way... with > 2.6.12, you can at least set the ticks to 
match what the kernel is advertising, 100 Hz!  =8^] 
 
Comment 27 Mark Loeser (RETIRED) gentoo-dev 2007-11-27 02:26:37 UTC
Is this still a problem for anyone?  Unless I'm crazy, it seems alright for me now:

halcyon@phoenix ~ $ time ./a.out 

real	0m6.241s
user	0m5.455s
sys	0m0.069s


Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
100.72      5.06     5.06        2     2.53     2.53  f


1000Hz for the timer
Comment 28 euclid80 2007-11-29 04:59:56 UTC
Actually, I get a different problem now.  It reports only half the time spent.  I think it is taking the time for one call to f() and calling it the time for both, then dividing.  Output as follows:

james@jranson ~ $ time ./a.out 

real    0m6.545s
user    0m6.550s
sys     0m0.000s

james@jranson ~ $ gprof 
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
100.73      3.29     3.29        2     1.65     1.65  f

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.
 
 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
           else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this 
           function is profiled, else blank.

name       the name of the function.  This is the minor sort
           for this listing. The index shows the location of
           the function in the gprof listing. If the index is
           in parenthesis it shows where it would appear in
           the gprof listing if it were to be printed.


                     Call graph (explanation follows)


granularity: each sample hit covers 2 byte(s) for 0.30% of 3.29 seconds

index % time    self  children    called     name
                3.29    0.00       2/2           main [2]
[1]    100.0    3.29    0.00       2         f [1]
-----------------------------------------------
                                                 <spontaneous>
[2]    100.0    0.00    3.29                 main [2]
                3.29    0.00       2/2           f [1]
-----------------------------------------------
Comment 29 euclid80 2007-11-29 05:00:57 UTC
Portage 2.1.3.19 (default-linux/x86/2007.0/desktop, gcc-4.1.2, glibc-2.6.1-r0, 2.6.22-gentoo-r9 i686)
=================================================================
System uname: 2.6.22-gentoo-r9 i686 Intel(R) Pentium(R) 4 CPU 1300MHz
Timestamp of tree: Wed, 28 Nov 2007 21:30:01 +0000
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.4 [enabled]
app-shells/bash:     3.2_p17
dev-java/java-config: 1.3.7, 2.0.33-r1
dev-lang/python:     2.4.4-r6
dev-python/pycrypto: 2.0.1-r6
dev-util/ccache:     2.4-r7
sys-apps/baselayout: 1.12.9-r2
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.3.16
sys-devel/libtool:   1.5.24
virtual/os-headers:  2.6.22-r2
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=pentium4 -msse2"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-O2 -march=pentium4 -msse2"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch userpriv usersandbox"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LINGUAS="en"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=" "
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X a52 aac acpi alsa bash-completion bitmap-fonts branding bzip2 cairo caps cddb cdparanoia cdr cli cracklib crypt cscope cups dbus dri dvd dvdr dvdread emboss encode esd evo fam firefox fortran fuse gif gtk hal iconv imagemagick isdnlog java jpeg mad mbox midi mikmod mjpeg mmx mozbranding mozdevelop mozilla moznocompose moznoirc moznomail mp3 mpeg mplayer msn mudflap ncurses network nodrm nptl nptlonly nsplugin nvidia ogg opengl openmp pam pcre pdf perl png postgres pppd python qt3 qt3support qt4 quicktime readline real reflection rtc sdl session simplexml sockets spl sse sse2 ssl stream svg tcpd tiff truetype truetype-fonts type1-fonts usb vorbis win32codecs wmp x86 xforms xinerama xml xmlrpc xorg xv zlib" ALSA_CARDS="ens1371" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en" USERLAND="GNU" VIDEO_CARDS="radeon nvidia"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 30 euclid80 2007-11-29 05:04:11 UTC
Created attachment 137291 [details]
The new gprof output file
Comment 31 Mark Loeser (RETIRED) gentoo-dev 2009-04-20 22:05:39 UTC
This seems to be working for me.  I can't reproduce the problem, so if anyone can, please reopen this with how to reproduce.