When using the performance analyzer gprof, I found that it was giving values for the time spent in functions (and total time) that exceeded the true running time of the program. In the example I will provide, gprof over-reports the execution time by a factor of 10. Reproducible: Always Steps to Reproduce: 1. gcc -pg test.c 2. time ./a.out 3. gprof Actual Results: The program took 19.9 seconds to run on my Pentium III, but gprof reported that the running time was 196.19 seconds. Gentoo Base System version 1.4.16 Portage 2.0.51.19 (default-linux/x86/2005.0, gcc-3.3.5-20050130, glibc-2.3.4.20041102-r1,uclibc-0.9.26-r7,uclibc-0.9.27-r0, 2.6.10-gentoo-r6 i686) ================================================================= System uname: 2.6.10-gentoo-r6 i686 Pentium III (Coppermine) Python: dev-lang/python-2.3.4-r1 [2.3.4 (#1, Feb 8 2005, 14:06:28)] distcc 2.16 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled] ccache version 2.3 [enabled] dev-lang/python: 2.3.4-r1 sys-devel/autoconf: 2.59-r6, 2.13 sys-devel/automake: 1.7.9-r1, 1.8.5-r3, 1.5, 1.4_p6, 1.6.3, 1.9.4 sys-devel/binutils: 2.15.92.0.2-r7 sys-devel/libtool: 1.5.14 virtual/os-headers: 2.6.8.1-r2 ACCEPT_KEYWORDS="x86" AUTOCLEAN="yes" CFLAGS="-O3 -march=pentium3 -fomit-frame-pointer" CHOST="i686-pc-linux-gnu" CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/lib/X11/xkb /usr/lib/mozilla/defaults/pref /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control" CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d" CXXFLAGS="-O3 -march=pentium3 -fomit-frame-pointer" DISTDIR="/usr/portage/distfiles" FEATURES="autoaddcvs autoconfig ccache distlocks sandbox sfperms strict userpriv usersandbox" MAKEOPTS="-j2" PKGDIR="/usr/portage/packages" PORTAGE_TMPDIR="/var/tmp" PORTDIR="/usr/portage" SYNC="rsync://rsync.gentoo.org/gentoo-portage" USE="x86 X alsa apm avi bitmap-fonts cdparanoia cdr crypt curl dv dvd dvdread emboss encode esd foomaticdb fortran gif gphoto2 gtk gtk2 imagemagick imlib java jpeg junit libg++ libwww mbox mikmod mmx mozdevelop mozilla moznomail mp3 mpeg ncurses network nodrm ogg oggvorbis pam pdflib perl png python qt quicktime readline real rtc samba scanner sdl spell sse ssl tcltk tiff truetype truetype-fonts type1-fonts usb v4l v4l2 vorbis xmms xv zlib" Unset: ASFLAGS, CBUILD, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTDIR_OVERLAY
Created attachment 56961 [details] simple program to test gprof Compile with "gcc -pg test.c"
Created attachment 56962 [details] raw profile data
Created attachment 56963 [details] text output of gprof
try with binutils-2.15.97 ?
OK, I tried binutils-2.15.97, but I got the same results. I installed it, changed the links in /usr/i686-pc-linux-gnu/{bin,lib}, recompiled the test program, ran it, and got approximately the same numbers as before. Are you implying by your question that it works for you with 2.15.97?
no, i'm not implying anything ive never used gprof
I noticed this problem when using kernel 2.6.10. Today I ran the test program under kernel 2.4.28, and gprof gave the correct result. When I booted back into 2.6.10, and ran gprof on the already generated gmon.out, it still gave the correct result. So, I speculate that the problem may actually be in the profiling code generated by gcc. Perhaps the sampling frequency is different in the 2.6 kernels.
the HZ is diff between 2.4 and 2.6 ... iirc, it changed by a factor of 10 from 100 in 2.4 to 1000 in 2.6
Adding to SpanKY's comment #8, that isn't entirely correct. The timer is different depending upon the arch. To see what it is set to type: grep HZ /usr/src/linux/include/asm-*/param.h and you can see that the i386 is still 100Hz, whereas the x86-64 is 1000Hz, and the alpha and ia64 have it set at 1024Hz for the most part. Sometimes it sets itself differently (like simulating an ia64 is 32Hz) though. Is this value exported in /proc somewhere?
yes, HZ is arch-specific, but considering the reporter is running on x86 and the x86 HZ changed by a factor of 10 from 1000 to 100 between 2.4 to 2.6, i dont see why other archs matter
Created attachment 63661 [details, diff] Patch to return failure on interval < 2 usec or unknown failure. This code branch (in bin-utils' gprof/hertz.c) will probably never get executed in the near future but when the interval drops < 2usec (the value currently in the code) it will at least return failure until a nanosecond field is added to setitimer. This same error is also returned in the same scenario in glibc 's prof-freq.c.
Created attachment 63662 [details] This small program will illustrate the real problem here. The real problem is that gprof and gcc (& the -p option) generated code rely on checking the ticks/second at runtime via the sysconf(_SC_CLK_TCK); call. That returns the wrong value. I followed this issue for a couple of hours through many different functions/packages in an attempt to generate a patch for spanky but I have run out of time. (Sorry Mike) I would like to point out though that the original post didn't specify the problem as being x86 specific and from this info it isn't! My x86-64 returns a 100MHz frequency and it should be 1KHz so at the risk of sounding rude *I* do think that the arch matters.
The man page for setitimer says "... dependent on the system timer resolution (currently 10ms)" and should also get corrected
OK, to my knowledge I've never run gprof, just happened across this, but FWIW, kernel ticks are changing again in 2.6.13. I'm running the -rcs, and there's now a config option for Hz; it'll be a kernel settable option, currently with three choices, 100/250/1000 Hz, 250 being the recommended default for a well-rounded desktop system. I compiled and ran tst.c, here, and I get correct readings, but that may be simply because they coincide with expectations, 100 Hz (which I chose because ticks are per-CPU interrupts, and I'm running SMP dual Opteron). I'll have to recompile the kernel for 250, reboot, and see if it's still accurately reported.
Some of the devs/ops in #gentoo (moreon & reisio) confirmed 100Hz clock on i386 with 2.6.x kernels using the test code. That is what needs fixed.
if you look at sysdeps/unix/sysv/linux/getclktck.c in glibc, you can see that glibc first checks to see if the ELF aux vec AT_CLKTCK is set and if it isnt, it falls back to a hardcoded '100' value the aux vec is populated by the kernel on the stack before executing an ELF as can be seen in linux/fs/binfmt_elf.c ... it exports the value as CLOCKS_PER_SEC which is macrod to USER_HZ in asm/param.h which on x86_64 and i386 are both set to 100 even though the kernel HZ define is set to 1000 last i checked, torvalds is big on the 'userspace should not care about kernel HZ' so he doesnt export that information ... as can be seen here, we get the shaft with the wrong kernel hz value ... we can double check this with the provided test.c, or you can run an app through gdb and analyze the stack before the dynamic linker gets a hold of it (thereby cutting out glibc as being the problem) ... you should find the AT_CLKTCK aux vec exported by the kernel on the stack with the value set to 100 (on my amd64 it is at least)
so we can say the kernel sucks because they purposefully give us the wrong value ... however, that probably wont get us anywhere because i dont think torvalds will change his mind ;) procps has some indepth code to grab the correct jiffies/hertz settings ... review the old_Hertz_hack() function in proc/sysinfo.c
More Research: I created a guess at a patch but the research below indicates that waiting a bit for Linux to work this out itself is the best solution. My system is currently crunching a set of data and I don't expect it to finish for 5-6 more days so I haven't tested this but I 'think' the solution is to patch /usr/src/linux-2.6.12-gentoo-r4/fs/binfmt_elf.c as follows: @@ -193,7 +193,7 @@ #endif NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP); NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE); - NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC); + NEW_AUX_ENT(AT_CLKTCK, HZ); NEW_AUX_ENT(AT_PHDR, load_addr + exec->e_phoff); NEW_AUX_ENT(AT_PHENT, sizeof (struct elf_phdr)); NEW_AUX_ENT(AT_PHNUM, exec->e_phnum); If I knew that it worked I would have attached it as a patch, but I don't. In adding to Duncan's comment the kernel thread of relevence is here: http://lkml.org/lkml/2005/5/16/228 and, if after 2.6.13 comes out, the issue is still there then I will persue it in earnest. Thanks for the pointer to binfmt_elf.c Mike, it helped, but the procps stuff is a work around. If this issue is still present in 2.6.13 and the above works then perhaps we can get Gentoo's kernel guys to add a patch to Linux. As it stands now there is a patch (http://lkml.org/lkml/diff/2005/5/16/308/1) outstanding, that Linus has read and commented on, and that has been "Signed-off-by:" two kernel-devs, that would rework the values that the above patch fiddles with so either way we should wait for a new kernel version that has the above mentioned patch by Christoph Lameter.
The patch to Linux in my previous comment works! My clock is now reported as 1000 Hz.
that doesnt really address the fact that USER_HZ would still be exported as 100 ... the only thing that changes is that the factor that gprof is off by is different ;) adding a patch to our kernel sources isnt really a viable option seeing as how so many people dont use kernel ebuilds
Okay, I checked Linux-2.6.13-rc3 (many new config options) and the CONFIG_HZ option did make it into the kernel but the relevent section of code: - NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC); + NEW_AUX_ENT(AT_CLKTCK, HZ); has not been corrected as I had hoped. I'm pretty sure that Duncan's system will still report 100 Hz regardless of what he sets his CONFIG_HZ to. Anyway using James' test code I verified that my kernel patch does cause gprof to produce the correct output. So gprof is now off by a factor of 1. Or, stated another way, it is now correct. James, please confirm this on your i386. Mike and I still differ in the correct solution. I intend to try to get the issue fixed in the kernel and Mike has informed me that I have a better chance of selling snow cones in the arctic circle. He's probably right. :-}
To quickly determine if your computer is off you can enter, as a user at the shell prompt, "LD_SHOW_AUXV=1 getconf CLK_TCK" and look for AT_CLKTCK. If it says 100 and you are on a 2.6.x kernel then it is most likely wrong. For sure on x86-64 and i386 systems.
Redhat Bug 151763 is a duplicate of this bug. https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151763
Affected hardware update: AT_CLKTCK Kernel Arch 100 2.6.11.11 athlon-xp 100 2.6.12-gentoo-r1 pentium-m 100 2.6.10-mipscvs-20050115-ip32r5k mips64 100 ??? r5000 100 2.6.12.2 ppc64 4-way power5 100 2.6.12 vanilla athalonXP
Just to clarify the reason that USER_HZ is exported as 100Hz is pretty simple. I breaks userspace changing it to 1000. so many things incorrectly hard-code 100 as the hz value and because things like (lets say uptime) will show a 2hr uptime as 20hrs. I am currently actually investigating into the gprof stuff itself, as I dont think the funedemental problem has anything to do with kernel-space. It just so happens that what you see matches conveniently.
(In reply to comment #22) > If it says 100 and you are on a 2.6.x kernel then it is most likely wrong. > For sure on x86-64 and i386 systems. Well... "for sure" unless you are running a non-mainstream kernel, or > 2.6.12. I'm running mainstream 2.6.13-rc3, and 100 is correct, for my system, an amd64, so /not/ for sure wrong any more! I did check it compiled for 250 Hz, and as I was afraid after reading comment #16 and later #25, it still reports 100 Hz. Well... look at it this way... with > 2.6.12, you can at least set the ticks to match what the kernel is advertising, 100 Hz! =8^]
Is this still a problem for anyone? Unless I'm crazy, it seems alright for me now: halcyon@phoenix ~ $ time ./a.out real 0m6.241s user 0m5.455s sys 0m0.069s Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 100.72 5.06 5.06 2 2.53 2.53 f 1000Hz for the timer
Actually, I get a different problem now. It reports only half the time spent. I think it is taking the time for one call to f() and calling it the time for both, then dividing. Output as follows: james@jranson ~ $ time ./a.out real 0m6.545s user 0m6.550s sys 0m0.000s james@jranson ~ $ gprof Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 100.73 3.29 3.29 2 1.65 1.65 f % the percentage of the total running time of the time program used by this function. cumulative a running sum of the number of seconds accounted seconds for by this function and those listed above it. self the number of seconds accounted for by this seconds function alone. This is the major sort for this listing. calls the number of times this function was invoked, if this function is profiled, else blank. self the average number of milliseconds spent in this ms/call function per call, if this function is profiled, else blank. total the average number of milliseconds spent in this ms/call function and its descendents per call, if this function is profiled, else blank. name the name of the function. This is the minor sort for this listing. The index shows the location of the function in the gprof listing. If the index is in parenthesis it shows where it would appear in the gprof listing if it were to be printed. Call graph (explanation follows) granularity: each sample hit covers 2 byte(s) for 0.30% of 3.29 seconds index % time self children called name 3.29 0.00 2/2 main [2] [1] 100.0 3.29 0.00 2 f [1] ----------------------------------------------- <spontaneous> [2] 100.0 0.00 3.29 main [2] 3.29 0.00 2/2 f [1] -----------------------------------------------
Portage 2.1.3.19 (default-linux/x86/2007.0/desktop, gcc-4.1.2, glibc-2.6.1-r0, 2.6.22-gentoo-r9 i686) ================================================================= System uname: 2.6.22-gentoo-r9 i686 Intel(R) Pentium(R) 4 CPU 1300MHz Timestamp of tree: Wed, 28 Nov 2007 21:30:01 +0000 distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled] ccache version 2.4 [enabled] app-shells/bash: 3.2_p17 dev-java/java-config: 1.3.7, 2.0.33-r1 dev-lang/python: 2.4.4-r6 dev-python/pycrypto: 2.0.1-r6 dev-util/ccache: 2.4-r7 sys-apps/baselayout: 1.12.9-r2 sys-apps/sandbox: 1.2.18.1-r2 sys-devel/autoconf: 2.13, 2.61-r1 sys-devel/automake: 1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10 sys-devel/binutils: 2.18-r1 sys-devel/gcc-config: 1.3.16 sys-devel/libtool: 1.5.24 virtual/os-headers: 2.6.22-r2 ACCEPT_KEYWORDS="x86" CBUILD="i686-pc-linux-gnu" CFLAGS="-O2 -march=pentium4 -msse2" CHOST="i686-pc-linux-gnu" CONFIG_PROTECT="/etc /usr/share/X11/xkb" CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d" CXXFLAGS="-O2 -march=pentium4 -msse2" DISTDIR="/usr/portage/distfiles" FEATURES="ccache distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch userpriv usersandbox" GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo" LINGUAS="en" MAKEOPTS="-j2" PKGDIR="/usr/portage/packages" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*" PORTAGE_TMPDIR="/var/tmp" PORTDIR="/usr/portage" PORTDIR_OVERLAY=" " SYNC="rsync://rsync.gentoo.org/gentoo-portage" USE="X a52 aac acpi alsa bash-completion bitmap-fonts branding bzip2 cairo caps cddb cdparanoia cdr cli cracklib crypt cscope cups dbus dri dvd dvdr dvdread emboss encode esd evo fam firefox fortran fuse gif gtk hal iconv imagemagick isdnlog java jpeg mad mbox midi mikmod mjpeg mmx mozbranding mozdevelop mozilla moznocompose moznoirc moznomail mp3 mpeg mplayer msn mudflap ncurses network nodrm nptl nptlonly nsplugin nvidia ogg opengl openmp pam pcre pdf perl png postgres pppd python qt3 qt3support qt4 quicktime readline real reflection rtc sdl session simplexml sockets spl sse sse2 ssl stream svg tcpd tiff truetype truetype-fonts type1-fonts usb vorbis win32codecs wmp x86 xforms xinerama xml xmlrpc xorg xv zlib" ALSA_CARDS="ens1371" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en" USERLAND="GNU" VIDEO_CARDS="radeon nvidia" Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Created attachment 137291 [details] The new gprof output file
This seems to be working for me. I can't reproduce the problem, so if anyone can, please reopen this with how to reproduce.