Bug 303120 - sys-process/procps: fix "Unknown HZ value!" on some machines
Summary: sys-process/procps: fix "Unknown HZ value!" on some machines
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 353630
Reported: 2010-02-01 17:17 UTC by Alexander Holler
Modified: 2011-09-26 23:00 UTC (History)
CC List: 20 users

See Also:
Package list:
Runtime testing required: ---


Attachments
patch from debian (30_sysinfo_7numbers.dpatch) (30_sysinfo_7numbers.dpatch,2.20 KB, patch)
2010-02-01 17:18 UTC, Alexander Holler
the patch for the ebuild (procps-3.2.8.ebuild.diff) (procps-3.2.8.ebuild.diff,454 bytes, patch)
2010-02-01 17:20 UTC, Alexander Holler
patch: call init_Linux_version() before init_libproc() (procps-3.2.8-fix-constructor-disorder.patch,1001 bytes, patch)
2010-10-29 06:19 UTC, Chris Coleman
procps-3.2.8-linux-ver-init.patch (procps-3.2.8-linux-ver-init.patch,640 bytes, patch)
2010-11-14 00:32 UTC, SpanKY
debian already has a patch for this (proc_version_constructor.patch,1.24 KB, patch)
2010-11-17 20:44 UTC, Chris Coleman
above patch can't be applied without this one (gnu-kbsd-version.patch,1.38 KB, patch)
2010-11-17 20:46 UTC, Chris Coleman
rewrite to fix compilation with <gcc-4.3 (procps-3.2.8-linux-ver-init.patch,1.47 KB, patch)
2011-02-07 18:24 UTC, Chris Coleman

Description Alexander Holler 2010-02-01 17:17:37 UTC
On some machines with high I/O or interrupt rates, some tools from procps
fail with the message "Unknown HZ value!", which might cause problems in many scripts.

See Debian bug 460331 for further explanation.

Diff for procps-3.2.8.ebuild:
--------------
--- procps-3.2.8.ebuild 2009-11-23 06:06:46.000000000 +0100
+++ procps-3.2.8.ebuild.new     2010-02-01 18:10:00.538338887 +0100
@@ -48,6 +48,11 @@
        if ! use n32 ; then
                epatch "${FILESDIR}"/${PN}-3.2.6-mips-n32_isnt_usable_on_mips64_yet.patch
        fi
+
+       # Patch to fix an error in procps with newer kernels to get rid
+       # of the message "Unknown HZ value!".
+       # See Debian bug 460331.
+       epatch "${FILESDIR}"/30_sysinfo_7numbers.dpatch
 }

 src_compile() {
--------------

The patch found in Debian's bug tracker should be placed (unmodified) in files/. I'm attaching the patch.
Comment 1 Alexander Holler 2010-02-01 17:18:56 UTC
Created attachment 218096 [details, diff]
patch from debian (30_sysinfo_7numbers.dpatch)
Comment 2 Alexander Holler 2010-02-01 17:20:29 UTC
Created attachment 218098 [details, diff]
the patch for the ebuild (procps-3.2.8.ebuild.diff)
Comment 3 Peter Volkov (RETIRED) gentoo-dev 2010-02-01 21:12:43 UTC
Thank you for the report. Is there an upstream bug report or fix?
Comment 4 Alexander Holler 2010-02-01 22:52:57 UTC
The patch is not in the repository on SourceForge, and there seems to be no active bug tracker, so someone should mail it to one of the mailing lists. Maybe the Debian people forgot to do that.
Comment 5 Sergei Trofimovich (RETIRED) gentoo-dev 2010-08-07 09:17:58 UTC
The same thing happens on a Marvell SheevaPlug:

 * Configuring kernel parameters ...Unknown HZ value! (68) Assume 100.
                                      [ ok ]
 * Cleaning /var/lock, /var/run ...Unknown HZ value! (67) Assume 100.
Unknown HZ value! (67) Assume 100.
Unknown HZ value! (67) Assume 100.
Unknown HZ value! (67) Assume 100.
                                       [ ok ]
...
Unknown HZ value! (85) Assume 100.

It's harder to see the messages when the system is idle, but they pop up on each boot.

sys-process/procps-3.2.8
Comment 6 Steev Klimaszewski (RETIRED) gentoo-dev 2010-10-13 04:41:38 UTC
I tried that patch, but the message still showed up. I found a different patch at https://bugs.launchpad.net/debian/+source/procps/+bug/364656, which I'm applying in the Efika overlay ( http://github.com/steev/efikamx ), and it appears to have fixed the issue for me. I had to rework it, though, as it wouldn't apply cleanly for me for some reason.
Comment 7 Alexander Holler 2010-10-13 06:19:58 UTC
The problem is that procps reads some values from somewhere under /proc, and every time something changes in the kernel, procps needs to be changed too. That interface doesn't seem to be very stable.

I've tried to send the maintainer an e-mail, but got no response. And the mailing list is full of spam. So I don't know if upstream exists. ;)
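
As a rough illustration of why a fallback that parses /proc is fragile, here is a minimal C sketch of a HZ estimate derived from the cpu line of /proc/stat and the uptime. It is a model, not the procps code: the name estimate_hz and the sample numbers are made up, and it assumes the estimate is simply the summed jiffies divided by uptime and CPU count. If the kernel adds counters to the cpu line (iowait, irq, softirq) and the parser still sums only the first four, the estimate falls below the real value. That would fit the odd values such as 67 or 85 reported above on busy machines.

```c
/* Hypothetical model of a /proc-based HZ estimate (not the procps source). */
static unsigned estimate_hz(const unsigned long long *fields, int nfields,
                            double uptime_seconds, unsigned ncpus)
{
    unsigned long long jiffies = 0;
    for (int i = 0; i < nfields; i++)
        jiffies += fields[i];          /* add up the counters the parser knows about */
    return (unsigned)((double)jiffies / uptime_seconds / ncpus);
}

/* Made-up sample: 1000 s of uptime on one CPU with a true HZ of 100.
 * Field order: user, nice, system, idle, iowait, irq, softirq. */
static const unsigned long long sample_cpu_line[7] = {
    10000, 0, 5000, 55000, 25000, 3000, 2000
};
```

With this sample, summing only the first four fields gives 70, while summing all seven gives 100.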
Comment 8 SpanKY gentoo-dev 2010-10-14 00:19:31 UTC
that's the whole point of picking AT_CLKTCK out of the ELF auxv the kernel provides.  really, all __linux__ and __ELF__ systems should be preferring that over anything /proc/ has to say.  so a better fix imo would be to first walk the stack and find the AT_CLKTCK value before even thinking of looking in /proc/.
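
For readers who want to see what preferring AT_CLKTCK looks like, here is a minimal C sketch. It does not walk the stack as suggested above; it assumes a C library that provides getauxval() in <sys/auxv.h> (newer glibc releases do, but that interface may not exist on the systems discussed in this bug). The function name clock_ticks_per_second is made up for illustration.

```c
#include <sys/auxv.h>   /* getauxval(), AT_CLKTCK */
#include <unistd.h>     /* sysconf(), _SC_CLK_TCK */

/* Return the clock tick rate, preferring the value the kernel
 * passed in the ELF auxiliary vector over anything read from /proc. */
long clock_ticks_per_second(void)
{
    unsigned long hz = getauxval(AT_CLKTCK);   /* 0 if the entry is absent */
    if (hz != 0)
        return (long)hz;
    return sysconf(_SC_CLK_TCK);               /* portable fallback */
}
```

Neither path depends on the layout of /proc/stat, so new kernel counters cannot skew the result.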
Comment 9 Chris Coleman 2010-10-29 06:05:10 UTC
The Debian patch doesn't fix the problem on my system. I think I know why, too.

I've been looking into this and I think the problem we are all experiencing actually stems from something else entirely.

The debian patch addresses the issue by modifying the old_Hertz_hack() function. I've read the code in proc/sysinfo.c and it seems that old_Hertz_hack() is only meant to be there as a fallback. It's not even supposed to be used on modern (>2.4.0) Linux systems.

(In reply to comment #8)
> that's the whole point of picking AT_CLKTCK out of the ELF auxv the kernel
> provides.  really, all __linux__ and __ELF__ systems should be preferring that
> over anything /proc/ has to say.  so a better fix imo would be to first walk
> the stack and find the AT_CLKTCK value before even thinking of looking in
> /proc/.
> 

By design, it should be doing that. But there's a problem. This is init_libproc() from proc/sysinfo.c:

static void init_libproc(void) __attribute__((constructor));
static void init_libproc(void)
{

  ...

  if(linux_version_code > LINUX_VERSION(2, 4, 0)){ 
    Hertz = find_elf_note(AT_CLKTCK);
    if(Hertz!=NOTE_NOT_FOUND) return;
    fputs("2.4+ kernel w/o ELF notes? -- report this\n", stderr);
  }
  old_Hertz_hack();
}

The line that calls find_elf_note(AT_CLKTCK) _never_ gets executed. The function _always_ falls back on old_Hertz_hack().

The problem is that `linux_version_code` is still zero at this point because 
init_Linux_version(), which initialises it, hasn't been called yet. Why? Because, like init_libproc(), init_Linux_version() is declared with __attribute__((constructor)):

static void init_Linux_version(void) __attribute__((constructor));

These functions are both automatically called before main(), but because no priority is specified for either they are being executed in the wrong order.

My suggestion is to change the function declarations and add a priority value to control the order of execution.

-static void init_libproc(void) __attribute__((constructor));
+static void init_libproc(void) __attribute__((constructor(100)));

-static void init_Linux_version(void) __attribute__((constructor));
+static void init_Linux_version(void) __attribute__((constructor(200)));
Comment 10 Chris Coleman 2010-10-29 06:08:22 UTC
(In reply to comment #9)
> -static void init_libproc(void) __attribute__((constructor));
> +static void init_libproc(void) __attribute__((constructor(100)));
> 
> -static void init_Linux_version(void) __attribute__((constructor));
> +static void init_Linux_version(void) __attribute__((constructor(200)));
> 

Sorry, that would be backwards. This would be correct:

-static void init_libproc(void) __attribute__((constructor));
+static void init_libproc(void) __attribute__((constructor(200)));

-static void init_Linux_version(void) __attribute__((constructor));
+static void init_Linux_version(void) __attribute__((constructor(100)));
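
The effect of the priorities can be tried in isolation with a small C sketch. The function names are taken from this report, but the bodies are stand-ins, and the globals version_ready_first and the 2.6.36 value are invented for the demonstration. The LINUX_VERSION macro is modeled on the one used in the snippet above. GCC documents constructor priorities up to 100 as reserved for the implementation, so the sketch uses 101 and 102 instead of 100 and 200. A lower number runs earlier.

```c
#define LINUX_VERSION(x, y, z) (0x10000 * (x) + 0x100 * (y) + (z))

static int linux_version_code;   /* stays 0 until init_Linux_version() runs */
static int version_ready_first;  /* 1 if the version was set before init_libproc() */

/* Lower priority number runs first, regardless of definition or link order. */
static void init_Linux_version(void) __attribute__((constructor(101)));
static void init_libproc(void) __attribute__((constructor(102)));

/* Defined first on purpose: source order must not matter. */
static void init_libproc(void)
{
    /* Stand-in for the check that guards find_elf_note(AT_CLKTCK). */
    version_ready_first = linux_version_code > LINUX_VERSION(2, 4, 0);
}

static void init_Linux_version(void)
{
    linux_version_code = LINUX_VERSION(2, 6, 36);  /* placeholder, not read from uname */
}
```

By the time main() starts, version_ready_first is 1. Dropping both priorities leaves the order up to the toolchain, which is the situation described in comment #9.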
Comment 11 Chris Coleman 2010-10-29 06:19:25 UTC
Created attachment 252457 [details, diff]
patch: call init_Linux_version() before init_libproc()
Comment 12 Elias Pipping 2010-11-02 01:16:10 UTC
I got a message like this on every startup after compiling procps with make 3.82. The default linking order is different from the one with make 3.81. Restoring it appears to take care of it. Here's one way of doing that:

  http://git.exherbo.org/?p=arbor.git;a=blobdiff;f=packages/sys-process/procps/files/procps-3.2.8-make-3.82.patch;h=b64693f1c620033b98e0fa0452ba6c091f586121;hp=e52fc375083980278c5f2b1490ee9d0fea52506c;hb=913d4b625c48c28fc4a06ffbba30e04006df5e44;hpb=a0f34f3cc9bf4795713defefdf47f4f4a0da2ae2
Comment 13 Chris Coleman 2010-11-02 02:25:36 UTC
I think linking order determines the order in which constructor functions are called. So enforcing a particular linking order would indeed solve the problem, as would assigning priorities to the constructor functions.
Comment 14 SpanKY gentoo-dev 2010-11-14 00:25:36 UTC
that make patch is simply a hack that ignores the real issue.  Chris's examination sounds pretty good/sane to me.

the only sticking point would be whether upstream would accept prioritized constructors.  but i dont see why they wouldnt.

although, the way constructors are defined, all prioritized ones are executed before non-prioritized ones.  so simply giving init_Linux_version() a priority value at all will guarantee it gets executed first.

another way to address the issue would be to turn linux_version_code() into a function that cached its result so that it always returned the correct value.
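
A minimal C sketch of that caching idea, under stated assumptions: the names linux_version and parse_release are invented, the version is taken from uname(), and the cache is not thread-safe. It is not how procps does it; it only shows that a lazily initialised function returns the right value no matter which constructor asks first.

```c
#include <stdio.h>
#include <sys/utsname.h>

#define LINUX_VERSION(x, y, z) (0x10000 * (x) + 0x100 * (y) + (z))

/* Parse "2.6.36-gentoo" style strings; returns 0 if no x.y prefix is found. */
static int parse_release(const char *release)
{
    int x = 0, y = 0, z = 0;
    if (sscanf(release, "%d.%d.%d", &x, &y, &z) < 2)
        return 0;
    return LINUX_VERSION(x, y, z);
}

/* Compute the version code on first use and reuse it afterwards. */
int linux_version(void)
{
    static int cached = -1;            /* -1 means "not computed yet" */
    if (cached < 0) {
        struct utsname uts;
        cached = (uname(&uts) == 0) ? parse_release(uts.release) : 0;
    }
    return cached;
}
```

Callers would then write linux_version() > LINUX_VERSION(2, 4, 0) instead of reading a global that may not have been set yet.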
Comment 15 SpanKY gentoo-dev 2010-11-14 00:32:33 UTC
Created attachment 254257 [details, diff]
procps-3.2.8-linux-ver-init.patch

slight tweak of Chris's patch.  can people experiencing this bug try just this patch and see if it fixes things for them ?
Comment 16 Chris Coleman 2010-11-14 10:38:01 UTC
(In reply to comment #15)
> Created an attachment (id=254257) [details]
> procps-3.2.8-linux-ver-init.patch
> 
> slight tweak of Chris's patch.  can people experiencing this bug try just this
> patch and see if it fixes things for them ?
> 

That patch works for me.
Comment 17 Chris Coleman 2010-11-15 00:25:27 UTC
Incidentally, I have already submitted my patch upstream (to Albert Cahalan via sourceforge) but there has been no response. Albert is the admin of procps on sf and the sole committer of code. But he hasn't committed anything for 181 days (as I write this). So I think this bug might not be fixed upstream for quite a while.

(In reply to comment #14)
> another way to address the issue would be to turn linux_version_code() into a
> function that cached its result so that it always returned the correct value.

That is a good idea, but if we're free to rethink everything, wouldn't it be better to just not use constructors at all? The init functions could just be called from main().
Comment 18 SpanKY gentoo-dev 2010-11-15 07:41:41 UTC
whichever patch gets merged doesnt matter to me as long as it gets fixed.  but i think you're right that you might not get a response as he has been AFK from development for a while.

the point of using constructors is that this is a library.  calling it from main() would require every app that happens to use the library to be updated.
Comment 19 Chris Coleman 2010-11-17 00:11:39 UTC
(In reply to comment #18)
> whichever patch gets merged doesnt matter to me as long as it gets fixed.  but
> i think you're right that you might not get a response as he has been AFK from
> development for a while.

I've filed a report with the Debian bug tracking system. Craig Small is the maintainer of procps there. And there are some commits by a csmall on sourceforge. There is hope.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=603759

> the point of using constructors is that this is a library.  calling it from
> main() would require every app that happens to use the library to be updated.

I was thinking that apps could explicitly call init_libproc(), which could call init_Linux_version(). But I didn't realise that there were apps outside of procps that used libproc.
Comment 20 Chris Coleman 2010-11-17 20:44:20 UTC
Created attachment 254687 [details, diff]
debian already has a patch for this

it also needs gnu-kbsd-version.patch
Comment 21 Chris Coleman 2010-11-17 20:46:22 UTC
Created attachment 254689 [details, diff]
above patch can't be applied without this one

But it has nothing to do with this bug.
Comment 22 SpanKY gentoo-dev 2010-11-18 09:12:56 UTC
yeah, i dont like their approach.  it's fraught with problems if other constructors are written that need the linux code.  i think yours makes more sense and is a lot cleaner.

Alexander: you're the original reporter.  can you please try the patch i posted ?
Comment 23 Clete R. Blackwell II 2010-11-19 19:17:07 UTC
This cropped up today when I upgraded to procps-3.2.8-r1.
Comment 24 Clete R. Blackwell II 2010-11-19 20:50:16 UTC
(In reply to comment #23)
> This cropped up today when I upgraded to procps-3.2.8-r1.
> 

Patch fixes the issue.

This should be committed, as my original Googling told me that this error indicates a rootkit.
We don't want people panicking thinking that they have a rootkit when it is a simple error.
Comment 25 Chris Coleman 2010-11-21 01:43:36 UTC
(In reply to comment #24)
> (In reply to comment #23)
> > This cropped up today when I upgraded to procps-3.2.8-r1.
> > 
> 
> Patch fixes the issue.
> 
> This should be committed, as my original Googling told me that this error
> indicates a rootkit.
> We don't want people panicking thinking that they have a rootkit when it is a
> simple error.
> 

Indeed we don't. I'd like to congratulate you, sir, on your very British way of confirming a bug report.

Doubt not that we shall go forth and fix this simple mistake.
Comment 26 Alexander Holler 2010-11-21 13:47:00 UTC
I've tried procps-3.2.8-linux-ver-init.patch and it does fix the issue.

I assume applying 30_sysinfo_7numbers.dpatch doesn't cause any harm, so maybe this could be applied too. Or just use the constructor-fix until upstream has cleaned up the mess.
Comment 27 SpanKY gentoo-dev 2010-11-21 20:16:25 UTC
should be all set in procps-3.2.8-r2 then ... thanks Chris & Alexander
Comment 28 Chris Coleman 2010-11-23 03:58:20 UTC
(In reply to comment #26)
> I assume applying 30_sysinfo_7numbers.dpatch doesn't cause any harm, so maybe
> this could be applied too.

It wouldn't do any harm, but it wouldn't do any good. That patch adds support for newer kernels to a function that isn't used with newer kernels.

> Or just use the constructor-fix until upstream has cleaned up the mess.

It's not really a mess. And this bug wasn't entirely upstream's fault. It was caused by a change in GNU Make. As of version 3.82, GNU Make no longer sorts lists of file names when expanding automatic variables or wildcards. That change in behaviour revealed a problem in procps. Its constructors expected to be called in a certain order. So we specified the order.

Also, upstream is a man named Albert Cahalan. And he's gone fishing.
Comment 29 SpanKY gentoo-dev 2010-11-23 08:18:20 UTC
the bug lies wholly in procps.  the fact that a newer version of make caused the bug to be exposed is purely incidental.  the ELF spec is clear that unprioritized constructors may be run in any order the ldso feels like.  if the glibc ldso randomly sorted the constructors at runtime and executed them, the bug would be exposed as well.  or if the linker just happened to assemble the input objects into the output in a different order.

so yes, this bug is entirely upstream's fault.
Comment 30 Chris Coleman 2010-11-23 19:18:12 UTC
You're right, this bug actually is entirely upstream's fault. I wouldn't place any of the blame on GNU Make. When I said that this bug was "caused" by a change in Make, what I should have said was "revealed" or "triggered".
Comment 31 Alexander Holler 2010-11-23 19:33:50 UTC
Nobody has to be blamed for something. Shit happens ;) Errors as well, at least until fault tolerant machines replace us.

When I wrote "until upstream has cleaned up the mess", I didn't want to blame anyone. I just wanted to note that this bug has long been known but is still not fixed upstream.
Comment 32 Chris Coleman 2010-11-25 03:46:12 UTC
(In reply to comment #31)
> Nobody has to be blamed for something. Shit happens ;) Errors as well, at least
> until fault tolerant machines replace us.

I like your philosophy.

> When I've written "until upstream has cleaned up the mess" I didn't want to
> blame someone, I just wanted to note, that this bug is long known not fixed
> upstream.

I was only trying to explain how this bug came to be a problem. I didn't mean to throw your words back at you.

Will someone adopt procps?

From: csmall@debian.org Craig Small 
To: chrsclmn@gmail.com Chris Coleman 
Date: Fri, 19 Nov 2010 11:22:04 +0000 
Subject: Re: Bug#603759 closed by Craig Small <csmall@debian.org> (Bug#603759:  fixed in procps 1:3.2.8-10) 
 
On Thu, Nov 18, 2010 at 10:11:37PM +0000, Chris Coleman wrote:
> All seems quiet on procps.sf.net. Is Albert Cahalan still active? Do
> you send him your patches?
I don't think he is active, the times he's got patches he got them from
the deb.

 - Craig
Comment 33 Chris Coleman 2011-02-07 18:24:19 UTC
Created attachment 261747 [details, diff]
rewrite to fix compilation with <gcc-4.3

Apparently assigning priorities to constructors wasn't possible until gcc-4.3. See bug #353630.

I've rewritten the patch so that init_libproc() calls init_Linux_version().
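
The shape of that rewrite can be sketched in C as follows. This is an illustration, not the attached patch: the function bodies are stand-ins, the global used_elf_note and the 2.6.36 value are invented, and it assumes init_Linux_version() simply loses its own constructor attribute. Only the plain, unprioritised constructor attribute is used, so no priority support is needed from the compiler.

```c
#define LINUX_VERSION(x, y, z) (0x10000 * (x) + 0x100 * (y) + (z))

static int linux_version_code;  /* 0 until init_Linux_version() has run */
static int used_elf_note;       /* stand-in for "Hertz came from AT_CLKTCK" */

/* No constructor attribute here: it is called explicitly below. */
static void init_Linux_version(void)
{
    linux_version_code = LINUX_VERSION(2, 6, 36);  /* placeholder value */
}

static void init_libproc(void) __attribute__((constructor));
static void init_libproc(void)
{
    init_Linux_version();  /* must run before the version check */
    used_elf_note = linux_version_code > LINUX_VERSION(2, 4, 0);
}
```

With a single constructor, the order is fixed by the call itself rather than by the linker or the loader.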
Comment 34 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2011-02-07 19:17:40 UTC
re-open due to comment #33
Comment 35 SpanKY gentoo-dev 2011-09-26 23:00:53 UTC
procps-3.2.8_p10-r1 should work fine