
Bug 41799

Summary: Kernel 2.6 losing time in VMWare GSX environment
Product: Gentoo Linux Reporter: Nick Ellson <grimm>
Component: [OLD] Core system Assignee: x86-kernel (DEPRECATED) <x86-kernel>
Status: RESOLVED CANTFIX    
Severity: normal CC: steel300
Priority: High    
Version: unspecified   
Hardware: x86   
OS: All   
Whiteboard:
Package list:
Runtime testing required: ---

Description Nick Ellson 2004-02-16 09:00:23 UTC
I installed Gentoo from the Pentium 4 Stage 3 tarball, then built a 2.6 kernel, in a VMWare instance on a 3 GHz Xeon server. One thing I noticed right off the bat is that I am losing 300-400 seconds of time every hour (as shown when I run ntpdate from cron against a good time source).

The VMWare server is VMWare GSX running on Windows 2000 Server. The other Windows-based hosts seem fine with their clocks.

My VMWare host is emulating SCSI drives (I saw an article in my search that said poor DMA settings on IDE drives can perhaps cause this).

 
 

Reproducible: Always
Steps to Reproduce:
1. Load Gentoo on a VMWare host, where the server is a 3 GHz Pentium 4 Xeon; use 128 MB of memory (though that doesn't seem to matter)
2. Emerge the gentoo-dev-sources (2.6 kernel)
3. Poll NTP time from a known good source, observe the offset each hour.

Actual Results:  
This was the BEST time offset today; it is usually in the 300-400 second range.

16 Feb 08:03:06 ntpdate[7696]: step time server 172.21.1.254 offset 184.597530 sec



Expected Results:  
I would expect no more than a few milliseconds of drift in an hour's time. Maybe 5-10
mins in a month is typical with a bad clock.
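
For scale, the figures above can be turned into drift rates. This is only a minimal Python sketch: the helper name `drift_ppm` is made up for illustration, and a 30-day month is assumed.

```python
def drift_ppm(seconds_lost: float, interval_seconds: float) -> float:
    """Return clock drift in parts per million over the given interval."""
    return seconds_lost / interval_seconds * 1_000_000


HOUR = 3600
MONTH = 30 * 24 * HOUR  # assumption: a 30-day month

# Reported in this bug: 300-400 seconds lost per hour.
observed_low = drift_ppm(300, HOUR)
observed_high = drift_ppm(400, HOUR)

# "Bad clock" figure from the expected results: 5-10 minutes per month.
bad_low = drift_ppm(5 * 60, MONTH)
bad_high = drift_ppm(10 * 60, MONTH)

print(f"observed:  {observed_low:.0f}-{observed_high:.0f} ppm")
print(f"bad clock: {bad_low:.0f}-{bad_high:.0f} ppm")
```

Losing 300-400 seconds per hour means the guest clock misses roughly 8 to 11 percent of wall-clock time, which is hundreds of times worse than the "bad clock" figure.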

Portage 2.0.50 (default-x86-1.4, gcc-3.3.2, glibc-2.3.2-r9, 2.6.1-gentoo-r1)
=================================================================
System uname: 2.6.1-gentoo-r1 i686 Intel(R) Xeon(TM) CPU 3.06GHz
Gentoo Base System version 1.4.3.13
Autoconf: sys-devel/autoconf-2.59
Automake: sys-devel/automake-1.8.2
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-O3 -march=pentium4 -funroll-loops -fprefetch-loop-arrays -pipe"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
CXXFLAGS="-O3 -march=pentium4 -funroll-loops -fprefetch-loop-arrays -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="http://212.219.247.12/sites/www.ibiblio.org/gentoo/ 
http://ftp.heanet.ie/pub/gentoo/ 
http://212.219.247.18/sites/www.ibiblio.org/gentoo/ http://gentoo.ccccom.com"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="apache2 apm avi berkdb crypt cups encode foomaticdb gdbm gif gpm gtk gtk2 
imagemagick imlib java jpeg libg++ libwww mad mbox mikmod motif mpeg ncurses 
nls nptl oggvorbis opengl openssh pam pdflib perl png python qt quicktime 
readline sdl slang spell ssl svga tcpd truetype x86 xml2 xv zlib"
Comment 1 Nick Ellson 2004-02-16 17:21:41 UTC
I just recently tried the Enhanced Real Time Clock support under character devices; that did not make any difference. I still see it trying to use the TSC time source in dmesg, and a few tens of lines later it complains that it is "Losing too many ticks!"

I have not been able to reproduce this on any of my Athlon systems, so far just this Pentium 4 Xeon VMWare session under the VMWare GSX Server.



Nick
Comment 2 Nick Ellson 2004-02-17 12:42:22 UTC
I have been scanning VMWare's support site. This looks like a problem with the way VMWare deals with the TSC clock source. There are a few possible fixes: tell the VMWare host to fix its clock timing to a static value, and allow the guest OS to read that value. Together they can bring the clock sync close to "on time", though it would appear that some shift is inevitable.

Can someone who knows timing and the TSC clock source say whether this is actually the case, and whether it cannot be addressed by Gentoo (must be fixed by VMWare?) :)

A post from the VMWare forum:
---------
That's the problem: your CPU is sometimes running faster than VMware Workstation expects it to be running. See Knowledge Base article 709 for one workaround.

http://www.vmware.com/support/kb/enduser/std_adp.php?p_faqid=709

There is also a workaround that doesn't require you to lock the CPU speed to a constant value: put the correct maximum CPU speed in your global config file. On Linux hosts this file is /etc/vmware/config. On Windows hosts, it is normally C:\Documents and Settings\All Users\Application Data\VMware\VMware Workstation\config.ini. In both cases, the file may not exist; if so, create it. Be sure to create it as a plain text file. On Windows, you can use Notepad, but be careful when saving the file that Notepad does not add an extra .txt extension to the filename.

If your machine is 1700 MHz maximum speed, the lines to add to this file are as follows. The first line is the most important one.

host.cpukHz = 1700000
hostinfo.noTSC = TRUE
tools.syncTime = TRUE 
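
As an illustration of the unit conversion in the quoted post (MHz to kHz), here is a small Python sketch that builds the same three lines for a given CPU speed. The helper name `vmware_timing_lines` is hypothetical, and the 3060 MHz value is only an assumed nominal speed for the 3.06GHz Xeon listed in this report.

```python
def vmware_timing_lines(max_mhz: int) -> list[str]:
    """Build the three config lines quoted above for a host CPU of max_mhz MHz."""
    khz = max_mhz * 1000  # host.cpukHz is expressed in kHz
    return [
        f"host.cpukHz = {khz}",
        "hostinfo.noTSC = TRUE",
        "tools.syncTime = TRUE",
    ]


# The forum post's own example: a 1700 MHz machine.
for line in vmware_timing_lines(1700):
    print(line)

# Assumption: 3060 MHz as the nominal speed of the reporter's 3.06GHz Xeon.
for line in vmware_timing_lines(3060):
    print(line)
```

The correct value should be taken from the actual host CPU rather than from this sketch.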
Comment 3 Chris Gianelloni (RETIRED) gentoo-dev 2004-02-20 12:24:54 UTC
This has nothing to do with the VMWare-Workstation package in portage.  This deals with the 2.6 kernel (gentoo-dev-sources) running from within a VMWare GSX session running on Windows 2000.

I wouldn't even know where to begin with such a thing.
Comment 4 Tim Yamin (RETIRED) gentoo-dev 2004-02-23 10:28:46 UTC
Hmm, for a start are you able to confirm that the kernel or Gentoo is the problem? Since VMWare controls the RTC and the normal timer, the kernels use what it feeds to them...
Comment 5 Nick Ellson 2004-02-24 12:29:15 UTC
I have now tested this exact situation with a 2.4 R7 gentoo kernel. It behaves in the same manner. VMWare now says that they have not tested ALL Linux variants, and that I will now be required to use one of the ones they support, instead of Gentoo.

What might my options be? Are there any Gentoo developers that have the time to work "through" me to test anything?

I started a shell script that runs ntpdate -B <ip>, then sleeps for 2 secs, and repeats. I see the following slips... Most of the time I am falling behind by small fractions (still too big for my tastes), but maybe 4 times in a minute, I stall and lose big time (no pun intended ;)


 
24 Feb 12:24:16 ntpdate[3361]: step time server 172.21.1.254 offset 0.061885 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:24:24 ntpdate[3365]: step time server 172.21.1.254 offset 1.036816 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:24:33 ntpdate[3398]: step time server 172.21.1.254 offset 3.318382 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:24:39 ntpdate[3466]: step time server 172.21.1.254 offset 1.360879 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:24:43 ntpdate[3514]: step time server 172.21.1.254 offset 0.413224 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:24:48 ntpdate[3534]: step time server 172.21.1.254 offset 0.480124 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:24:55 ntpdate[3555]: step time server 172.21.1.254 offset 0.781024 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:24:59 ntpdate[3596]: step time server 172.21.1.254 offset 0.708382 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:25:06 ntpdate[3617]: step time server 172.21.1.254 offset 1.020529 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:25:14 ntpdate[3654]: step time server 172.21.1.254 offset 3.538078 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:25:30 ntpdate[3904]: step time server 172.21.1.254 offset 12.289117 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:25:42 ntpdate[4153]: step time server 172.21.1.254 offset 9.386439 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:25:49 ntpdate[4219]: step time server 172.21.1.254 offset 1.337804 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:25:55 ntpdate[4223]: step time server 172.21.1.254 offset 0.076768 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:25:59 ntpdate[4227]: step time server 172.21.1.254 offset 0.073788 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:26:04 ntpdate[4231]: step time server 172.21.1.254 offset 0.069289 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:26:08 ntpdate[4235]: step time server 172.21.1.254 offset 0.191404 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:26:13 ntpdate[4239]: step time server 172.21.1.254 offset 0.617576 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:26:20 ntpdate[4243]: step time server 172.21.1.254 offset 0.791185 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:26:25 ntpdate[4247]: step time server 172.21.1.254 offset 0.632337 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:26:30 ntpdate[4251]: step time server 172.21.1.254 offset 0.226999 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:26:37 ntpdate[4255]: step time server 172.21.1.254 offset 0.183517 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:26:44 ntpdate[4259]: step time server 172.21.1.254 offset 0.120676 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:26:50 ntpdate[4263]: step time server 172.21.1.254 offset 0.128539 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:26:57 ntpdate[4267]: step time server 172.21.1.254 offset 0.078639 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:27:01 ntpdate[4271]: step time server 172.21.1.254 offset 0.066870 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:27:08 ntpdate[4275]: step time server 17
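
Output like the above is easier to judge once the offsets are pulled out and summarized. The following Python sketch does that for a few lines copied from this log; the function name `parse_offsets` and the 1-second threshold for a "big" slip are assumptions made for illustration.

```python
import re
from statistics import mean

# Matches lines such as:
# 24 Feb 12:24:16 ntpdate[3361]: step time server 172.21.1.254 offset 0.061885 sec
OFFSET_RE = re.compile(
    r"ntpdate\[\d+\]: \w+ time server \S+ offset (-?\d+\.\d+) sec"
)


def parse_offsets(log_text: str) -> list[float]:
    """Return every offset (in seconds) found in ntpdate output."""
    return [float(m.group(1)) for m in OFFSET_RE.finditer(log_text)]


# A few lines copied from the log above.
sample = """\
24 Feb 12:25:14 ntpdate[3654]: step time server 172.21.1.254 offset 3.538078 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:25:30 ntpdate[3904]: step time server 172.21.1.254 offset 12.289117 sec
Looking for host 172.21.1.254 and service ntp
host found : 172.21.1.254
24 Feb 12:25:42 ntpdate[4153]: step time server 172.21.1.254 offset 9.386439 sec
"""

offsets = parse_offsets(sample)
big = [o for o in offsets if o >= 1.0]  # assumption: 1 s marks a "big" slip
print(f"samples={len(offsets)} max={max(offsets):.3f}s "
      f"mean={mean(offsets):.3f}s big={len(big)}")
```

Lines that do not carry an offset, including a truncated final line, are simply skipped by the pattern.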
Comment 6 Tim Yamin (RETIRED) gentoo-dev 2004-02-24 12:51:28 UTC
For a start, could you please point us to what they consider to be approved? Since this probably involves some form of kernel patch, as the kernel is what seems to be lagging the time, we may be able to find out what we need from the SRPMS, for example, and add those patches to gentoo-sources.

Thanks!
Comment 7 Nick Ellson 2004-02-24 13:28:58 UTC
The supported Guests are:

http://www.vmware.com/support/gsx25/doc/intro_sysreqs_guest_gsx.html#1001621

Linux 

Mandrake Linux 8.0, 8.1, 8.2, 9.0 and 9.1

Red Hat Linux 6.2, 7.0, 7.1, 7.2, 7.3, 8.0, 9.0 and Red Hat Enterprise Linux (AS, ES, WS) 2.1

SuSE Linux 6.x, 7.0, 7.1, 7.2, 7.3, 8.0, 8.1, 8.2 and SLES 7, 8

Turbolinux 6.0 and 7.0 
Comment 8 Nick Ellson 2004-02-24 13:32:01 UTC
They also mention that it is not recommended to run the guest without the VMWare tools daemon. This syncs the guest OS clock to the host (but only polls every minute, and by then my logs are all messed up, and MRTG gets lost).

Nick

(Would it help if I uploaded my .config file for my Kernel?)

Comment 9 Tim Yamin (RETIRED) gentoo-dev 2004-03-07 06:02:17 UTC
Can you try turning CONFIG_X86_TSC and CONFIG_X86_HAS_TSC off? I'm not sure whether there is an option for it in the kernel configurator, so you might need to manually edit your .config. It might hang your system as well, so you might want to back up your kernel first.
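
Before and after editing, it can help to confirm what the .config actually says for those two options. A minimal Python sketch follows; the helper name `config_state` is hypothetical and the sample text is illustrative, not the reporter's real .config. It relies only on the usual .config conventions: `OPTION=value` when set, `# OPTION is not set` when disabled.

```python
import re


def config_state(config_text: str, option: str) -> str:
    """Return the assigned value of a kernel option, or 'not set' if absent."""
    match = re.search(rf"^{re.escape(option)}=(.*)$", config_text, re.MULTILINE)
    return match.group(1) if match else "not set"


# Illustrative fragment only, not taken from this bug.
sample = """\
CONFIG_X86_TSC=y
# CONFIG_X86_HAS_TSC is not set
"""

for opt in ("CONFIG_X86_TSC", "CONFIG_X86_HAS_TSC"):
    print(opt, "->", config_state(sample, opt))
```

To check a real tree, read the file contents (for example from the kernel source directory's .config) and pass them in place of `sample`.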
Comment 10 Tim Yamin (RETIRED) gentoo-dev 2004-03-07 15:19:19 UTC
Hmmm, this may be very related to bug 42904: if you try the patch in there, does it solve the issue?
Comment 11 Jason Cox (RETIRED) gentoo-dev 2004-04-16 17:28:32 UTC
User has moved on. No more response in a few months.