Bisecting: 16 revisions left to test after this (roughly 4 steps)
[36f1ce1fa2ee9e2d2608ca2629b8b1232b14a1dc] x86/PCI: use host bridge _CRS info on ASUS M2V-MX SE
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[a671258da2cdb15fbb60dd0f22d13418ae4e76b2] ASoC: Fix setting update bits for WM8753_LADC and WM8753_RADC
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[3217df8e225c8579293fd2e193ba5cee709b6eba] ide-disk: Fix request requeuing
Bisecting: 1 revision left to test after this (roughly 1 step)
[d694ac34a8565cc3639bcb766055e92c043c4291] lis3: fix regression of HP DriveGuard with 8bit chip
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[249cf808ba1a0d403fe7c476a74b66e2bc0a8e53] posix-cpu-timers: Cure SMP wobbles

249cf808ba1a0d403fe7c476a74b66e2bc0a8e53 is the first bad commit
commit 249cf808ba1a0d403fe7c476a74b66e2bc0a8e53
Author: Peter Zijlstra
Date:   Thu Sep 1 12:42:04 2011 +0200

    posix-cpu-timers: Cure SMP wobbles

    commit d670ec13178d0fd8680e6742a2bc6e04f28f87d8 upstream.

    David reported:

      Attached below is a watered-down version of rt/tst-cpuclock2.c from
      GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
      similar.

      Run it several times, and you will see cases where the main thread
      will measure a process clock difference before and after the
      nanosleep which is smaller than the cpu-burner thread's individual
      thread clock difference.  This doesn't make any sense since the
      cpu-burner thread is part of the top-level process's thread group.

      I've reproduced this on both x86-64 and sparc64 (using both 32-bit
      and 64-bit binaries).

      For example:

      [davem@boricha build-x86_64-linux]$ ./test
      process: before(0.001221967) after(0.498624371) diff(497402404)
      thread: before(0.000081692) after(0.498316431) diff(498234739)
      self: before(0.001223521) after(0.001240219) diff(16698)
      [davem@boricha build-x86_64-linux]$

      The diff of 'process' should always be >= the diff of 'thread'.

      I make sure to wrap the 'thread' clock measurements the most tightly
      around the nanosleep() call, and that the 'process' clock
      measurements are the outer-most ones.
      ---
      #include <unistd.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>
      #include <fcntl.h>
      #include <string.h>
      #include <errno.h>
      #include <pthread.h>

      static pthread_barrier_t barrier;

      static void *chew_cpu(void *arg)
      {
              pthread_barrier_wait(&barrier);
              while (1)
                      __asm__ __volatile__("" : : : "memory");

              return NULL;
      }

      int main(void)
      {
              clockid_t process_clock, my_thread_clock, th_clock;
              struct timespec process_before, process_after;
              struct timespec me_before, me_after;
              struct timespec th_before, th_after;
              struct timespec sleeptime;
              unsigned long diff;
              pthread_t th;
              int err;

              err = clock_getcpuclockid(0, &process_clock);
              if (err)
                      return 1;

              err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
              if (err)
                      return 1;

              pthread_barrier_init(&barrier, NULL, 2);
              err = pthread_create(&th, NULL, chew_cpu, NULL);
              if (err)
                      return 1;

              err = pthread_getcpuclockid(th, &th_clock);
              if (err)
                      return 1;

              pthread_barrier_wait(&barrier);

              err = clock_gettime(process_clock, &process_before);
              if (err)
                      return 1;

              err = clock_gettime(my_thread_clock, &me_before);
              if (err)
                      return 1;

              err = clock_gettime(th_clock, &th_before);
              if (err)
                      return 1;

              sleeptime.tv_sec = 0;
              sleeptime.tv_nsec = 500000000;
              nanosleep(&sleeptime, NULL);

              err = clock_gettime(th_clock, &th_after);
              if (err)
                      return 1;

              err = clock_gettime(my_thread_clock, &me_after);
              if (err)
                      return 1;

              err = clock_gettime(process_clock, &process_after);
              if (err)
                      return 1;

              diff = process_after.tv_nsec - process_before.tv_nsec;
              printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
                     process_before.tv_sec, process_before.tv_nsec,
                     process_after.tv_sec, process_after.tv_nsec, diff);
              diff = th_after.tv_nsec - th_before.tv_nsec;
              printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
                     th_before.tv_sec, th_before.tv_nsec,
                     th_after.tv_sec, th_after.tv_nsec, diff);
              diff = me_after.tv_nsec - me_before.tv_nsec;
              printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
                     me_before.tv_sec, me_before.tv_nsec,
                     me_after.tv_sec, me_after.tv_nsec, diff);

              return 0;
      }

    This is due to us using p->se.sum_exec_runtime in
    thread_group_cputime() where we iterate the thread group and sum all
    data.  This does not take time since the last schedule operation (tick
    or otherwise) into account.  We can cure this by using
    task_sched_runtime() at the cost of having to take locks.

    This also means we can (and must) do away with
    thread_group_sched_runtime() since the modified thread_group_cputime()
    is now more accurate and would deadlock when called from
    thread_group_sched_runtime().

    Aside from that, it makes the function safe on 32-bit systems.  The old
    code added t->se.sum_exec_runtime unprotected.  sum_exec_runtime is a
    64-bit value and could be changed on another CPU at the same time.

    Reported-by: David Miller
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
    Tested-by: David Miller
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

:040000 040000 e9d2a2725b5a46cea3fa6ce0ee35dcf090009d99 84f6942a1e2a17df5493bfa83964e895909e9ebe M	include
:040000 040000 4f7721c10ef95af6885ee555babf6700dc90c821 bc907b8e0c592cd3dd046686c132283971e4d9bc M	kernel
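To make the commit's explanation concrete: thread_group_cputime() used to sum each thread's cached p->se.sum_exec_runtime, which is only brought up to date at scheduling events, so a thread that has been burning CPU since its last tick is under-counted in the process total, while its own thread clock (via task_sched_runtime()) is fully up to date. The small userspace model below is only a rough sketch of that effect under made-up numbers; the struct and function names are invented for illustration and are not kernel code.

#include <stdio.h>

/* Toy model of per-thread CPU accounting (illustrative names only). */
struct toy_thread {
        unsigned long long sum_exec_runtime; /* updated only at the last "tick" */
        unsigned long long ran_since_tick;   /* time accrued since that tick    */
};

/* Old behaviour per the commit message: sum only the cached counters. */
static unsigned long long group_time_stale(const struct toy_thread *t, int n)
{
        unsigned long long total = 0;
        for (int i = 0; i < n; i++)
                total += t[i].sum_exec_runtime;
        return total;
}

/* New behaviour: also account for time run since the last scheduling
 * event, the way task_sched_runtime() does for a single task. */
static unsigned long long group_time_fresh(const struct toy_thread *t, int n)
{
        unsigned long long total = 0;
        for (int i = 0; i < n; i++)
                total += t[i].sum_exec_runtime + t[i].ran_since_tick;
        return total;
}

int main(void)
{
        /* A cpu-burner thread that has run ~498 ms, 3 ms of it since the
         * last tick, plus a main thread that barely ran. */
        struct toy_thread group[2] = {
                { .sum_exec_runtime = 495000000ULL, .ran_since_tick = 3000000ULL },
                { .sum_exec_runtime =   1200000ULL, .ran_since_tick =       0ULL },
        };
        unsigned long long burner = group[0].sum_exec_runtime + group[0].ran_since_tick;

        printf("thread clock of burner: %llu ns\n", burner);
        printf("process clock (stale) : %llu ns\n", group_time_stale(group, 2));
        printf("process clock (fresh) : %llu ns\n", group_time_fresh(group, 2));
        return 0;
}

With these numbers, the "stale" process total (496200000 ns) comes out smaller than the burner's own thread clock (498000000 ns), which is the same impossible-looking ordering seen in David's ./test output above; adding the time run since the last tick restores process >= thread.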