432024 – sys-apps/portage-2.1.10.65 uses more cpu than it should (due to PORT_LOGDIR setting?)

Bug 432024 - sys-apps/portage-2.1.10.65 uses more cpu than it should (due to PORT_LOGDIR setting?)

Summary: sys-apps/portage-2.1.10.65 uses more cpu than it should (due to PORT_LOGDIR s...

Status:	RESOLVED FIXED

Alias:	None

Product:	Portage Development
Classification:	Unclassified
Component:	Core (show other bugs)
Hardware:	All Linux

Importance:	Normal normal (vote)
Assignee:	Portage team

URL:
Whiteboard:
Keywords:	InVCS

Depends on:
Blocks:	431026
	Show dependency tree

Reported:	2012-08-20 03:55 UTC by SpanKY
Modified:	2012-09-26 15:59 UTC (History)
CC List:	0 users

See Also:
Package list:
Runtime testing required:	---

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description SpanKY gentoo-dev

2012-08-20 03:55:51 UTC

on an embedded superh box, running `emerge gcc -1u` shows the main emerge python process constantly using cpu.  this seems to be due to constant polling of the output ?

for example, `ps aux` shows:
...
root     16561 39.4  1.5  48512   936 pts/4    S+   Aug16 2240:51 /usr/bin/python2.6 /usr/bin/emerge sys-devel/gcc -1u
...
root      7976 66.9 66.8  75660 40752 pts/4    D+   23:42   0:52 /usr/sh4-unknown-linux-gnu/bin/ld --eh-frame-hdr -m shlelf_linux -export-dynamic -dynamic-linker /lib/ld-linux.so.2 -o cc1 /usr/lib/crt1.o
...

there's really no reason that emerge process should be sucking up so much time.  running strace against it, i see:
gettimeofday({1345434286, 75025}, NULL) = 0
poll([{fd=9, events=POLLIN|POLLERR|POLLHUP|POLLNVAL}, {fd=7, events=POLLIN|POLLERR|POLLHUP|POLLNVAL}], 2, 3000) = 0 (Timeout)
gettimeofday({1345434289, 89291}, NULL) = 0
gettimeofday({1345434289, 90747}, NULL) = 0
poll([{fd=9, events=POLLIN|POLLERR|POLLHUP|POLLNVAL}, {fd=7, events=POLLIN|POLLERR|POLLHUP|POLLNVAL}], 2, 3000) = 1 ([{fd=9, revents=POLLIN}])
read(9, "/var/tmp/portage/sys-devel/gcc-4"..., 4096) = 1135
write(13, "/var/tmp/portage/sys-devel/gcc-4"..., 1135) = 1135
write(12, "/var/tmp/portage/sys-devel/gcc-4"..., 1135) = 1135
read(9, 0x19a5784, 4096)                = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1345434293, 93010}, NULL) = 0
gettimeofday({1345434293, 96096}, NULL) = 0
poll([{fd=9, events=POLLIN|POLLERR|POLLHUP|POLLNVAL}, {fd=7, events=POLLIN|POLLERR|POLLHUP|POLLNVAL}], 2, 3000) = 0 (Timeout)
<lots more gettimeofday/poll calls>

looking at the fds of that process, i have:
/proc/16561/fd/
  12 -> /var/log/portage/sys-devel:gcc-4.5.3-r2:20120816-065718.log
  13 -> /dev/pts/4
   2 -> /dev/pts/4
   3 -> pipe:[1289435]
   4 -> pipe:[1289435]
   5 -> /var/tmp/portage/sys-devel/.gcc-4.5.3-r2.portage_lockfile
   7 -> /var/tmp/portage/sys-devel/gcc-4.5.3-r2/.ipc_in
   9 -> /dev/ptmx

so it's polling fd 9 constantly, and whenever it gets output, it writes it to the tty (13) and the log file (12).  then restarts the whole process.

shouldn't this use a blocking call ?  why are we polling with the poll syscall ?

Comment 1 Zac Medico gentoo-dev

2012-08-20 05:11:46 UTC

(In reply to comment #0)
> shouldn't this use a blocking call ?

It does block until it hits the 3 second timeout, which shows as 3000 ms in your strace log. The timeout is for it to update the load avg display.

> why are we polling with the poll syscall ?

We could use epoll instead, since python has bindings for that too.

Comment 2 SpanKY gentoo-dev

2012-08-20 17:49:08 UTC

(In reply to comment #1)

there is no load avg display (i'm not using --jobs or --quiet-build).  you can see that there aren't any syscalls made other than the poll/gettimeofday, which means portage isn't querying for loadavg stats here.

maybe the timeout logic should be checking to see if there's any load display to be updating in the first place ?

Comment 3 Zac Medico gentoo-dev

2012-08-20 20:39:52 UTC

(In reply to comment #2)
> maybe the timeout logic should be checking to see if there's any load
> display to be updating in the first place ?

Yeah, this should do it:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=31b41fe7a20e6e0d19ff2383a746d61d8aa18b50

Comment 4 Zac Medico gentoo-dev

2012-08-22 07:18:40 UTC

(In reply to comment #1)
> (In reply to comment #0)
> > why are we polling with the poll syscall ?
> 
> We could use epoll instead, since python has bindings for that too.

This is implemented now:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=b986bcdd49c5523ffe6972377071d556a819c776

Comment 5 Zac Medico gentoo-dev

2012-08-22 23:45:35 UTC

This is fixed in 2.1.11.11 and 2.2.0_alpha122.

Comment 6 Zac Medico gentoo-dev

2012-09-26 15:59:41 UTC

In portage-2.1.11.22 and 2.2.0_alpha133 there's a fix for a related, and much worse bug:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=2b2580d9dac62aa720e5d996fa5102ee5caeffe7