553574 – app-portage/eix does not atomically replace databases, causing concurrent read failures

Bug 553574 - app-portage/eix does not atomically replace databases, causing concurrent read failures

Summary: app-portage/eix does not atomically replace databases, causing concurrent rea...

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Development (show other bugs)
Hardware:	All Linux

Importance:	High critical (vote)
Assignee:	Martin Väth

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2015-06-29 16:58 UTC by Robin Johnson
Modified:	2015-11-08 10:24 UTC (History)
CC List:	4 users (show)

See Also:
Package list:
Runtime testing required:	---

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Robin Johnson archtester

2015-06-29 16:58:21 UTC

eix-update generates new database files, and then opens the existing path with truncate to replace.

If another process attempts to read the database while the write is happening, it will have weird errors.

If the write fails (eg out of disk space), the on-disk database will also remain corrupted.

Please fix this by writing new databases to a separate filename in the same directory (one convention is appending .$PID on the original filename), and moving it over the new file if the write was successful.

This manages to bite Gentoo infra's puppet repos a few times a week, because it relies heavily on eix for lookups, and portage is concurrently updated in a cronjob very often.

Comment 1 Martin Väth 2015-06-29 18:48:29 UTC

(In reply to Robin Johnson from comment #0)
> eix-update [...] opens the existing path with truncate to replace

This is intentional: This behaviour ensures that the permissions given by the user to the database (including xattr, acl, or whatever) are preserved and that there are no problems if eix-update has no permissions to write into the corresponding directory.

> Please fix this by writing new databases to a separate filename
> in the same directory (one convention is appending .$PID on the original
> filename), and moving it over the new file if the write was successful.

Depending on the directory in which EIX_CACHEFILE resides, this might be possibly a security risk (generating a predictable filename in a directory which possibly is writable by many people).
If you want this behaviour, you can simply call a wrapper script instead of eix-update (while the converse could not easily be obtained by a wrapper script if eix-update would behave differently), for instance as follows (untested):

#!/bin/sh
. eix-functions.sh
ReadFunctions
ReadVar EIX_CACHEFILE
if EIX_CACHEFILE=$EIX_CACHEFILE.$$ /usr/bin/eix-update "$@"
then mv -- "$EIX_CACHEFILE.$$" "$EIX_CACHEFILE" || \
       die "failed to move $EIX_CACHEFILE.$$"
else rm -f -- "$EIX_CACHEFILE.$$" || \
       die "failed to remove $EIX_CACHEFILE.$$"
     die "failed: eix-update $*"
fi

Comment 2 Martin Väth 2015-06-29 19:02:58 UTC

(In reply to Martin Väth from comment #1)
> ReadVar EIX_CACHEFILE

This should be

ReadVar EIX_CACHEFILE EIX_CACHEFILE /usr/bin/eix-update

(the second argument was simply forgotten, and by giving the third an endless loop is avoided if the wrapper script is e.g. /usr/local/bin/eix-update).
However, I repeat: The script is untested.

Comment 3 Martin Väth 2015-06-29 19:31:27 UTC

...and yet another small issue with the script I just realized: "die" would also call ReadVar, but without a third argument. Thus, one should set

read_var_prg=/usr/bin/eix-update

near the beginning of the script to use the full path throughout.

Comment 4 Robin Johnson archtester

2015-07-01 18:21:09 UTC

Iif you don't intend on atomically replacing, then can you please use flock(2) on the database, to prevent a concurrent open for reading while the process is busy writing the new content?

When opening for reading, do:
if(flock(fd, LOCK_SH) != 0) {
  // handle err
}
...
flock(fd, LOCK_UN);

When opening for writing, do:
if(flock(fd, LOCK_EX) != 0) {
  // handle err
}
...
flock(fd, LOCK_UN);

Crashes will also automatically remove the lock, or closing the descriptors. flock will block as needed to prevent a read/write conflict. If you want non-blocking, or it with LOCK_NB, and check EWOULDBLOCK

Comment 5 Martin Väth 2015-07-01 23:43:48 UTC

Good idea.

This is fixed in eix github master branch (>=eix-0.30.12)-