
Bug 602986

Summary: Sort portage /var/lib/portage/config file before writing to disc.
Product: Portage Development
Reporter: pitcha
Component: Enhancement/Feature Requests
Assignee: Portage team <dev-portage>
Status: UNCONFIRMED ---    
Severity: normal    
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: Sort dictionary before writing to disc

Description pitcha 2016-12-18 08:50:23 UTC
Created attachment 456552 [details, diff]
Sort dictionary before writing to disc

I keep my configuration files under version control. Among the tracked files is portage's /var/lib/portage/config (whether tracking it is a good idea is a different story, but currently I do).
Over time I noticed that portage changes the file in a non-deterministic way, e.g. emerging openssl and then re-emerging it yielded different /var/lib/portage/config files.
The reason is that /var/lib/portage/config is just a Python dictionary written to disc, and dictionaries are unordered (pym/portage/util/__init__.py, line 577):

def writedict(mydict, myfilename, writekey=True):
        """Writes out a dict to a file; writekey=0 mode doesn't write out
        the key and assumes all values are strings, not lists.""" 
        lines = []
        if not writekey:
                for v in mydict.values():
                        lines.append(v + "\n")
        else:
                for k, v in mydict.items():
                        lines.append("%s %s\n" % (k, " ".join(v)))
        write_atomic(myfilename, "".join(lines))

I propose changing the implementation to write out the dictionary in alphabetical order (by key if writekey=True, by value otherwise).
This makes the output deterministic across runs.

This is a small change, just inserting sorted() in each of the two for-loops (patch also attached), e.g.

        [...]
        if not writekey:
                for v in sorted(mydict.values()):
                        lines.append(v + "\n")
        else:
                for k, v in sorted(mydict.items()):
                        lines.append("%s %s\n" % (k, " ".join(v)))
        [...]
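
To illustrate the effect of the proposed change, here is a minimal standalone sketch. It is not portage's code: render_dict is a hypothetical helper that only builds the text writedict would hand to write_atomic, and the paths and digests are made up.

```python
def render_dict(mydict, writekey=True, sort=True):
    """Build the text that writedict() would pass to write_atomic().

    Hypothetical stand-in for illustration; not part of portage.
    """
    lines = []
    if not writekey:
        values = mydict.values()
        for v in (sorted(values) if sort else values):
            lines.append(v + "\n")
    else:
        items = mydict.items()
        # Keys are unique, so sorting the (key, value) pairs orders by key.
        for k, v in (sorted(items) if sort else items):
            lines.append("%s %s\n" % (k, " ".join(v)))
    return "".join(lines)


# Same content, different insertion order (made-up paths and digests).
a = {"/etc/ssl/openssl.cnf": ["0123abcd"], "/etc/hosts": ["4567ef01"]}
b = {"/etc/hosts": ["4567ef01"], "/etc/ssl/openssl.cnf": ["0123abcd"]}

sorted_a = render_dict(a)
sorted_b = render_dict(b)
```

With sorting, both dictionaries render to the same text regardless of how they were built, which is what keeps the file stable under version control.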
Comment 1 Zac Medico gentoo-dev 2016-12-18 10:25:35 UTC
Sorting this file is a waste of CPU time, and really we should take every opportunity that we can to conserve resources.

I think we can eliminate /var/lib/portage/config entirely, and instead rely upon the md5 digest of the config file installed by the previous instance of the package (the md5 digests are available in /var/db/pkg/*/*/CONTENTS).
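
As a rough sketch of what such a lookup might involve: the code below assumes CONTENTS entries for regular files have the form "obj <path> <md5> <mtime>", and that dir and sym entries carry no digest. parse_contents_md5 is a hypothetical helper, not a portage API, and the sample data is made up.

```python
def parse_contents_md5(text):
    """Map each regular-file path in CONTENTS text to its recorded md5.

    Assumes 'obj <path> <md5> <mtime>' lines; other entry types are skipped.
    """
    digests = {}
    for line in text.splitlines():
        if not line.startswith("obj "):
            continue
        # The path may contain spaces, so split the two trailing
        # fields (md5, mtime) off from the right.
        path, md5, _mtime = line[len("obj "):].rsplit(" ", 2)
        digests[path] = md5
    return digests


# Made-up sample entries for illustration.
sample = (
    "dir /etc/ssl\n"
    "obj /etc/ssl/openssl.cnf 0123456789abcdef0123456789abcdef 1482051023\n"
    "sym /usr/lib/libssl.so -> libssl.so.1.0.0 1482051023\n"
)
digests = parse_contents_md5(sample)
```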
Comment 2 Zac Medico gentoo-dev 2019-09-02 18:13:36 UTC
(In reply to Zac Medico from comment #1)
> I think we can eliminate /var/lib/portage/config entirely, and instead rely
> upon the md5 digest of the config file installed by the previous instance
> of the package (the md5 digests are available in /var/db/pkg/*/*/CONTENTS).

Since file collisions are allowed for files under CONFIG_PROTECT, it's possible for multiple packages to "own" a particular config file, so /var/lib/portage/config currently provides disambiguation in this case.
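
A small sketch of the ambiguity, using made-up package names, paths, and digests (the data layout here is an assumption for illustration, not how portage stores it): when two installed packages record different digests for the same path, the per-package records alone cannot say which digest applies.

```python
from collections import defaultdict

# Made-up data: md5 recorded by each package for the config files it installed.
installed = {
    "app-misc/foo-1.0": {"/etc/shared.conf": "aaaa", "/etc/foo.conf": "bbbb"},
    "app-misc/bar-2.0": {"/etc/shared.conf": "cccc"},
}


def candidate_digests(installed):
    """Collect every recorded md5 for each path across all packages."""
    by_path = defaultdict(set)
    for files in installed.values():
        for path, md5 in files.items():
            by_path[path].add(md5)
    return by_path


by_path = candidate_digests(installed)
# Paths with more than one candidate digest need extra disambiguation.
ambiguous = sorted(p for p, d in by_path.items() if len(d) > 1)
```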