Portage exports path, hash and mtime of files of type <obj> in CONTENTS files in /var/db/pkg/<category>/<ebuild>. This is very interesting information for checking the integrity of installed files. Online checking integrity of files on linux is typically using IMA. While portage exports hashes in the form of md5sums, IMA, however only measures using sha1, sha256 and sha512. I'd guess most elegant would be to prepend the hash with the algorithm (similar to how IMA exports the hashes) in CONTENTS file. Reproducible: Always
Example (current): obj /usr/include/libguile/vports.h 8b6b85c8fdc0fd5a5db625fc2840cb15 1482137068 Example (new): obj /usr/include/libguile/vports.h md5:8b6b85c8fdc0fd5a5db625fc2840cb15 1482137068 I'm not familiar with what situations this would impact, though. I can image compatibility issues can crop up with this approach.
Considering vdb format is not standard and used by all PMs somehow (and possibly more tools), I'm not sure if it's possible to do that using the same file. Maybe it'd be worth adding additional file for that (yeah, I know that sucks a bit). Quite a stupid idea would be to add CONTENTS.sha512sum and alike files, with format matching 'sha512sum -c'.
Okay, so I looked into this. The contents of CONTENTS ;) is accessed in quite a bunch of different places throughout vartree.py. I couldn't really get the hang of how this all worked together. (E.g. there are functions writeContentsToContentsFile() and write_contents(), but there is some obscure spot in mergeme() which turned out to be the main place where the CONTENTS file gets written to.) Circumventing all of this, I followed Michał's suggestion and added some lines to write a separate file. The patch adds a file INTEGRITY and in it lists mode, hash and filename for all objects. Some additional tidbits: - hashing is dependent on INTEGRITY_HASH being set to sha1, sha256 or sha512 in make.conf - mode, hash and filename are based on what's in SRC (the build dir in PORTAGE_TMPDIR) - for objects of type DIR a check is first done whether the object already exists in DEST, because otherwise we'll get entries for /usr, /usr/bin, etc. - all entries to INTEGRITY are prepended, so we obtain some flexibility for future expansion
Created attachment 460212 [details, diff] add support for INTEGRITY file
Would appreciate feedback on the patch. Happy to modify it to make it suitable for inclusion in portage.
Some possibly related bugs are 134677, 523706 and 230818.
Two quick questions: 1. Is there a reason to record non-regular files there? Considering that they are in CONTENTS anyway, and there's no hash check for that? 2. Is the resulting format compatible with some existing tool to check hashes?
1. Good point. It's not really in scope of this bug. The reason it slipped in is that with this solution we can now do more than was possible in CONTENTS (e.g. modes aren't recorded in CONTENTS). Actually, I'd like to also add ownership information, but didn't come around to this yet. Not sure how you feel about this. (See also bug #230818) 2. I reckon the main tools to consider are AIDE, IMA and sha1sum. AIDE uses a db, so a script is needed to populate the db from this. IMA (in Linux mainline) isn't really a tool, but is very interesting for comparison, and uses the format: <PCR> <template-hash> <format> <file-hash> <filename> 10 b1393e1eec466da807660487196b6e522900ee11 ima-ng sha256:842afd4da148eedf5490313129237bf172761af8631afbd9f44bb0e3f168f592 /usr/lib64/firefox/firefox 10 0375ec32459cfa2cebed91614a5a0dd7f593c283 ima-ng sha256:d2e6a19901b77e2eea8d66ef57ac60bd021e8bfee787a043927530da99a9fc56 /usr/lib64/firefox/plugin-container sha1sum (and siblings) use the format: <file-hash> *<filename> 62fc3e810c7631fbbd45d9c960ec749fc1eb66e5c56039423e3e94a5391a437a *ubuntu-16.04.1-server-i386.img 62fc3e810c7631fbbd45d9c960ec749fc1eb66e5c56039423e3e94a5391a437a *ubuntu-16.04.1-server-i386.iso I'm also interested in amending the q applets to make use of this.
Created attachment 472158 [details, diff] use realpath() Fixed a small issue: IMA exposes filepaths differently than the filepaths to which portage writes, as they may be symlinks. Solved by applying realpath() on destobj.
Don't checksum symlink targets. They may change and it is perfectly valid.
Also, realpath is not valid for absolute symlinks when ROOT != /. Also, we don't expand directory symlinks in CONTENTS paths because then quickpkg results would vary depending on local symlinks.
Created attachment 472168 [details, diff] removed use of realpath() Thanks for the feedback! Removed the use of realpath. Changed how mode is calculated -- simply use oct(mymode)[-4:]
What are the possibilities of this getting adopted? I'm happy to update the diffs or improve things if there are concerns/suggestions. A somewhat complementary q-applet has been adopted, which can make use of this to perform online integrity scans of performed executables, see bug 619988.
For extensibility and backward compatibility, I'd like to add the new hashes in separate files, as discussed in bug 654122, comment #7: > In order to make CONTENTS extensible, without changing the format, we can > introduce separate files that are assumed to contain parallel data just like > parallel arrays. That way, the file paths only need to be stored once, and > we can extend the data associated with those paths by adding new files that > have exactly the same order and number of lines.
Okay, that seems similar to Michal Gorny's remarks in comment 2. Based on this, the patch was changed to create a separate file INTEGRITY, it has both digests and modes. Would you prefer separate files DIGESTS, MODES, (ATTRS?), etc? Also, modes are relevant for e.g. symlinks, while digests are not. Currently, the patch has both digests and modes on one line. Do you have a preference how to deal with this, given the files should be same order and amount of lines?
(In reply to Sam from comment #15) > Okay, that seems similar to Michal Gorny's remarks in comment 2. Based on > this, the patch was changed to create a separate file INTEGRITY, it has both > digests and modes. > > Would you prefer separate files DIGESTS, MODES, (ATTRS?), etc? Yes, the DIGESTS certainly need to be separate because we can expect new digest types to be introduced occasionally. Also we should separate MODES and ATTRS so that it's possible to access them independently. For these files that are parallel to CONTENTS, let's use a CONTENTS_ file name prefix so that they can be easily identified. > Also, modes are relevant for e.g. symlinks, while digests are not. > Currently, the patch has both digests and modes on one line. Do you have a > preference how to deal with this, given the files should be same order and > amount of lines? For anything that's not a regular file obj (CONTENTS supports sym, dir, fif, dev), let's use an empty line for the digest.
Let's use a CONTENTS_DIGESTS_ file name prefix for digests, so that we can easily identify those that contain digests.
Created attachment 534598 [details, diff] Output metadata into separate CONTENTS_ files Here's a patch against the vartree.py of portage-2.3.24-r1. It follows the suggested approach and creates CONTENTS_DIGESTS_<hashtype> and CONTENTS_MODES. Two remarks: - The patch doesn't create empty lines for digests, but pads with the appropriate number of zero's (depending on hash type). That way utils will have a walk in the park when processing the files, as they can fseek to the appropriate position based on a single counter. - The patch doesn't yet create CONTENTS_ATTRS, I'm hoping to be able to tackle that in the coming period. Happy to receive comments/suggestions.
I'm noticing that capabilities in XATTR are often/always added in postinst() using FCAPS.ECLASS. The eclass reference [1] states that this is on purpose "due to probable capability-loss on moving or copying". I don't understand this. PaX-markings are also typically stored in XATTR, but are in src_install(), not postinst(). Apparently, those XATTR get copied over just fine. Also, in util/movefile.py there are plenty of points in the code where XATTR seem to get treated with care. Adding capabilities in postinst() makes what we're doing here very hacky; we'll need similar code as in the patch but in the eclass, instead of relying on dbapi/mergeme(). Has it become time to start moving the capability marking away from postinst(). Or is there something I'm missing? [1] https://devmanual.gentoo.org/eclass-reference/fcaps.eclass/index.html
(In reply to Sam from comment #19) > Has it become time to start moving the capability marking away from > postinst(). Or is there something I'm missing? > > [1] https://devmanual.gentoo.org/eclass-reference/fcaps.eclass/index.html The thing is, portage only handles xattrs from src_install() if FEATURES="xattr" is enabled, and https://dev.gentoo.org/~ulm/pms/head/pms.html mentions nothing about extended attributes. I think a good compromise would be apply the xattrs both in src_install and pkg_postinst, so that all cases are covered.
Hmmm, and that while capabilities are stored in xattr :/ What is needed to move the setting of capabilities to src_install()?
(In reply to Sam from comment #21) > Hmmm, and that while capabilities are stored in xattr :/ fcaps_pkg_postinst will remain as a fallback. > What is needed to move the setting of capabilities to src_install()? The fcaps eclass will need a function for ebuilds to call at the end of src_install, which will apply settings from the FILECAPS variable to the files in ${D}. The src_install function of all relevant ebuilds will have to be updated to call this new function.
I'll give it a try.(In reply to Zac Medico from comment #22) > fcaps_pkg_postinst will remain as a fallback. Yeah. I still think it's odd we don't mind for a fallback when it comes to pax marking. > The fcaps eclass will need a function for ebuilds to call at the end of > src_install, which will apply settings from the FILECAPS variable to the > files in ${D}. The src_install function of all relevant ebuilds will have to > be updated to call this new function. Okay, I'll take a go at it.
Created attachment 534870 [details, diff] Output digests, modes and attr_pax to separate files Here's an updated patch that now also creates CONTENTS_ATTRS_PAX for PaX markings. If you're happy with the patch, I think this can be included in portage. (Including support for recording capabilities will take (me) a little longer.)
Created attachment 535372 [details, diff] Output digests, modes and attr_pax to separate files + fix reading attr Fixed issue where it tried to read pax markings from mydest instead of mysrc.
The dblink._add_preserve_libs_to_contents and vardbapi.removeFromContents methods are used to modify CONTENTS for preserve-libs handling, and the CONTENTS_* files will have to be updated here as well.
(In reply to Zac Medico from comment #26) > The dblink._add_preserve_libs_to_contents and vardbapi.removeFromContents > methods are used to modify CONTENTS for preserve-libs handling, and the > CONTENTS_* files will have to be updated here as well. Will look into that.
Created attachment 542774 [details, diff] Output digests, modes, attr_pax, attr_caps Added code to also record capabilities. AFAIU, capabilities are stored in extended attributes as a 64-bit bitfield. First I simply xattr.getted them and converted to string. Although it seemed to consistently result in 20-char strings, it did so with non-regular characters. That looks crappy when you open the CONTENTS_XATTR_CAPS file. Therefore, I now grab the bitfield out of the struct and record the capabilities as a 16-bit hex string (without the 0x-prefix). To me, that seemed as the more human-readable approach. Not sure what you think of it, though. Happy to receive comments.
Created attachment 543736 [details, diff] Update CONTENTS_* files in _add_preserve_libs_to_contents() and removefromcontents() I think this should more or less work, but needs testing. Also, I still need to look at the fcaps eclass.
Created attachment 544732 [details, diff] Allow fcaps.eclass to set caps in src_install (besides pkg_postinst) Here's a patch for fcaps.eclass. This patch copies the wrapper for fcaps specifically for src_install. Also, if a call to fcaps from pkg_postinst is done and the caps weren't already set, a warning is triggered. This should provide feedback on cases of caps earlier set from within src_install being destroyed during moving/copying.
(In reply to Sam from comment #29) > Created attachment 543736 [details, diff] [details, diff] > Update CONTENTS_* files in _add_preserve_libs_to_contents() and > removefromcontents() > > I think this should more or less work, but needs testing. > > Also, I still need to look at the fcaps eclass. Did some tests and found an issue with how I wanted to use self to find the directory of vdb. Fixed it by passing it more explicitly.
Created attachment 545162 [details, diff] Pass dir of CONTENTS_* files to removeFromContentsMeta() To fix issue with incorrect use of self
Created attachment 545784 [details, diff] In fcaps.eclass, change order: first check if pkg_postinst, then check if caps were set
Created attachment 550998 [details, diff] Change fcaps.eclass: check for FILECAPS and don't provide custom src_install() Bumped into two issues with this eclass when emerging pam: 1) The pam ebuild calls fcaps directly in pkg_postinst and doesn't provide a custom src_install(). So the fcaps variant of src_install() was used which doesn't do what it should. To fix this, fcaps.eclass now defines fcaps_parse() and no longer defines fcaps_src_install(). Ebuild should call fcaps_parse() in their src_install() or pkg_postinst(). Fcaps.eclass still provides fcaps_pkg_postinst() which wraps around fcaps_parse() 2) The pam ebuild doesn't specify $FILECAPS() so fcaps() was borking due to too few args. This was caused by the fcaps_parse() trying to get args from $FILECAPS() and passing them on to fcaps(). To fix this, fcaps_parse() now checks whether $FILECAPS([0]) is set/non-empty. If not, it will skip trying to fire off fcaps() with empty args.
Created attachment 551410 [details, diff] Fix mistake in writeMetaData() definition Took awhile before portage updating ran into a package where preserve-libs was needed. Until now, and it triggered a minor bug, where the function definition of writeMetaData didn't include the receiving of self. Fixed in this new patch. I think that now all code has been touched by emerge. Do you guys want more formal testing before inclusion?
@Zac: do you think this is ready to be posted/reviewed on gentoo-portage-dev@lists.gentoo.org?
For removeFromContents, I'd like it to be possible to rollback interrupted transactions. A procedure like this will allow for easy rollback: 1) Create marker file to indicate that a transaction is in progress. 2) Create replacement metadata files, each with a temporary .new suffix. 3) Create hardlinks of the old metadata files, each with a .old suffix. 4) Rename each of the .new files, replacing the old versions. 5) Use SyncfsProcess to sync new files to disk. 6) Removed files with .old suffix. 7) Remove transaction marker file. If we detect an interrupted transaction (unexpected transaction marker file), we can rollback simply by renaming all of the .old files back.
Rather than use .new and .old file suffixes for files in in the transaction, it's probably better to group the new files and old files into separate directories. We could even use a directory as a transaction marker, for example new files could go in transaction/new/ and old files in transaction/old/.
I'd like to throw mtree(5) format into the list of possibility. On the plus side, it's somewhat BSD-standard and it's used by Arch Linux's package format. On the minus side, it doesn't seem to allow for more custom hashes. As for hashes themselves, there's also the possibility of reusing Manifest format. Finally, we might use both formats for two different purposes.
I suspect both mtree and Manifest would be slower. The proposed format has fixed length fields, making look-ups pretty fast.
This assumption sounds like premature optimization. Don't sacrifice transparency or readability for negligible speed gain.