Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 605082 - Add metadata to CONTENTS_ files in vdb
Summary: Add metadata to CONTENTS_ files in vdb
Status: CONFIRMED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Enhancement/Feature Requests (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: Portage team
URL:
Whiteboard:
Keywords: PATCH
Depends on: 671864
Blocks: 193766
  Show dependency tree
 
Reported: 2017-01-08 17:28 UTC by Sam
Modified: 2022-11-16 18:03 UTC (History)
3 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
add support for INTEGRITY file (integrity.patch,3.88 KB, patch)
2017-01-15 18:34 UTC, Sam
Details | Diff
use realpath() (integrity-0.2.patch,3.74 KB, patch)
2017-05-09 19:56 UTC, Sam
Details | Diff
removed use of realpath() (integrity-0.3.patch,3.73 KB, patch)
2017-05-09 20:40 UTC, Sam
Details | Diff
Output metadata into separate CONTENTS_ files (integrity.patch,4.50 KB, patch)
2018-06-02 22:50 UTC, Sam
Details | Diff
Output digests, modes and attr_pax to separate files (integrity.patch,5.41 KB, patch)
2018-06-04 21:30 UTC, Sam
Details | Diff
Output digests, modes and attr_pax to separate files + fix reading attr (integrity.patch,5.41 KB, patch)
2018-06-09 13:14 UTC, Sam
Details | Diff
Output digests, modes, attr_pax, attr_caps (integrity.patch,6.96 KB, patch)
2018-08-08 13:06 UTC, Sam
Details | Diff
Update CONTENTS_* files in _add_preserve_libs_to_contents() and removefromcontents() (integrity.patch,12.52 KB, patch)
2018-08-16 20:21 UTC, Sam
Details | Diff
Allow fcaps.eclass to set caps in src_install (besides pkg_postinst) (fcaps.patch,3.25 KB, patch)
2018-08-23 22:00 UTC, Sam
Details | Diff
Pass dir of CONTENTS_* files to removeFromContentsMeta() (integrity.patch,12.58 KB, patch)
2018-08-26 20:56 UTC, Sam
Details | Diff
In fcaps.eclass, change order: first check if pkg_postinst, then check if caps were set (fcaps.patch,3.71 KB, patch)
2018-09-02 20:14 UTC, Sam
Details | Diff
Change fcaps.eclass: check for FILECAPS and don't provide custom src_install() (fcaps.patch,4.13 KB, patch)
2018-10-14 08:25 UTC, Sam
Details | Diff
Fix mistake in writeMetaData() definition (integrity.patch,12.59 KB, patch)
2018-10-15 20:34 UTC, Sam
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sam 2017-01-08 17:28:50 UTC
Portage exports path, hash and mtime of files of type <obj> in CONTENTS files in /var/db/pkg/<category>/<ebuild>. This is very interesting information for checking the integrity of installed files. Online checking integrity of files on linux is typically using IMA. While portage exports hashes in the form of md5sums, IMA, however only measures using sha1, sha256 and sha512.

I'd guess most elegant would be to prepend the hash with the algorithm (similar to how IMA exports the hashes) in CONTENTS file.

Reproducible: Always
Comment 1 Sam 2017-01-08 17:30:46 UTC
Example (current):
obj /usr/include/libguile/vports.h 8b6b85c8fdc0fd5a5db625fc2840cb15 1482137068

Example (new):
obj /usr/include/libguile/vports.h md5:8b6b85c8fdc0fd5a5db625fc2840cb15 1482137068

I'm not familiar with what situations this would impact, though. I can image compatibility issues can crop up with this approach.
Comment 2 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2017-01-08 20:01:16 UTC
Considering vdb format is not standard and used by all PMs somehow (and possibly more tools), I'm not sure if it's possible to do that using the same file. Maybe it'd be worth adding additional file for that (yeah, I know that sucks a bit).

Quite a stupid idea would be to add CONTENTS.sha512sum and alike files, with format matching 'sha512sum -c'.
Comment 3 Sam 2017-01-15 18:34:01 UTC
Okay, so I looked into this. The contents of CONTENTS ;) is accessed in quite a bunch of different places throughout vartree.py. I couldn't really get the hang of how this all worked together. (E.g. there are functions writeContentsToContentsFile() and write_contents(), but there is some obscure spot in mergeme() which turned out to be the main place where the CONTENTS file gets written to.) Circumventing all of this, I followed Michał's suggestion and added some lines to write a separate file.

The patch adds a file INTEGRITY and in it lists mode, hash and filename for all objects.
Some additional tidbits:
- hashing is dependent on INTEGRITY_HASH being set to sha1, sha256 or sha512 in make.conf
- mode, hash and filename are based on what's in SRC (the build dir in PORTAGE_TMPDIR)
- for objects of type DIR a check is first done whether the object already exists in DEST, because otherwise we'll get entries for /usr, /usr/bin, etc.
- all entries to INTEGRITY are prepended, so we obtain some flexibility for future expansion
Comment 4 Sam 2017-01-15 18:34:30 UTC
Created attachment 460212 [details, diff]
add support for INTEGRITY file
Comment 5 Sam 2017-01-15 18:37:07 UTC
Would appreciate feedback on the patch. Happy to modify it to make it suitable for inclusion in portage.
Comment 6 Sam 2017-01-15 18:38:54 UTC
Some possibly related bugs are 134677, 523706 and 230818.
Comment 7 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2017-01-15 19:53:22 UTC
Two quick questions:

1. Is there a reason to record non-regular files there? Considering that they are in CONTENTS anyway, and there's no hash check for that?

2. Is the resulting format compatible with some existing tool to check hashes?
Comment 8 Sam 2017-01-15 20:35:47 UTC
1. Good point. It's not really in scope of this bug. The reason it slipped in is that with this solution we can now do more than was possible in CONTENTS (e.g. modes aren't recorded in CONTENTS). Actually, I'd like to also add ownership information, but didn't come around to this yet. Not sure how you feel about this. (See also bug #230818)

2. I reckon the main tools to consider are AIDE, IMA and sha1sum. AIDE uses a db, so a script is needed to populate the db from this. IMA (in Linux mainline) isn't really a tool, but is very interesting for comparison, and uses the format:
<PCR> <template-hash> <format> <file-hash> <filename>
10 b1393e1eec466da807660487196b6e522900ee11 ima-ng sha256:842afd4da148eedf5490313129237bf172761af8631afbd9f44bb0e3f168f592 /usr/lib64/firefox/firefox
10 0375ec32459cfa2cebed91614a5a0dd7f593c283 ima-ng sha256:d2e6a19901b77e2eea8d66ef57ac60bd021e8bfee787a043927530da99a9fc56 /usr/lib64/firefox/plugin-container

sha1sum (and siblings) use the format:
<file-hash> *<filename>
62fc3e810c7631fbbd45d9c960ec749fc1eb66e5c56039423e3e94a5391a437a *ubuntu-16.04.1-server-i386.img
62fc3e810c7631fbbd45d9c960ec749fc1eb66e5c56039423e3e94a5391a437a *ubuntu-16.04.1-server-i386.iso

I'm also interested in amending the q applets to make use of this.
Comment 9 Sam 2017-05-09 19:56:10 UTC
Created attachment 472158 [details, diff]
use realpath()

Fixed a small issue: IMA exposes filepaths differently than the filepaths to which portage writes, as they may be symlinks. Solved by applying realpath() on destobj.
Comment 10 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2017-05-09 20:13:44 UTC
Don't checksum symlink targets. They may change and it is perfectly valid.
Comment 11 Zac Medico gentoo-dev 2017-05-09 20:19:45 UTC
Also, realpath is not valid for absolute symlinks when ROOT != /. Also, we don't expand directory symlinks in CONTENTS paths because then quickpkg results would vary depending on local symlinks.
Comment 12 Sam 2017-05-09 20:40:05 UTC
Created attachment 472168 [details, diff]
removed use of realpath()

Thanks for the feedback!

Removed the use of realpath.
Changed how mode is calculated -- simply use oct(mymode)[-4:]
Comment 13 Sam 2018-05-26 20:08:19 UTC
What are the possibilities of this getting adopted?
I'm happy to update the diffs or improve things if there are concerns/suggestions.

A somewhat complementary q-applet has been adopted, which can make use of this to perform online integrity scans of performed executables, see bug 619988.
Comment 14 Zac Medico gentoo-dev 2018-05-26 21:09:27 UTC
For extensibility and backward compatibility, I'd like to add the new hashes in separate files, as discussed in bug 654122, comment #7:

> In order to make CONTENTS extensible, without changing the format, we can
> introduce separate files that are assumed to contain parallel data just like
> parallel arrays. That way, the file paths only need to be stored once, and
> we can extend the data associated with those paths by adding new files that
> have exactly the same order and number of lines.
Comment 15 Sam 2018-05-27 19:14:37 UTC
Okay, that seems similar to Michal Gorny's remarks in comment 2. Based on this, the patch was changed to create a separate file INTEGRITY, it has both digests and modes.

Would you prefer separate files DIGESTS, MODES, (ATTRS?), etc?

Also, modes are relevant for e.g. symlinks, while digests are not. Currently, the patch has both digests and modes on one line. Do you have a preference how to deal with this, given the files should be same order and amount of lines?
Comment 16 Zac Medico gentoo-dev 2018-05-27 21:23:42 UTC
(In reply to Sam from comment #15)
> Okay, that seems similar to Michal Gorny's remarks in comment 2. Based on
> this, the patch was changed to create a separate file INTEGRITY, it has both
> digests and modes.
> 
> Would you prefer separate files DIGESTS, MODES, (ATTRS?), etc?

Yes, the DIGESTS certainly need to be separate because we can expect new digest types to be introduced occasionally. Also we should separate MODES and ATTRS so that it's possible to access them independently. For these files that are parallel to CONTENTS, let's use a CONTENTS_ file name prefix so that they can be easily identified.

> Also, modes are relevant for e.g. symlinks, while digests are not.
> Currently, the patch has both digests and modes on one line. Do you have a
> preference how to deal with this, given the files should be same order and
> amount of lines?

For anything that's not a regular file obj (CONTENTS supports sym, dir, fif, dev), let's use an empty line for the digest.
Comment 17 Zac Medico gentoo-dev 2018-05-28 01:48:08 UTC
Let's use a CONTENTS_DIGESTS_ file name prefix for digests, so that we can easily identify those that contain digests.
Comment 18 Sam 2018-06-02 22:50:54 UTC
Created attachment 534598 [details, diff]
Output metadata into separate CONTENTS_ files

Here's a patch against the vartree.py of portage-2.3.24-r1. It follows the suggested approach and creates CONTENTS_DIGESTS_<hashtype> and CONTENTS_MODES. Two remarks:
- The patch doesn't create empty lines for digests, but pads with the appropriate number of zero's (depending on hash type). That way utils will have a walk in the park when processing the files, as they can fseek to the appropriate position based on a single counter.
- The patch doesn't yet create CONTENTS_ATTRS, I'm hoping to be able to tackle that in the coming period.

Happy to receive comments/suggestions.
Comment 19 Sam 2018-06-03 21:49:22 UTC
I'm noticing that capabilities in XATTR are often/always added in postinst() using FCAPS.ECLASS. The eclass reference [1] states that this is on purpose "due to probable capability-loss on moving or copying".

I don't understand this. PaX-markings are also typically stored in XATTR, but are in src_install(), not postinst(). Apparently, those XATTR get copied over just fine. Also, in util/movefile.py there are plenty of points in the code where XATTR seem to get treated with care.

Adding capabilities in postinst() makes what we're doing here very hacky; we'll need similar code as in the patch but in the eclass, instead of relying on dbapi/mergeme().

Has it become time to start moving the capability marking away from postinst(). Or is there something I'm missing?

[1] https://devmanual.gentoo.org/eclass-reference/fcaps.eclass/index.html
Comment 20 Zac Medico gentoo-dev 2018-06-03 22:04:27 UTC
(In reply to Sam from comment #19)
> Has it become time to start moving the capability marking away from
> postinst(). Or is there something I'm missing?
> 
> [1] https://devmanual.gentoo.org/eclass-reference/fcaps.eclass/index.html

The thing is, portage only handles xattrs from src_install() if FEATURES="xattr" is enabled, and https://dev.gentoo.org/~ulm/pms/head/pms.html mentions nothing about extended attributes.

I think a good compromise would be apply the xattrs both in src_install and pkg_postinst, so that all cases are covered.
Comment 21 Sam 2018-06-03 22:47:35 UTC
Hmmm, and that while capabilities are stored in xattr :/

What is needed to move the setting of capabilities to src_install()?
Comment 22 Zac Medico gentoo-dev 2018-06-03 23:00:29 UTC
(In reply to Sam from comment #21)
> Hmmm, and that while capabilities are stored in xattr :/

fcaps_pkg_postinst will remain as a fallback.

> What is needed to move the setting of capabilities to src_install()?

The fcaps eclass will need a function for ebuilds to call at the end of src_install, which will apply settings from the FILECAPS variable to the files in ${D}. The src_install function of all relevant ebuilds will have to be updated to call this new function.
Comment 23 Sam 2018-06-04 21:05:26 UTC
I'll give it a try.(In reply to Zac Medico from comment #22)
> fcaps_pkg_postinst will remain as a fallback.
Yeah. I still think it's odd we don't mind for a fallback when it comes to pax marking.

> The fcaps eclass will need a function for ebuilds to call at the end of
> src_install, which will apply settings from the FILECAPS variable to the
> files in ${D}. The src_install function of all relevant ebuilds will have to
> be updated to call this new function.
Okay, I'll take a go at it.
Comment 24 Sam 2018-06-04 21:30:51 UTC
Created attachment 534870 [details, diff]
Output digests, modes and attr_pax to separate files

Here's an updated patch that now also creates CONTENTS_ATTRS_PAX for PaX markings.
If you're happy with the patch, I think this can be included in portage.
(Including support for recording capabilities will take (me) a little longer.)
Comment 25 Sam 2018-06-09 13:14:57 UTC
Created attachment 535372 [details, diff]
Output digests, modes and attr_pax to separate files + fix reading attr

Fixed issue where it tried to read pax markings from mydest instead of mysrc.
Comment 26 Zac Medico gentoo-dev 2018-07-11 05:05:49 UTC
The dblink._add_preserve_libs_to_contents and vardbapi.removeFromContents methods are used to modify CONTENTS for preserve-libs handling, and the CONTENTS_* files will have to be updated here as well.
Comment 27 Sam 2018-08-08 12:56:24 UTC
(In reply to Zac Medico from comment #26)
> The dblink._add_preserve_libs_to_contents and vardbapi.removeFromContents
> methods are used to modify CONTENTS for preserve-libs handling, and the
> CONTENTS_* files will have to be updated here as well.

Will look into that.
Comment 28 Sam 2018-08-08 13:06:45 UTC
Created attachment 542774 [details, diff]
Output digests, modes, attr_pax, attr_caps

Added code to also record capabilities.

AFAIU, capabilities are stored in extended attributes as a 64-bit bitfield. First I simply xattr.getted them and converted to string. Although it seemed to consistently result in 20-char strings, it did so with non-regular characters. That looks crappy when you open the CONTENTS_XATTR_CAPS file. Therefore, I now grab the bitfield out of the struct and record the capabilities as a 16-bit hex string (without the 0x-prefix).

To me, that seemed as the more human-readable approach. Not sure what you think of it, though.
Happy to receive comments.
Comment 29 Sam 2018-08-16 20:21:17 UTC
Created attachment 543736 [details, diff]
Update CONTENTS_* files in _add_preserve_libs_to_contents() and removefromcontents()

I think this should more or less work, but needs testing.

Also, I still need to look at the fcaps eclass.
Comment 30 Sam 2018-08-23 22:00:36 UTC
Created attachment 544732 [details, diff]
Allow fcaps.eclass to set caps in src_install (besides pkg_postinst)

Here's a patch for fcaps.eclass.

This patch copies the wrapper for fcaps specifically for src_install. Also, if a call to fcaps from pkg_postinst is done and the caps weren't already set, a warning is triggered. This should provide feedback on cases of caps earlier set from within src_install being destroyed during moving/copying.
Comment 31 Sam 2018-08-26 20:55:04 UTC
(In reply to Sam from comment #29)
> Created attachment 543736 [details, diff] [details, diff]
> Update CONTENTS_* files in _add_preserve_libs_to_contents() and
> removefromcontents()
> 
> I think this should more or less work, but needs testing.
> 
> Also, I still need to look at the fcaps eclass.

Did some tests and found an issue with how I wanted to use self to find the directory of vdb. Fixed it by passing it more explicitly.
Comment 32 Sam 2018-08-26 20:56:00 UTC
Created attachment 545162 [details, diff]
Pass dir of CONTENTS_* files to removeFromContentsMeta()

To fix issue with incorrect use of self
Comment 33 Sam 2018-09-02 20:14:05 UTC
Created attachment 545784 [details, diff]
In fcaps.eclass, change order: first check if pkg_postinst, then check if caps were set
Comment 34 Sam 2018-10-14 08:25:08 UTC
Created attachment 550998 [details, diff]
Change fcaps.eclass: check for FILECAPS and don't provide custom src_install()

Bumped into two issues with this eclass when emerging pam:
1) The pam ebuild calls fcaps directly in pkg_postinst and doesn't provide a custom src_install(). So the fcaps variant of src_install() was used which doesn't do what it should.
To fix this, fcaps.eclass now defines fcaps_parse() and no longer defines fcaps_src_install(). Ebuild should call fcaps_parse() in their src_install() or pkg_postinst(). Fcaps.eclass still provides fcaps_pkg_postinst() which wraps around fcaps_parse()
2) The pam ebuild doesn't specify $FILECAPS() so fcaps() was borking due to too few args. This was caused by the fcaps_parse() trying to get args from $FILECAPS() and passing them on to fcaps().
To fix this, fcaps_parse() now checks whether $FILECAPS([0]) is set/non-empty. If not, it will skip trying to fire off fcaps() with empty args.
Comment 35 Sam 2018-10-15 20:34:48 UTC
Created attachment 551410 [details, diff]
Fix mistake in writeMetaData() definition

Took awhile before portage updating ran into a package where preserve-libs was needed. Until now, and it triggered a minor bug, where the function definition of writeMetaData didn't include the receiving of self. Fixed in this new patch.

I think that now all code has been touched by emerge.

Do you guys want more formal testing before inclusion?
Comment 36 Fabian Groffen gentoo-dev 2018-11-21 07:45:03 UTC
@Zac: do you think this is ready to be posted/reviewed on gentoo-portage-dev@lists.gentoo.org?
Comment 37 Zac Medico gentoo-dev 2018-11-22 02:39:07 UTC
For removeFromContents, I'd like it to be possible to rollback interrupted transactions. A procedure like this will allow for easy rollback:

1) Create marker file to indicate that a transaction is in progress. 

2) Create replacement metadata files, each with a temporary .new suffix.

3) Create hardlinks of the old metadata files, each with a .old suffix.

4) Rename each of the .new files, replacing the old versions.

5) Use SyncfsProcess to sync new files to disk.

6) Removed files with .old suffix.

7) Remove transaction marker file.

If we detect an interrupted transaction (unexpected transaction marker file), we can rollback simply by renaming all of the .old files back.
Comment 38 Zac Medico gentoo-dev 2018-11-22 03:13:41 UTC
Rather than use .new and .old file suffixes for files in in the transaction, it's probably better to group the new files and old files into separate directories. We could even use a directory as a transaction marker, for example new files could go in transaction/new/ and old files in transaction/old/.
Comment 39 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2018-11-25 14:32:45 UTC
I'd like to throw mtree(5) format into the list of possibility.  On the plus side, it's somewhat BSD-standard and it's used by Arch Linux's package format.  On the minus side, it doesn't seem to allow for more custom hashes.

As for hashes themselves, there's also the possibility of reusing Manifest format.

Finally, we might use both formats for two different purposes.
Comment 40 Sam 2018-11-27 08:49:39 UTC
I suspect both mtree and Manifest would be slower. The proposed format has fixed length fields, making look-ups pretty fast.
Comment 41 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2018-11-27 11:45:34 UTC
This assumption sounds like premature optimization. Don't sacrifice transparency or readability for negligible speed gain.