Portage currently hardcodes gemato usage, but at least one alternative to that reference implementation exists right now. I would like to be able to configure Portage to use the (or any) alternative, while still benefiting from the keys and tree-location arguments. Please consider making this a bit more flexible. Thanks, Fabian
The verification mechanism can be broken down into two distinct components that can be implemented as separate plugins:

* OpenPGP Key Manager: This component imports keys into a temporary GNUPGHOME, refreshes keys (retries must be allowed, see bug 649254), uses the keys to verify given signatures, and finally deletes the temporary GNUPGHOME.

* Manifest Verifier: If signature verification is enabled, then the Manifest Verifier uses the OpenPGP Key Manager to verify the signature of the top-level Manifest before continuing to verify the repository.
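A minimal sketch of what such plugin interfaces might look like (all class and method names here are hypothetical illustrations, not an existing Portage API):

```python
from abc import ABC, abstractmethod


class OpenPGPKeyManager(ABC):
    """Hypothetical plugin interface for OpenPGP key handling."""

    @abstractmethod
    def import_keys(self, key_path):
        """Import keys into a temporary GNUPGHOME."""

    @abstractmethod
    def refresh_keys(self):
        """Refresh keys from keyservers (retries allowed, see bug 649254)."""

    @abstractmethod
    def verify_signature(self, signed_file):
        """Verify a signature using the imported keys."""

    @abstractmethod
    def cleanup(self):
        """Delete the temporary GNUPGHOME."""


class ManifestVerifier(ABC):
    """Hypothetical plugin interface for repository verification."""

    def __init__(self, key_manager):
        # Used only when signature verification is enabled.
        self.key_manager = key_manager

    @abstractmethod
    def verify_repository(self, repo_root):
        """Verify the top-level Manifest signature, then the tree."""
```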
Isn't that then more like three stages?

1. Ensure the key is up to date, etc.
2. Validate the top-level Manifest using that key.
3. Validate the tree.

Currently 2 and 3 are in gemato and hashgen, with 1 considered a separate task. I'm leaning towards 1 and 2 belonging together more than 2 and 3. In any case, the actual verification is the most intensive process, and thus where most of the differences can be observed.
If the plugin is capable, we can use it for on-demand verification of a subset of the repository. Some plugins may not implement this, though, so we might have separate plugin types for those that are capable of on-demand verification and those that are not.
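One way to express the two plugin types, sketched here with hypothetical names, is a subtype that Portage can test for before requesting on-demand verification:

```python
class Verifier:
    """Hypothetical base plugin type: full-tree verification only."""

    def verify_repository(self, repo_root):
        # Full-tree verification after sync (stub for illustration).
        return True


class OnDemandVerifier(Verifier):
    """Hypothetical subtype that can also verify a subset of the tree."""

    def verify_subtree(self, repo_root, relative_path):
        # Verify only the Manifest chain down to relative_path (stub).
        return True


def supports_on_demand(plugin):
    # Portage could select its behavior based on the plugin's type.
    return isinstance(plugin, OnDemandVerifier)
```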
Not sure if this is the right place to ask, but can you tell me a bit more about how this works? On an average spinning-disk machine with a cold fs cache, one needs 10 minutes to read/verify all files, being IO-bound. An SSD-based machine, by contrast, takes only 6 seconds. However, spinning disks are still fairly common, I'd say, so reducing the scope of things to check would make a huge difference compared to full tree verification. Is Portage going to give a set of packages that it wants to use (and thus needs checked)? Or will it do more advanced stuff by giving a set of files (including eclasses etc.) that it wants verified? Or are you thinking of something like only verifying metadata in the first place? We can take this offline, or to another place if that's better.
I wasn't thinking of on-demand verification as a substitute for verifying the entire repository after sync. I think it's best to verify the entire repository after sync, in order to detect problems as early as possible. My primary motivation for on-demand verification is that repository changes are sometimes introduced by mechanisms other than sync. Also, we don't currently have a quarantine mechanism for repositories that fail to verify, but on-demand verification could act as a substitute for that. The gemato API that we could use for on-demand verification is the assert_directory_verifies method, which verifies a directory recursively: https://github.com/mgorny/gemato/blob/v11.2/gemato/recursiveloader.py#L567
Ok, verifying a subtree is simple, but how do we deal with, for instance, eclasses? Also, it should verify the Manifest chain all the way from the signed top-level Manifest down to the subdir in order to ensure it's ok.
(In reply to Fabian Groffen from comment #6)
> Ok, verifying a subtree is simple, but how do we deal with for instance
> eclasses?

In Portage we can track which directories have been verified on-demand, and we'll only verify each directory once, so the eclass directory will be verified only once. That way, the eclass directory's Manifest data does not have to remain in memory, and it only has to be loaded once.

> Also, it should verify the Manifest chain all the way up from the
> signed manifest to the subdir in order to ensure it's ok.

Yes, gemato's assert_directory_verifies method always begins at the top-level Manifest and traverses downward from there. I've just straced gemato and I see that it only opens the Manifest files, so if any of the non-Manifest files are corrupt then it won't be detected at sync time. For on-demand verification, we would need a parameter for the assert_directory_verifies method that causes it to verify non-Manifest files too.
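The once-per-directory tracking described above could be as simple as a memoizing wrapper (a sketch with hypothetical names; verify_directory stands in for whatever the underlying verifier, e.g. a call into gemato's assert_directory_verifies, provides):

```python
class OnDemandCache:
    """Verify each repository directory at most once per session (sketch)."""

    def __init__(self, verify_directory):
        # verify_directory: callable doing the real work, e.g. a thin
        # wrapper around gemato's assert_directory_verifies.
        self._verify_directory = verify_directory
        self._verified = set()

    def verify(self, relative_path):
        if relative_path in self._verified:
            # Already checked, e.g. "eclass" is only verified once,
            # so its Manifest data need not stay in memory.
            return True
        self._verify_directory(relative_path)
        self._verified.add(relative_path)
        return True
```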
(In reply to Zac Medico from comment #7)
> Yes, gemato's assert_directory_verifies method always begins at the
> top-level Manifest and traverses downward from there. I've just straced
> gemato and I see that it only opens the Manifest files, so if any of the
> non-Manifest files are corrupt then it won't be detected at sync time. For
> on-demand verification, we would need a parameter for the
> assert_directory_verifies method that causes it to verify non-Manifest
> files too.

Hmm. Just to be clear on this:

"so if any of the non-Manifest files are corrupt then it won't be detected at sync time."

Is this a typo, or did you mean something else here? I assume gemato does a full tree verification of everything after sync, am I right? This is what hashgen is doing, at least. If the tree gets corrupted after sync/verification, then this would go unnoticed if the corruption were in non-Manifest files that Portage wouldn't verify on its own, right? Or is the idea to move the entire integrity/verification process out of Portage?

In any case, Portage will have to verify DIST files, unless it makes sure their locations are known, of course.

So can we assume Portage will request something like:

  verify root/eclass root/cat1/pkga root/cat2/pkgb

Then verification works through the Manifests for root, then [eclass, cat1, cat2], and performs a full check of the entire subtree of [pkga, pkgb]?
(In reply to Fabian Groffen from comment #8)
> Is this a typo, or did you mean something else here? I assume gemato does a
> full tree verification of everything after sync, am I right?

It actually does check all of the files, but I didn't see that because I forgot to use the strace -f option.

> This is what hashgen is doing, at least. If the tree gets corrupted after
> sync/verification, then this would go unnoticed if the corruption were in
> non-Manifest files that Portage wouldn't verify on its own, right? Or is
> the idea to move the entire integrity/verification process out of Portage?

Possibly, yes.

> In any case, Portage will have to verify DIST files, unless it makes sure
> their locations are known, of course.

If the verifier API provides access to the DIST digests, then Portage can verify DIST files without having to read/verify Manifest files directly.

> So can we assume Portage will request something like:
>
>   verify root/eclass root/cat1/pkga root/cat2/pkgb
>
> Then verification works through the Manifests for root, then [eclass,
> cat1, cat2], and performs a full check of the entire subtree of [pkga,
> pkgb]?
Yes, and also metadata/md5-cache/{cat1,cat2}.
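Assuming the verifier API exposed the DIST digests as suggested above, Portage's side of distfile verification could be sketched like this (verify_distfile and the shape of the digests mapping are hypothetical; the hash names shown are the ones commonly found in Manifest DIST entries):

```python
import hashlib


def verify_distfile(path, digests):
    """Check a downloaded distfile against digests from the verifier API.

    digests: a mapping such as {"SHA512": "...", "BLAKE2B": "..."},
    i.e. the hash names and hex values from a Manifest DIST entry.
    """
    for hash_name, expected in digests.items():
        h = hashlib.new(hash_name.lower())
        with open(path, "rb") as f:
            # Hash in chunks to avoid loading large distfiles into memory.
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        if h.hexdigest() != expected:
            return False
    return True
```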