Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 865905 - GPKG packages (created with FEATURES=binpkg-multi-instance?) have partial+wrong top directory
Summary: GPKG packages (created with FEATURES=binpkg-multi-instance?) have partial+wro...
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Binary packages support (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS
Depends on:
Blocks: 866087
  Show dependency tree
 
Reported: 2022-08-21 06:53 UTC by Michał Górny
Modified: 2022-09-09 10:56 UTC (History)
2 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2022-08-21 06:53:44 UTC
$ tar -tf awscli-1.25.56-1.gpkg.tar 
gpkg-1
app-admin/awscli-1.25.56/metadata.tar.bz2
app-admin/awscli-1.25.56/image.tar.bz2
Manifest


Per the spec, it should actually be:

awscli-1.25.56-1/gpkg-1
awscli-1.25.56-1/metadata.tar.bz2
awscli-1.25.56-1/image.tar.bz2
awscli-1.25.56-1/Manifest
Comment 1 Sheng Yu 2022-08-21 18:32:41 UTC
It's actually designed to be like this.

gpkg-1 is the first file without anything as the magic number, so the gpkg file should always begin with b"gpkg" bytes.

Manifest is the package independent verification data, and is not bound to the package. It will be first checked before all other files, so it had to be in a constant place. Notice the package CPV could be unknown (like rebuild Index), so it cannot be put in somewhere like awscli-1.25.56-1/Manifest.
Comment 2 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2022-08-21 18:44:50 UTC
(In reply to Sheng Yu from comment #1)
> It's actually designed to be like this.
> 
> gpkg-1 is the first file without anything as the magic number, so the gpkg
> file should always begin with b"gpkg" bytes.
> 
> Manifest is the package independent verification data, and is not bound to
> the package. It will be first checked before all other files, so it had to
> be in a constant place. Notice the package CPV could be unknown (like
> rebuild Index), so it cannot be put in somewhere like
> awscli-1.25.56-1/Manifest.

In that case, you have submitted the wrong text to the GLEP.  Furthermore, your idea violates the original principles of the GPKG format (that everything ends up in a single predictable directory), so what you're suggesting wouldn't get my approval anyway.
Comment 3 Sheng Yu 2022-08-21 21:24:18 UTC
(In reply to Michał Górny from comment #2)
> (In reply to Sheng Yu from comment #1)
> > It's actually designed to be like this.
> > 
> > gpkg-1 is the first file without anything as the magic number, so the gpkg
> > file should always begin with b"gpkg" bytes.
> > 
> > Manifest is the package independent verification data, and is not bound to
> > the package. It will be first checked before all other files, so it had to
> > be in a constant place. Notice the package CPV could be unknown (like
> > rebuild Index), so it cannot be put in somewhere like
> > awscli-1.25.56-1/Manifest.
> 
> In that case, you have submitted the wrong text to the GLEP.  Furthermore,
> your idea violates the original principles of the GPKG format (that
> everything ends up in a single predictable directory), so what you're
> suggesting wouldn't get my approval anyway.


The request of identifier and magic number ``gpkg-1`` exists before I modified draft. So it not possible in a single  directory for the beginning. Someone (not remember who) explicitly tells me that gpkg-1 should be the first file, so it can be detected with tools like `file`.

https://gitweb.gentoo.org/data/glep.git/tree/glep-0078.rst?id=e22589cdba6b1f4f8b0eec072e2661b329520e8a

For the basename, I was thought it was refer to the CPV. However, if you want to it match the package file name, that is  problematic. For example:

1. Portage will first try create awscli-1.25.56.gpkg.tar.XXXXXX (or xpak), no build_id exists.
So you will get

awscli-1.25.56/gpkg-1
awscli-1.25.56/metadata.tar.bz2
awscli-1.25.56/image.tar.bz2
awscli-1.25.56/Manifest

2. This file will be renamed to awscli-1.25.56-1.gpkg.tar, but still no build_id in metadata.
3. Then update metadata will be called and add build_id to it. To match the filename, it has to rename all related old names to new names, as well as update the Manifest.
4. Update the Index file.

And update metadata can be called multiple times for different reasons. Each time it has to check if needs to rename something which may or may not has build_id or the package is renamed to something else.

The other problem is filename can be duplicate in different category, so without CPV, you can overwrite others when manually extract.
The most obviously is acct-user / acct-group which often has the same version.

There is also kind of security risk to put key files under variable paths.
Comment 4 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2022-08-22 07:04:05 UTC
(In reply to Sheng Yu from comment #3)
> The request of identifier and magic number ``gpkg-1`` exists before I
> modified draft. So it not possible in a single  directory for the beginning.
> Someone (not remember who) explicitly tells me that gpkg-1 should be the
> first file, so it can be detected with tools like `file`.
> 
> https://gitweb.gentoo.org/data/glep.git/tree/glep-0078.
> rst?id=e22589cdba6b1f4f8b0eec072e2661b329520e8a

The directory is not a problem for that, at least as long as it doesn't exceed USTAR's name field.  I don't think we really need to account for the latter.  It's just a matter of matching `.*/gpkg-1` regex in the first 100 bytes.

> For the basename, I was thought it was refer to the CPV. However, if you
> want to it match the package file name, that is  problematic. For example:
> 
> 1. Portage will first try create awscli-1.25.56.gpkg.tar.XXXXXX (or xpak),
> no build_id exists.
> So you will get
> 
> awscli-1.25.56/gpkg-1
> awscli-1.25.56/metadata.tar.bz2
> awscli-1.25.56/image.tar.bz2
> awscli-1.25.56/Manifest
> 
> 2. This file will be renamed to awscli-1.25.56-1.gpkg.tar, but still no
> build_id in metadata.
> 3. Then update metadata will be called and add build_id to it. To match the
> filename, it has to rename all related old names to new names, as well as
> update the Manifest.
> 4. Update the Index file.
> 
> And update metadata can be called multiple times for different reasons. Each
> time it has to check if needs to rename something which may or may not has
> build_id or the package is renamed to something else.

Sounds like we need to change the generation process then.  I don't really see a reason why Portage couldn't decide on build-id earlier and reserve the filename.

> The other problem is filename can be duplicate in different category, so
> without CPV, you can overwrite others when manually extract.
> The most obviously is acct-user / acct-group which often has the same
> version.

That's not really a problem.

> There is also kind of security risk to put key files under variable paths.

Could you explain?
Comment 5 Fabian Groffen gentoo-dev 2022-08-22 08:06:57 UTC
(In reply to Michał Górny from comment #4)
> The directory is not a problem for that, at least as long as it doesn't
> exceed USTAR's name field.  I don't think we really need to account for the
> latter.  It's just a matter of matching `.*/gpkg-1` regex in the first 100
> bytes.

What is the problem with actually defining gpkg-1 to be the first entry, named like that?  It means it would always succeed, and make it much easier to identify a file as gpkg format.

> > For the basename, I was thought it was refer to the CPV. However, if you
> > want to it match the package file name, that is  problematic. For example:

`The archive contains a number of files, stored in a single directory
whose name should match the basename of the package file.`

Then the "package file", seems to be defined in

`The gpkg package container is an uncompressed .tar achive whose filename
should use ``.gpkg.tar`` suffix.`

So, basename should be something like foo-1.3-1.gpkg.tar?  Feels like an amendment to the GLEP is needed to specify which suffixes are to be removed.
Comment 6 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2022-08-22 08:50:57 UTC
(In reply to Fabian Groffen from comment #5)
> (In reply to Michał Górny from comment #4)
> > The directory is not a problem for that, at least as long as it doesn't
> > exceed USTAR's name field.  I don't think we really need to account for the
> > latter.  It's just a matter of matching `.*/gpkg-1` regex in the first 100
> > bytes.
> 
> What is the problem with actually defining gpkg-1 to be the first entry,
> named like that?  It means it would always succeed, and make it much easier
> to identify a file as gpkg format.

The problem is that extracting such archive will create a top level 'gpkg-1' file which is against the principle of least surprise.

> > > For the basename, I was thought it was refer to the CPV. However, if you
> > > want to it match the package file name, that is  problematic. For example:
> 
> `The archive contains a number of files, stored in a single directory
> whose name should match the basename of the package file.`
> 
> Then the "package file", seems to be defined in
> 
> `The gpkg package container is an uncompressed .tar achive whose filename
> should use ``.gpkg.tar`` suffix.`
> 
> So, basename should be something like foo-1.3-1.gpkg.tar?  Feels like an
> amendment to the GLEP is needed to specify which suffixes are to be removed.

The intent was to strip both suffixes.
Comment 7 Sheng Yu 2022-08-23 07:09:09 UTC
(In reply to Michał Górny from comment #4)
> (In reply to Sheng Yu from comment #3)
> > The request of identifier and magic number ``gpkg-1`` exists before I
> > modified draft. So it not possible in a single  directory for the beginning.
> > Someone (not remember who) explicitly tells me that gpkg-1 should be the
> > first file, so it can be detected with tools like `file`.
> > 
> > https://gitweb.gentoo.org/data/glep.git/tree/glep-0078.
> > rst?id=e22589cdba6b1f4f8b0eec072e2661b329520e8a
> 
> The directory is not a problem for that, at least as long as it doesn't
> exceed USTAR's name field.  I don't think we really need to account for the
> latter.  It's just a matter of matching `.*/gpkg-1` regex in the first 100
> bytes.
> 
> > For the basename, I was thought it was refer to the CPV. However, if you
> > want to it match the package file name, that is  problematic. For example:
> > 
> > 1. Portage will first try create awscli-1.25.56.gpkg.tar.XXXXXX (or xpak),
> > no build_id exists.
> > So you will get
> > 
> > awscli-1.25.56/gpkg-1
> > awscli-1.25.56/metadata.tar.bz2
> > awscli-1.25.56/image.tar.bz2
> > awscli-1.25.56/Manifest
> > 
> > 2. This file will be renamed to awscli-1.25.56-1.gpkg.tar, but still no
> > build_id in metadata.
> > 3. Then update metadata will be called and add build_id to it. To match the
> > filename, it has to rename all related old names to new names, as well as
> > update the Manifest.
> > 4. Update the Index file.
> > 
> > And update metadata can be called multiple times for different reasons. Each
> > time it has to check if needs to rename something which may or may not has
> > build_id or the package is renamed to something else.
> 
> Sounds like we need to change the generation process then.  I don't really
> see a reason why Portage couldn't decide on build-id earlier and reserve the
> filename.

Well, that is doable, but need to change a lot of logic.
 
> > The other problem is filename can be duplicate in different category, so
> > without CPV, you can overwrite others when manually extract.
> > The most obviously is acct-user / acct-group which often has the same
> > version.
> 
> That's not really a problem.
> 
> > There is also kind of security risk to put key files under variable paths.
> 
> Could you explain?

By putting Manifest in the inner directory, it is kind of giving up outer level integrity. Or more intuitively, that directory name doesn't matter at all. I can check that there can only be one directory and no other files are allowed. But there is no way to check what this gpkg file was like when it was created. So you can change awscli-1.25.56-1/ to AAAAA/ and it will pass the check. If you think this is okay, I can move them all in.
Comment 8 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2022-08-23 11:19:24 UTC
(In reply to Sheng Yu from comment #7)
> (In reply to Michał Górny from comment #4)
> > Sounds like we need to change the generation process then.  I don't really
> > see a reason why Portage couldn't decide on build-id earlier and reserve the
> > filename.
> 
> Well, that is doable, but need to change a lot of logic.

Well, technically I don't think there's any real requirement that the build-id is sequential.  Unless I'm mistaken, the only problem we care about is being able to have unique suffix for every binpkg generated.  How about using some hash or timestamp, or even random suffix instead?  At this point, I don't think you even really need to reserve it, as long as it's "random enough", i.e. unlikely to collide with prior binpackages.  At this point, the highest build-id I have is -54, and that's for a package I've rebuilt (for testing) a lot.  Normally build-ids are unlikely to exceed 10, except for live packages -- but we don't care about multi-instance there a lot.

> By putting Manifest in the inner directory, it is kind of giving up outer
> level integrity. Or more intuitively, that directory name doesn't matter at
> all. I can check that there can only be one directory and no other files are
> allowed. But there is no way to check what this gpkg file was like when it
> was created. So you can change awscli-1.25.56-1/ to AAAAA/ and it will pass
> the check. If you think this is okay, I can move them all in.

Yes, that is fine.  Per the GLEP, we don't really care how the directory is named, and the PM should be able to handle any directory name, valid or not.  We only want to generate the correct top directory for user convenience but we must be handle to handle "malformed" packages.
Comment 9 Sheng Yu 2022-08-31 21:42:42 UTC
Fix pending review / merge.
https://github.com/gentoo/portage/pull/893
Comment 10 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2022-09-09 10:44:48 UTC
commit fa901a6510c4a5f72dec6aad86db6fe7efd6e7b3 (HEAD -> master, origin/master, origin/HEAD)
Author: Sheng Yu <syu.os@protonmail.com>
Date:   Tue Sep 6 14:38:27 2022 -0400

    Add BUILD_ID to metadata during binary packaging

    Also create placeholder for new multi-instance package before actually
    packaging to avoid race.

    Signed-off-by: Sheng Yu <syu.os@protonmail.com>
    Closes: https://github.com/gentoo/portage/pull/893
    Signed-off-by: Michał Górny <mgorny@gentoo.org>

commit 1d94992a2df2b5cc963c26c7978a899dc642deb1
Author: Sheng Yu <syu.os@protonmail.com>
Date:   Thu Sep 1 10:44:55 2022 -0400

    Move all files into basename/DATA structure

    Signed-off-by: Sheng Yu <syu.os@protonmail.com>
    Signed-off-by: Michał Górny <mgorny@gentoo.org>

commit 020feffb39cd6c4c68a4a81be6ccd2426b062e8d
Author: Sheng Yu <syu.os@protonmail.com>
Date:   Fri Aug 19 16:40:59 2022 -0400

    Add gpkg-sign tool to sign exist GPKG files.

    Signed-off-by: Sheng Yu <syu.os@protonmail.com>
    Signed-off-by: Michał Górny <mgorny@gentoo.org>

commit 52411290c67535d94c7b20fa996ae7108014adfb
Author: Sheng Yu <syu.os@protonmail.com>
Date:   Fri Aug 19 16:24:59 2022 -0400

    Detect binary package format if not in database

    Signed-off-by: Sheng Yu <syu.os@protonmail.com>
    Signed-off-by: Michał Górny <mgorny@gentoo.org>
Comment 11 Larry the Git Cow gentoo-dev 2022-09-09 10:56:10 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=06bf8d495327ebb5718821a27622f1b7eb944943

commit 06bf8d495327ebb5718821a27622f1b7eb944943
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2022-09-09 10:55:57 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2022-09-09 10:55:57 +0000

    sys-apps/portage: add 3.0.36
    
    Closes: https://bugs.gentoo.org/865905
    Closes: https://bugs.gentoo.org/864160
    Closes: https://bugs.gentoo.org/866087
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/portage/Manifest              |   1 +
 sys-apps/portage/portage-3.0.36.ebuild | 273 +++++++++++++++++++++++++++++++++
 2 files changed, 274 insertions(+)