Bug 934804 - [Future EAPI] Allow modifying mtime of merged files
Status: UNCONFIRMED
Alias: None
Product: Gentoo Hosted Projects
Classification: Unclassified
Component: PMS/EAPI
Hardware: All Linux
Importance: Normal normal
Assignee: PMS/EAPI
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: future-eapi 913920
Reported: 2024-06-24 08:50 UTC by Timothy Kenno Handojo
Modified: 2024-08-15 12:16 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
proposed change for the pms document (0001-emerge.tex-allow-external-timestamp-source-for-file-.patch,1.13 KB, patch)
2024-06-24 15:13 UTC, Timothy Kenno Handojo
proposed change for the pms document (0001-emerge.tex-allow-external-timestamp-source-for-file-.patch,1.31 KB, patch)
2024-07-02 08:29 UTC, Timothy Kenno Handojo
proposed changes for the pms document eapi 9 (0001-emerge.tex-allow-external-timestamp-source-for-file-.patch,1.58 KB, patch)
2024-07-05 16:42 UTC, Timothy Kenno Handojo

Description Timothy Kenno Handojo 2024-06-24 08:50:10 UTC
The PMS requirement on mtime preservation (MTIME-PRESERVE) blocks the implementation of reproducible builds, as it prevents the build from having deterministic timestamps.

I was hoping for at least a little leeway on this requirement, to allow gradual changes towards reproducible builds.
Comment 1 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2024-06-24 11:22:37 UTC
You have to be more specific about what "leeway" should be, exactly.  If we don't preserve timestamps, we end up breaking up-to-date checks on files.
Comment 2 Timothy Kenno Handojo 2024-06-24 14:50:02 UTC
Can you elaborate on which checks may break?

The leeway that I'd like to have is to allow preserving an mtime taken from a different source, rather than the mtime of the installed files themselves.

For the reproducible build, the idea is to have the timestamp from another source that is more consistent and predictable. The current implementation that I'm working on gets it from the corresponding ebuild.
Comment 3 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2024-06-24 15:09:44 UTC
In the extreme case, an output file stores the timestamp of an input file (in its contents), and if the timestamp of the input file changes, the program considers the output file out-of-date and tries to regenerate it.
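
A minimal sketch of the failure mode described above, assuming a hypothetical generator that records the input's mtime inside its output (the file names and the JSON layout are illustrative, not taken from any real package):

```python
import json
import os
import tempfile


def build(src, out):
    """Write `out` and record the current mtime of `src` inside it."""
    stamp = os.stat(src).st_mtime_ns
    with open(out, "w") as f:
        json.dump({"input_mtime_ns": stamp, "data": "generated"}, f)


def is_up_to_date(src, out):
    """Return True if the mtime recorded in `out` still matches `src`."""
    with open(out) as f:
        recorded = json.load(f)["input_mtime_ns"]
    return recorded == os.stat(src).st_mtime_ns


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.txt")
    out = os.path.join(tmp, "output.json")
    with open(src, "w") as f:
        f.write("hello\n")

    build_time_ns = 1_700_000_000 * 10**9
    os.utime(src, ns=(build_time_ns, build_time_ns))
    build(src, out)
    fresh_before = is_up_to_date(src, out)

    # Simulate a package manager rewriting the mtime when merging the file.
    merge_time_ns = 1_600_000_000 * 10**9
    os.utime(src, ns=(merge_time_ns, merge_time_ns))
    fresh_after = is_up_to_date(src, out)

print(fresh_before, fresh_after)
```

Once the input's mtime is rewritten after the output was generated, the recorded value no longer matches and the output is treated as stale, even though the contents are unchanged.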
Comment 4 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-06-24 15:09:50 UTC
(Isn't it covered in bug 264130?)
Comment 5 Timothy Kenno Handojo 2024-06-24 15:13:56 UTC
Created attachment 896332
proposed change for the pms document
Comment 6 Timothy Kenno Handojo 2024-06-24 15:19:11 UTC
A lot of modern programs and libraries have adapted to the reproducible build specifications defined in: https://reproducible-builds.org/docs/

Perhaps we could give this a chance? Rest assured, I will not attempt any change that is too drastic.
Comment 7 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2024-06-24 15:21:31 UTC
(In reply to timkenhan from comment #5)
> Created attachment 896332
> proposed change for the pms document

That change is meaningless.  What is "the source of timestamp"?  It's not defined anywhere in the document.  Also, it directly contradicts the next paragraph.
Comment 8 Eli Schwartz gentoo-dev 2024-06-24 15:36:34 UTC
I find this entire discussion highly confusing.

The motivation is for "reproducible builds" but the reproducible builds standard says to indicate that a build is reproducible by exporting $SOURCE_DATE_EPOCH with the timestamp that should override all metadata.

That includes timestamps embedded in files, such as python bytecode. In the ideal case the software itself knows how to use $SOURCE_DATE_EPOCH to calculate things like bytecode invalidation timestamps. Python goes one step further and respects the environment variable to mean "stop embedding timestamps at all. Instead, use hash-based invalidation, which is slower than timestamps but can't be fooled".
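
As an illustration of hash-based invalidation, the sketch below compiles a throwaway module with `py_compile` in `CHECKED_HASH` mode and inspects the `.pyc` header. The mode is requested explicitly here so the example does not depend on the environment; the module name and contents are made up for the example.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mod.py")
    pyc = os.path.join(tmp, "mod.pyc")
    source = b"VALUE = 42\n"
    with open(src, "wb") as f:
        f.write(source)

    # Ask for a hash-based pyc instead of the default timestamp-based one.
    py_compile.compile(
        src,
        cfile=pyc,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )

    with open(pyc, "rb") as f:
        header = f.read(16)

# Header layout: 4 bytes magic, 4 bytes flags, then 8 bytes that hold either
# mtime+size (timestamp mode) or a hash of the source (hash mode).
flags = int.from_bytes(header[4:8], "little")
is_hash_based = bool(flags & 0b01)
embedded_hash = header[8:16]
expected_hash = importlib.util.source_hash(source)

print(is_hash_based, embedded_hash == expected_hash)
```

Because the header carries a hash of the source rather than its mtime, changing the timestamp of `mod.py` afterwards does not invalidate the bytecode.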

As for touching the files in ${D}, the point of reproducible builds is not that it preserves the mtime when merged to the root filesystem. The point of reproducible builds is that a gpkg file is reproducible, which means that consolidating mtime for all files to have reproducible mtime is something that happens as part of src_install.

It's no different from adding a `find ... -exec touch ... +` at the end of src_install, by hand, in a mass edit / single commit to 50k ebuilds. The only difference is that doing it by hand makes no sense when there could be a section of PMS that states "the mtime of all files at the end of src_install shall be the value of $SOURCE_DATE_EPOCH" and have PMS manage it.
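
A rough Python equivalent of that normalization step, as a sketch only: `normalize_mtimes` is a hypothetical helper, the temporary directory stands in for ${D}, and the fallback epoch value is arbitrary.

```python
import os
import tempfile


def normalize_mtimes(root, epoch):
    """Set atime and mtime of everything under `root` to `epoch` seconds."""
    # Only pass follow_symlinks=False where the platform supports it, so that
    # symlinks themselves are touched rather than their targets.
    kwargs = {}
    if os.utime in os.supports_follow_symlinks:
        kwargs["follow_symlinks"] = False
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            os.utime(os.path.join(dirpath, name), (epoch, epoch), **kwargs)
    os.utime(root, (epoch, epoch), **kwargs)


with tempfile.TemporaryDirectory() as image:
    bindir = os.path.join(image, "usr", "bin")
    os.makedirs(bindir)
    tool = os.path.join(bindir, "tool")
    with open(tool, "w") as f:
        f.write("#!/bin/sh\n")

    # Assumed fallback value, used only when the variable is not exported.
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "1700000000"))
    normalize_mtimes(image, epoch)

    file_mtime = int(os.stat(tool).st_mtime)
    dir_mtime = int(os.stat(bindir).st_mtime)

print(file_mtime == epoch, dir_mtime == epoch)
```

This only covers filesystem metadata; timestamps embedded in file contents are a separate problem, as the next paragraph notes.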

Packages that embed timestamps and don't have custom workarounds in the ebuild would then not be reproducible builds compliant. Maybe reproducible builds should be a FEATURES so that those ebuilds could RESTRICT it.
Comment 9 Eli Schwartz gentoo-dev 2024-06-24 15:37:51 UTC
I'm deeply opposed to reinventing reproducible builds without respecting prior art and standardization by https://reproducible-builds.org, although granted I am not an unbiased bystander. ;)
Comment 10 Timothy Kenno Handojo 2024-06-27 08:48:21 UTC
@mgorny:
My bad, this is my first time touching on the PMS. I'll do some revision.


@eli
I hear you on this, and I would like to do as little work for this as possible.

So far, what I'm seeing is that $SOURCE_DATE_EPOCH only affects the metadata of the compiled binaries in the form of file content (e.g. elf header and all that), not the file timestamp.

One of my goals is to have the binary package tar produce the same hash every time. Please tell me this is reasonable.

Worry not, I will be adding SOURCE_DATE_EPOCH to the environment. Currently, the challenge I'm facing is figuring out how to assign an env var on the fly. I've tried assigning os.environ["SOURCE_DATE_EPOCH"] in the merge loop, but the value doesn't seem to be picked up by the emake script. Lemme know if I'm missing something.
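
One possible explanation, offered as an assumption rather than a statement about Portage internals: a child process only sees the environment mapping it is spawned with, so a launcher that passes its own mapping will not pick up later changes to `os.environ`. The sketch below uses a hypothetical `settings_env` dictionary to stand in for such a mapping.

```python
import os
import subprocess
import sys

child_code = 'import os; print(os.environ.get("SOURCE_DATE_EPOCH", "unset"))'


def run_child(env):
    """Run a small child process with exactly the given environment."""
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical launcher state: a snapshot of the environment taken earlier.
settings_env = {k: v for k, v in os.environ.items() if k != "SOURCE_DATE_EPOCH"}

# Changing os.environ afterwards does not affect the snapshot.
os.environ["SOURCE_DATE_EPOCH"] = "1700000000"
seen_without = run_child(settings_env)

# Adding the variable to the mapping that is actually passed does.
settings_env["SOURCE_DATE_EPOCH"] = "1700000000"
seen_with = run_child(settings_env)

print(seen_without, seen_with)
```

If that is what happens here, the variable would need to be added to whatever mapping the package manager hands to the ebuild process, not to `os.environ`.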
Comment 11 Timothy Kenno Handojo 2024-07-02 08:29:26 UTC
Created attachment 896777
proposed change for the pms document

I hope this includes sufficient detail. Any feedback would be appreciated!
Comment 12 Ulrich Müller gentoo-dev 2024-07-02 09:54:04 UTC
(In reply to Timothy Kenno Handojo from comment #11)
> Created attachment 896777
> proposed change for the pms document

As a bare minimum, this must be EAPI dependent. Generally we don't retroactively change behaviour of existing EAPIs.
Comment 13 Timothy Kenno Handojo 2024-07-02 11:19:51 UTC
How would you do it instead then?
Comment 14 Eli Schwartz gentoo-dev 2024-07-03 03:05:13 UTC
(In reply to Timothy Kenno Handojo from comment #10)
> @eli
> I hear you on this, and I would like to do as little work for this as
> possible.


Please don't say that, as I will be obligated to oppose you to the full extent of my capabilities as a community member interested in reproducibility. :)

Instead of trying to "do as little work as possible", we should instead do the work that needs doing, without being afraid of having to perform work.

Incremental progress is fine, as well. But incremental progress should be based on discussing and agreeing on a desirable "overall end state", and implementing those designs one by one. The very last thing we EVER want to do is make a decision to do something because it is "little work", then discover it was the wrong decision and have to change it, and then be unable to change it because it's baked into a released version of PMS and projects are depending on that meaning. That would mean we would end up with one version of PMS defining a broken way to do things, another version of PMS defining a fixed way to do things, and package managers having to implement both designs for compatibility purposes.

It's one thing to recant a design decision because in hindsight it was a bad idea. It's another thing entirely to design for obsolescence, when you know starting off that it's a bad idea.


> So far, what I'm seeing is that $SOURCE_DATE_EPOCH only affects the metadata
> of the compiled binaries in the form of file content (e.g. elf header and
> all that), not the file timestamp.


No?

The point of the variable is:

- it represents a common standard people can agree on

- it indicates that software should "endeavor to be reproducible"

- it provides a machine-parseable, unambiguous datetime that should be used "for circumstances where reproducibility is relevant" rather than using the current date.


> One of my goals is to have the binary package tar produce the same hash
> every time. Please tell me this is reasonable.


It's very reasonable! It's also how I've designed package managers in the past to work -- and I did that by using $SOURCE_DATE_EPOCH to override the mtime of tar format entries.
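
A minimal sketch of that approach using Python's `tarfile` module, not the code of any actual package manager: `build_tar` is a hypothetical helper that sorts entries and overrides mtime and ownership, so two runs over the same contents give identical bytes.

```python
import io
import os
import tarfile
import tempfile


def build_tar(root, epoch):
    """Return the bytes of a tar archive of `root` with normalized metadata."""

    def normalize(info):
        info.mtime = epoch               # override the on-disk mtime
        info.uid = info.gid = 0          # drop builder-specific ownership
        info.uname = info.gname = "root"
        return info

    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # Sorted order keeps the archive independent of directory listing order.
        for path in sorted(paths):
            tar.add(
                path,
                arcname=os.path.relpath(path, root),
                recursive=False,
                filter=normalize,
            )
    return buf.getvalue()


with tempfile.TemporaryDirectory() as image:
    os.makedirs(os.path.join(image, "usr", "share"))
    data = os.path.join(image, "usr", "share", "data.txt")
    with open(data, "w") as f:
        f.write("payload\n")

    epoch = 1_700_000_000  # stands in for $SOURCE_DATE_EPOCH
    first = build_tar(image, epoch)

    # A different on-disk mtime must not change the archive.
    os.utime(data, (1_600_000_000, 1_600_000_000))
    second = build_tar(image, epoch)

print(first == second)
```

The archive is left uncompressed here; a compressed container would need its own timestamp handling.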

And other package managers I haven't been involved in the design of, but have watched other people implement reproducibility for, do the exact same thing!
Comment 15 Eli Schwartz gentoo-dev 2024-07-03 03:12:31 UTC
In previous package manager implementations I have seen, $SOURCE_DATE_EPOCH is usually inferred if it is not already set in the environment.

This inference could be based on the current time, or on the package recipe. For example, Debian defaults an unset epoch value to the most recent changelog entry in debian/changelog, not to a file timestamp, which can be affected by things like git cloning a package recipe and having the timestamp set to the time of the clone.
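
The fallback order described above could be sketched as follows. `resolve_source_date_epoch` and its `recipe_timestamp` parameter are hypothetical names; where the recipe timestamp would come from for an ebuild is left open.

```python
import time


def resolve_source_date_epoch(environ, recipe_timestamp=None):
    """Pick an epoch: explicit variable, then recipe metadata, then now."""
    value = environ.get("SOURCE_DATE_EPOCH")
    if value:
        return int(value)
    if recipe_timestamp is not None:
        return int(recipe_timestamp)
    # Last resort: the build is then only reproducible if this value is
    # recorded and exported again on the next build.
    return int(time.time())


from_env = resolve_source_date_epoch({"SOURCE_DATE_EPOCH": "1700000000"}, 1600000000)
from_recipe = resolve_source_date_epoch({}, 1600000000)
from_clock = resolve_source_date_epoch({})

print(from_env, from_recipe)
```

Passing the environment as a plain mapping keeps the function easy to test and makes it explicit which environment is being consulted.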


Another interesting topic is whether to use `tar --clamp-mtime`, which only rewrites mtimes that are newer than the given time. Files installed with a very old mtime, due to preserving the mtime from a source tarball, are already older than $SOURCE_DATE_EPOCH, and that old mtime is then kept.

Debian rationale: any non-reproducible mtime caused by compiling files, creating new files, installing them to ${D} or similar will date from the build process itself, and the time when you built the package is always newer than the SOURCE_DATE_EPOCH itself.
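
The clamping rule can be sketched on a directory tree like this. `clamp_mtimes` is a hypothetical helper that mimics the effect of clamping, and the three timestamps are arbitrary values chosen for the example.

```python
import os
import tempfile


def clamp_mtimes(root, epoch):
    """Lower any mtime under `root` that is newer than `epoch`; keep older ones."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks are skipped in this sketch
            if os.stat(path).st_mtime > epoch:
                # atime is set to the same value for simplicity.
                os.utime(path, (epoch, epoch))


with tempfile.TemporaryDirectory() as image:
    old = os.path.join(image, "from_tarball.txt")
    new = os.path.join(image, "compiled.o")
    for path in (old, new):
        with open(path, "w") as f:
            f.write("x\n")

    epoch = 1_700_000_000                            # stands in for $SOURCE_DATE_EPOCH
    os.utime(old, (1_000_000_000, 1_000_000_000))    # preserved from a source tarball
    os.utime(new, (1_800_000_000, 1_800_000_000))    # created during the build

    clamp_mtimes(image, epoch)
    old_mtime = int(os.stat(old).st_mtime)
    new_mtime = int(os.stat(new).st_mtime)

print(old_mtime, new_mtime)
```

The file that was older than the epoch keeps its mtime, while the file created during the build is pulled back to the epoch, so both end up with values that do not depend on when the build ran.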
Comment 16 Timothy Kenno Handojo 2024-07-05 16:42:30 UTC
Created attachment 897081
proposed changes for the pms document eapi 9

I hope I'm doing this correctly now. Thank you to everyone who's pointing me in the right direction!