Right now, the PEP517 build logic roughly follows the prior setup.py logic, in that it repeatedly builds and installs the package for every interpreter separately. However, given that the vast majority of Python packages are distributed as pure Python wheels, and therefore provide matching code for every implementation, we could optimize this. The rough idea would be, for every implementation, to check the wheel directory for an existing wheel that matches. For pure Python packages, the first build would produce a 'py3-none-any' wheel that would be reused for all remaining implementations. For packages using the stable ABI, the earlier builds would produce stable ABI wheels that would either be reused when they match, or cause new wheels to be built when they don't. For packages using the regular ABI, no previously built wheel would ever match, so every implementation would build a new one. The main problem here is packages that do per-impl patching. While rare, they need special care.
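To illustrate the matching step only (a minimal Python sketch assuming the "packaging" library; the wheel directory and find_reusable_wheel() are hypothetical names, not the eclass interface):

# Hypothetical sketch: look for a previously built wheel that the current
# interpreter can reuse, based on the tags in the wheel filename.
from pathlib import Path

from packaging.tags import sys_tags
from packaging.utils import parse_wheel_filename

WHEEL_DIR = Path("wheels")  # hypothetical per-package wheel cache

def find_reusable_wheel(package: str) -> Path | None:
    supported = set(sys_tags())  # tags accepted by the running implementation
    for wheel in WHEEL_DIR.glob(f"{package}-*.whl"):
        _name, _ver, _build, tags = parse_wheel_filename(wheel.name)
        # A pure Python wheel is tagged py3-none-any; a stable ABI wheel
        # (e.g. cp38-abi3-linux_x86_64) also matches later CPython versions.
        if tags & supported:
            return wheel
    return None

The eclass would of course do this in bash; the sketch only shows the tag-matching idea.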
Would you also expect this to allow only installing DEPEND for the lowest matching impl, rather than requiring to install multi-impl dependencies down the entire stack? This should be easily doable for the build backend itself, and possibly for gpep517. It would also require packages to mark themselves compatible at metadata time, rather than detecting it inside src_compile based on matching wheels. src_test would be unaffected -- we need to test all impls regardless, and that also means installing pytest or similar for all impls.
(In reply to Eli Schwartz from comment #1)
> Would you also expect this to allow only installing DEPEND for the lowest
> matching impl, rather than requiring to install multi-impl dependencies down
> the entire stack?

Sounds like a lot of complexity for little value.
Some more thoughts:

1. Should we allow per-impl patching in the end? PEP517 already bans copying sources, and therefore python_prepare() [we should probably improve the docs]. Perhaps it would be good enough to require people to mark wheels as "impure" when per-impl patching is done.

2. We should probably start by diffing stuff in the potential reuse cases, and reporting QA warnings when different files are installed. Note that we'll have to exclude shebangs of installed scripts from this. We could also verify .abi3 files using diffoscope (thanks, Sam!). See the sketch after this list.

3. We will probably want a developer knob to disable this optimization (i.e. initially make it opt-in, then opt-out). It would be combined with the above diffing capability, to let us make sure we're not missing some patching.

4. There's also the question of whether we want to always be using stable ABI when upstream does. Perhaps we ought to consider adding a USE flag to packages doing that, and patch them to use regular ABI if people prefer. However, I'm not convinced that stable ABI is actually making things slower yet.
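For point 2, something along these lines (a Python sketch with made-up names, only to show "same files modulo shebangs"; the real check would live in the eclass):

# Hypothetical sketch: list files that differ between two installed image
# trees, ignoring shebang lines so differing interpreter paths don't count.
from pathlib import Path

def _strip_shebang(data: bytes) -> bytes:
    # Drop the first line if it is a shebang.
    if data.startswith(b"#!"):
        return data.split(b"\n", 1)[-1]
    return data

def diff_images(old: Path, new: Path) -> list[str]:
    differing = []
    for path in sorted(p for p in new.rglob("*") if p.is_file()):
        rel = path.relative_to(new)
        other = old / rel
        if not other.is_file():
            differing.append(f"only in new image: {rel}")
        elif _strip_shebang(path.read_bytes()) != _strip_shebang(other.read_bytes()):
            differing.append(f"differs: {rel}")
    return differing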
So it's very valuable to optimize executing a backend that completes in maybe a second, maybe even a handful of seconds. But it's not valuable to avoid rebuilding dependencies to enable new impls that aren't even needed? (The average build backend seems to have a couple dozen dependencies...) Remember that the packages which benefit from this are the ones that don't run a compiler nor take very long.
(In reply to Michał Górny from comment #3)
> 4. There's also the question of whether we want to always be using stable
> ABI when upstream does. Perhaps we ought to consider adding a USE flag to
> packages doing that, and patch them to use regular ABI if people prefer.
> However, I'm not convinced that stable ABI is actually making things slower
> yet.

It's not really different from users rebuilding every package with -march=native even though perfectly good binhosts exist. But for single-impl builds it makes sense not to bother with the stable ABI, as you don't get any benefit of reuse. And for multi-impl builds, users will undoubtedly want the option.

No patching is needed for at least meson-python -- you can pass -Dpython.allow_limited_api=false to meson, and packages that support the limited API will nonetheless decline to use it.
(In reply to Eli Schwartz from comment #4)
> So it's very valuable to optimize executing a backend that completes in
> maybe a second, maybe even a handful of seconds.
>
> But it's not valuable to avoid rebuilding dependencies to enable new impls
> that aren't even needed? (The average build backend seems to have a couple
> dozen dependencies...)
>
> Remember that the packages which benefit from this are the ones that don't
> run a compiler nor take very long.

This "optimization" would only work for users who spend time adjusting USE flags on these dependencies.
Further note: we probably need to disable the "pure Python" part of the optimization when DISTUTILS_EXT is used. Otherwise, e.g. a PyPy wheel without C extensions would prevent the extensions from being built for other implementations.
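In terms of the earlier sketch, that amounts to an extra guard (hypothetical name; has_extensions stands for DISTUTILS_EXT being set):

# Hypothetical guard: never reuse a pure Python wheel when the package is
# expected to build C extensions, as the wheel may simply lack them.
def wheel_reuse_allowed(tags, has_extensions: bool) -> bool:
    pure = any(t.abi == "none" and t.platform == "any" for t in tags)
    return not (pure and has_extensions)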
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=314c6b009037cf21ddfb35b6c372c5dd3819e0c5

commit 314c6b009037cf21ddfb35b6c372c5dd3819e0c5
Author:     Michał Górny <mgorny@gentoo.org>
AuthorDate: 2024-05-14 12:09:24 +0000
Commit:     Michał Górny <mgorny@gentoo.org>
CommitDate: 2024-05-20 16:56:43 +0000

    distutils-r1.eclass: Support reusing prior wheels when compatible

    Support reusing the wheels built for earlier Python implementations
    if they are compatible with the subsequent implementations being built.
    This includes pure Python wheels in packages that do not set
    DISTUTILS_EXT, and stable ABI wheels.

    Closes: https://bugs.gentoo.org/931689
    Signed-off-by: Michał Górny <mgorny@gentoo.org>

 eclass/distutils-r1.eclass | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)