Bug 301915 - Python bytecode (.py[co]) sometimes included sometimes excluded in python packages
Status: RESOLVED OBSOLETE
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Unspecified
Hardware: All All
Importance: High normal
Assignee: Python Gentoo Team
URL:
Whiteboard:
Keywords: InVCS
Depends on:
Blocks:
 
Reported: 2010-01-23 13:20 UTC by Fabian Groffen
Modified: 2018-02-04 12:58 UTC (History)
CC List: 5 users

See Also:
Package list:
Runtime testing required: ---


Description Fabian Groffen gentoo-dev 2010-01-23 13:20:31 UTC
This bug is to notify the Python team of an inconsistency and possible structural problem with Python within Gentoo regarding Python precompiled files.  The Gentoo Prefix team ran into this as side-issue of a larger problem which is irrelevant here.

In short, it is simple:
- some packages generate .py[co] files as part of their installation ritual, resulting in Portage knowing about them and (un)installing them as recorded in the VDB.
- other packages do not precompile and hence Portage does not know about the existence of any .py[co] files.

This is inconsistent.  One of the most notable packages that generates .py[co] files itself is dev-lang/python.

The python_mod_optimize function generates .py[co] files in the live filesystem for just-installed packages, hence without Portage knowing about them.  This is why the companion python_mod_cleanup function exists: it removes stray .py[co] files after an ebuild is unmerged.
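
To illustrate the mechanism described above, here is a minimal Python sketch of byte-compiling modules in place and listing the files that the package manager's record does not contain. It is not the eclass code (which is shell); the `recorded` set is a hypothetical stand-in for the VDB contents, and `legacy=True` is used only to reproduce the old layout where `mod.pyc` sits next to `mod.py`.

```python
import compileall
import os
import tempfile


def byte_compile_and_find_untracked(root, recorded):
    """Byte-compile every module under root, then return files absent from `recorded`.

    `recorded` is a set of paths relative to root, standing in for the
    package manager's file list (a hypothetical simplification of the VDB).
    """
    # legacy=True writes mod.pyc next to mod.py instead of using __pycache__/.
    compileall.compile_dir(root, quiet=1, legacy=True)
    untracked = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if rel not in recorded:
                untracked.add(rel)
    return untracked


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "example.py"), "w") as fh:
            fh.write("VALUE = 1\n")
        # Only example.py was "installed"; anything else is unknown to the record.
        print(byte_compile_and_find_untracked(root, {"example.py"}))
```

Anything this returns is exactly what a later cleanup step has to find and remove on its own, since no recorded file list mentions it.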

From where we stand, we think no single package should ever install .py[co] files, and generation of .py[co] files afterwards should be optional.
Comment 1 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2010-02-12 21:38:36 UTC
I believe this situation is improved since python.eclass is solely in gentoo-x86 now. yes?
Comment 2 Fabian Groffen gentoo-dev 2010-02-12 21:42:15 UTC
Yes, but I haven't checked what happened to it.  There was no feedback on the bug about prefix changes.
Comment 3 Arfrever Frehtes Taifersar Arahesis (RETIRED) gentoo-dev 2010-05-25 19:58:35 UTC
I think that packages shouldn't install .pyc / .pyo files.
I have fixed dev-lang/python.

If there are no objections in several days, then I will add a QA notice to Portage, which will warn about .pyc / .pyo files in ${D}.
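
A check of the kind proposed here could look roughly like the following Python sketch. This is not the code that went into Portage; the function names and the wording of the notice are assumptions made for illustration, and `image_dir` plays the role of ${D}.

```python
import os

BYTECODE_SUFFIXES = (".pyc", ".pyo")


def find_bytecode_in_image(image_dir):
    """Return sorted paths (relative to image_dir) of .pyc/.pyo files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(image_dir):
        for name in filenames:
            if name.endswith(BYTECODE_SUFFIXES):
                full = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full, image_dir))
    return sorted(hits)


def qa_notice(image_dir):
    """Build a warning string (hypothetical wording), or None if the image is clean."""
    hits = find_bytecode_in_image(image_dir)
    if not hits:
        return None
    lines = ["QA Notice: byte-compiled Python modules found in image:"]
    lines.extend("  " + path for path in hits)
    return "\n".join(lines)
```

Note that a check like this looks only at file names. It cannot tell whether the bytecode in the image is valid or was generated deliberately by the ebuild.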
Comment 4 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2010-05-25 20:04:07 UTC
(In reply to comment #3)
> I think that packages shouldn't install .pyc / .pyo files.
> I have fixed dev-lang/python.
> 
> If there are no objections in several days, then I will add a QA notice to
> Portage, which will warn about .pyc / .pyo files in ${D}.
> 

Thanks, I strongly agree that .pyc/pyo files should not be in ${D}.
Comment 5 Martin von Gagern 2010-06-13 10:03:59 UTC
What's the rationale not to include .py[co] files in the archive? If there is some IRC log or mailing thread elaborating this point, I'd welcome a pointer to it. Otherwise some kind of explanation here would be appreciated.

To me it seems that there are a lot of reasons why this optimization would be better suited for src_compile than pkg_postinst:
1. EAPI=3 preserves timestamps, so the main reason against this should be gone
2. when using binary packages, this prevents unnecessary work at install time
3. fewer orphans, files can be attributed to packages more easily
4. no hacks required to disable optimization in exotic build systems
5. more atomic and thus faster package merging
6. bytecode optimization feels more like compilation than like installation

I'd like new python ebuilds to use EAPI=3 and deal with bytecode generation themselves, probably through an updated eclass that takes care of this. For earlier EAPIs the current process is probably best, due to timestamp issues, but why not deprecate that solution in the long run, except for the ebuild of portage itself which shouldn't rely on recent EAPIs?

By the way: optional installation of .pyc files, as suggested by comment #0, could be problematic for scripts run by root, as in those cases the python interpreter would probably generate those files on the fly, and there would be no one to clean up afterwards, would there?
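
The on-the-fly generation mentioned in the last paragraph can be observed with a small Python sketch. It assumes a current Python 3, where the cache lands in a `__pycache__/` directory rather than next to the source as it did at the time of this bug; the helper name is hypothetical. The `-I` flag isolates the child interpreter from `PYTHON*` environment variables, and `-B` tells it not to write bytecode.

```python
import os
import subprocess
import sys
import tempfile


def import_creates_cache(module_dir, module_name, write_bytecode=True):
    """Import module_name in a fresh interpreter; report whether __pycache__ exists afterwards."""
    cmd = [sys.executable, "-I"]
    if not write_bytecode:
        cmd.append("-B")  # same effect as setting PYTHONDONTWRITEBYTECODE
    code = "import sys; sys.path.insert(0, {!r}); import {}".format(
        module_dir, module_name
    )
    subprocess.run(cmd + ["-c", code], check=True)
    return os.path.isdir(os.path.join(module_dir, "__pycache__"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "mod.py"), "w") as fh:
            fh.write("VALUE = 1\n")
        print("with -B:   ", import_creates_cache(d, "mod", write_bytecode=False))
        print("without -B:", import_creates_cache(d, "mod", write_bytecode=True))
```

Whoever can write to the module directory (typically root) leaves cache files behind simply by importing the module, unless bytecode writing is switched off.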
Comment 6 Arfrever Frehtes Taifersar Arahesis (RETIRED) gentoo-dev 2010-08-05 19:42:48 UTC
There are the following reasons for current handling of byte-compiled Python modules:
1. The magic number can change in given Python slot. If .py[co] files installed
   by a binary package were generated by Python with a different magic number,
   then they would be useless.
2. Potentially more complete deletion of .py[co] files during uninstallation.
   Names of .py[co] files from Python >=3.2 include the magic tag. If root user
   tried to import given modules with a different Python version, then
   additional (not tracked by package manager) .py[co] files would be
   generated.
3. Binary packages are smaller.
4. Potential customization of behavior.

The check in Portage has been added:
http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commitdiff;h=6403211fa4be5b0c6d47ea87c6be06b62184d33b
Comment 7 Brian Harring (RETIRED) gentoo-dev 2010-08-06 08:06:55 UTC
In looking at this change... it should be reverted.

(In reply to comment #6)
> There are the following reasons for current handling of byte-compiled Python
> modules:
> 1. The magic number can change in given Python slot. If .py[co] files installed
>    by a binary package were generated by Python with a different magic number,
>    than they would be useless.

Scan for that then please.  Data on how often the bytecode magic has changed within a slot would be useful also.

> 2. Potentially more complete deletion of .py[co] files during uninstallation.
>    Names of .py[co] files from Python >=3.2 include the magic tag. If root user
>    tried to import given modules with a different Python version, then
>    additional (not tracked by package manager) .py[co] files would be
>    generated.

Your argument of "more complete deletion" is based upon removing the pkg manager's knowledge of the files existing and hoping the python eclass at the time of snapshot will clean things up- this is even _assuming_ they're using that leviathan, which doesn't really apply for standalone repos.

This is a bit contrary- an ownership scan for a .pyc wouldn't pick up those files' ownership.  They're orphaned.

As for >=3.2 including the magic number in the filename as you implied... I do not see this behaviour at all, and it would surprise the hell out of me.  You're going to need to provide references for that.


> 3. Binary packages are smaller.
> 4. Potential customization of behavior.

5. merges are slower
6. it's impossible to apply customizations once, in the binpkg
7. since mtime is now stable on merges, this change you leveled means that even cases where things are correct it still is going to bitch that ebuild devs should move all pyc/pyo generation outside of the manager's purview.

Basically, this sucks, and is not something you can unilaterally decide.  I don't hugely buy your reasoning either, and knowing the cons I don't agree the claimed pros offset it- this needs reversion and discussion.
Comment 8 Martin von Gagern 2010-08-06 08:39:31 UTC
(In reply to comment #6)
> There are the following reasons for current handling of byte-compiled Python
> modules:

First off, thanks for that information.

> 1. The magic number can change in given Python slot. If .py[co] files
>    installed by a binary package were generated by Python with a different
>    magic number, than they would be useless.

Feels very much like the scenario of SONAMEs stored in compiled ELF binaries. If the SONAME of some dependency changes, you'll have to recompile stuff, because some "magic byte sequence" in the files doesn't fit the system any more. I guess it would be feasible to have revdep-rebuild as well as the preserved-rebuild set from portage 2.2 check for this python mismatch as well.
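
A scan of the kind suggested here might be sketched in Python as follows. This is an illustration only, not part of revdep-rebuild or Portage; the function name is hypothetical. It relies on the fact that a bytecode file starts with the 4-byte magic number, which the running interpreter exposes as `importlib.util.MAGIC_NUMBER`.

```python
import importlib.util
import os


def find_mismatched_bytecode(root):
    """Return sorted paths of .pyc files under root whose magic number
    differs from that of the interpreter running this scan."""
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".pyc"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                magic = fh.read(4)
            if magic != importlib.util.MAGIC_NUMBER:
                stale.append(path)
    return sorted(stale)
```

The scan has to be run with the interpreter that owns the directory being checked. Files compiled for a different Python version are reported as mismatched even though they may be perfectly valid for that other version.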

> 2. Potentially more complete deletion of .py[co] files during uninstallation.
>    Names of .py[co] files from Python >=3.2 include the magic tag. If root
>    user tried to import given modules with a different Python version, then
>    additional (not tracked by package manager) .py[co] files would be
>    generated.

If there are additional files that might be associated with the package but cannot be created at install time, then cleaning up those seems sensible. But that doesn't mean that this should be the sole method of cleaning up .py[co] files. You could very well have the package install & uninstall its "usual" .py[co] files the normal way, and still clean up any additional untracked files using a suitable eclass function. And perhaps in the long run even make that additional cleanup safer by ensuring that the cleaned files are in fact orphaned.
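
The "safer" cleanup described in the last sentence could be sketched like this in Python. It is a hypothetical helper, not the existing python_mod_cleanup; `tracked` stands in for the package manager's file list, and only the legacy layout (`mod.pyc` or `mod.pyo` next to `mod.py`) is handled.

```python
import os


def remove_orphaned_bytecode(root, tracked):
    """Delete .pyc/.pyo files under root that are neither tracked nor
    backed by a source file; return the relative paths removed.

    `tracked` is a set of paths relative to root (a hypothetical stand-in
    for the files the package manager knows about).
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith((".pyc", ".pyo")):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            source = path[:-1]  # mod.pyc / mod.pyo -> mod.py
            if rel in tracked or os.path.exists(source):
                continue  # owned by a package, or its source is still installed
            os.remove(path)
            removed.append(rel)
    return sorted(removed)
```

Tracked bytecode is left to the normal uninstall path, and bytecode whose source still exists is left alone, so only files that are orphaned in both senses get deleted.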

> 3. Binary packages are smaller.

Not a good reason imo. Being small isn't part of their job description. Being as binary as possible could be considered part of their description.

> 4. Potential customization of behavior.

Most of Gentoo customization is via USE flags and *FLAGS environment variables set at compile time. I see no reason python packages should be customized at install time. Matches 6. from Brian Harring.
Comment 9 Fabian Groffen gentoo-dev 2010-08-06 08:57:55 UTC
Since I'm just in favour of having the cache stuff generated only when required (one might not even want to have it on embedded systems, so a unified way to enable/disable it is good IMO), just a few shots from my side.

(In reply to comment #7)
> 5. merges are slower

only for pkgs that actually do byte-compiling themselves (because it's removed and redone), or installs from binpkgs

> 6. it's impossible to apply customizations once, in the binpkg

I doubt you really want to do any customizations to the bytecode-cache files.  Tampering with them feels to me like loss of warranty ;)

> Basically, this sucks, and is not something you can unilaterally decide.  I
> don't hugely buy your reasoning either, and knowing the cons I don't agree the
> claimed pro's offset it- this needs reversion and discussion.

Not targeted at Brian in particular here, but let's calm down a bit.  Read comment #0, the situation was inconsistent, so there is no big damage done here, it was only made *consistent* for all packages by "policy".  Now since either approach (in or out VDB) worked fine in the past, there is nothing going to be broken here in a way that it wasn't before.

Since we sort of "fixed" our original problem in Prefix where the py[co] files were causing troubles for us in our binpkgs, it is no real problem for us to have the py[co] files included again, like it used to.  However, before making wild commits, please do realise that suddenly enabling them to be *in* VDB makes all collision-protect users (Prefix in particular) very unhappy.  So if it needs to be changed (please keep it consistent in the first place!) after all, please discuss it with the Prefix team so we can write a news item instructing people to do some ugly -collision-protect workaround.
Comment 10 Brian Harring (RETIRED) gentoo-dev 2010-08-06 10:07:53 UTC
(In reply to comment #9)
> Since I'm just in favour of having the cache stuff generated only when required
> (one might not even want to have it on embedded systems, so a unified way to
> enable/disable it is good IMO), just a few shots from my side.

I actually agree on all accounts there.

> (In reply to comment #7)
> > 5. merges are slower
> 
> only for pkgs that actually do byte-compiling themselves (because it's removed
> and redone), or installs from binpkgs

The "removed and redone" isn't actually true- it's recompiled if python_mod_optimize is in use via the python eclass, but this check doesn't actually check for that.  It just looks for *any* .pyc/.pyo, and tells people to go use the python eclass instead.

In other words, pkgs that handle this fine get flagged as a QA violation by this check, and are instead told they have to orphan their cache bytecode via the python eclass.


> > 6. it's impossible to apply customizations once, in the binpkg
> 
> I doubt you really want to do any customizations to the bytecode-cache files. 
> Tampering with them feels to me like loss of warranty ;)

A claim was leveled by arfie that this change allows "Potential customization of behavior"; I'm pointing out it forces it to be run on every merging system, rather than being able to be integrated into the binpkg itself.

Both usage scenarios have value mind you, but what's in vcs now QA-nags people, castrating the binpkg scenario.

As for customizations to the binpkg, a simple example is forcing prebundling of bytecode cache to avoid it being regenerated on the system- think of qmerge which skips preinst/postinst invocation (thus doesn't generate bytecode by the manager).

 
> > Basically, this sucks, and is not something you can unilaterally decide.  I
> > don't hugely buy your reasoning either, and knowing the cons I don't agree the
> > claimed pro's offset it- this needs reversion and discussion.
> 
> Not targetted at Brian in particular here, but let's calm down a bit.  Read
> comment #0, the situation was inconsistent, so there is no big damage done
> here, it was only made *consistent* for all packages by "policy".  Now since
> either approach (in or out VDB) worked fine in the past, there is nothing going
> to be broken here in a way that it wasn't before.

In the past, up until EAPI3 w/ mtime guarantees, you couldn't have bytecode in the merge- it was pointless since mtime wasn't guaranteed.  Part of the reason EAPI3 added mtime was so that bytecode _could_ be merged, and to get the gains of no longer orphaning files and not being forced to recompile on every damn box.
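
Why mtime matters can be seen from the bytecode file itself: a timestamp-based .pyc records the mtime of its source, and the interpreter treats the cache as stale when the two disagree. The Python sketch below assumes the Python 3.7+ header layout (4-byte magic, 4-byte flags, 4-byte mtime, 4-byte source size), which differs from the Python 2 layout in use when this bug was filed; the helper names are hypothetical.

```python
import os
import py_compile
import struct
import tempfile


def recorded_source_mtime(pyc_path):
    """Return the source mtime stored in a timestamp-based .pyc (3.7+ layout)."""
    with open(pyc_path, "rb") as fh:
        header = fh.read(16)
    # Skip the 4-byte magic; the rest is three little-endian 32-bit fields.
    flags, mtime, _size = struct.unpack("<III", header[4:16])
    if flags != 0:
        raise ValueError("hash-based .pyc: no mtime recorded")
    return mtime


def bytecode_is_fresh(source_path, pyc_path):
    """True when the recorded mtime matches the source file's current mtime."""
    current = int(os.stat(source_path).st_mtime) & 0xFFFFFFFF
    return recorded_source_mtime(pyc_path) == current


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "mod.py")
        with open(src, "w") as fh:
            fh.write("X = 1\n")
        pyc = py_compile.compile(
            src,
            cfile=os.path.join(d, "mod.pyc"),
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        print("fresh after compile:", bytecode_is_fresh(src, pyc))
        t = int(os.stat(src).st_mtime)
        os.utime(src, (t + 100, t + 100))  # simulate a merge that resets mtime
        print("fresh after mtime change:", bytecode_is_fresh(src, pyc))
```

If a merge does not preserve the source's mtime, every shipped .pyc fails this comparison and gets regenerated (or ignored), which is why shipping bytecode only became worthwhile once mtime was guaranteed.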

The damage is that a tree policy decision just got forced via a check pushed into portage, rather than via actual discussion on what the behaviour *should* be.  As said, the given reasons he stated I don't particularly buy- need real data to back it.  At least one of them is directly contradicted by current py3.2 for example (filename naming).

It needs pulling, pure and simple- had problems with unilateral decisions being made without consulting others, this is the same damn thing.

> So if
> it needs to be changed (please keep it consistent in the first place!) after
> all, please discuss it with the Prefix team so we can write a news item
> instructing people to do some ugly -collision-protect workaround.

Yeah, aware of this issue actually.  This is one that does need sorting- probably via tweaking portage's collision-protect to not care about certain extensions (.pyc/.pyo for example).
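
The tweak floated here amounts to filtering the collision list by suffix before deciding whether to abort. A minimal Python sketch of that idea follows; it is purely hypothetical and does not reflect how Portage's collision-protect is implemented or configured.

```python
def filter_collisions(collisions, ignored_suffixes=(".pyc", ".pyo")):
    """Drop colliding paths whose suffix is on the ignore list.

    Both the function and its default suffix list are illustrative
    assumptions, not an existing Portage interface.
    """
    ignored = tuple(ignored_suffixes)
    return [path for path in collisions if not path.endswith(ignored)]
```

For example, `filter_collisions(["/usr/lib/python2.6/a.pyc", "/usr/bin/tool"])` keeps only `/usr/bin/tool`, so a merge would still be blocked by a real collision but not by leftover bytecode.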
Comment 11 Brian Harring (RETIRED) gentoo-dev 2010-08-06 10:45:35 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > 2. Potentially more complete deletion of .py[co] files during uninstallation.
> >    Names of .py[co] files from Python >=3.2 include the magic tag. If root user
> >    tried to import given modules with a different Python version, then
> >    additional (not tracked by package manager) .py[co] files would be
> >    generated.
> As for >=3.2 including the magic number in the filename as you implied... I do
> not see this behaviour at all, and it would surprise the hell out of me. 
> You're going to need to provide references for that.

This is PEP 3147 shifting the cache to __pycache__/, specifically hit 3.2 in _alpha1 from the looks of it.  The orphan comments still stand however.
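
For reference, the PEP 3147 naming can be inspected from Python itself. The sketch below uses `importlib.util.cache_from_source`, which exists in current Python 3 releases (the 3.2 alpha discussed here exposed the same idea through a different module); the wrapper name is hypothetical, and `optimization=""` is passed so the result does not depend on `-O` flags.

```python
import importlib.util
import os
import sys


def cache_path_for(source_path):
    """Return where the running interpreter would cache bytecode for source_path."""
    return importlib.util.cache_from_source(source_path, optimization="")


if __name__ == "__main__":
    path = cache_path_for(os.path.join("pkg", "foo.py"))
    print(path)
    # The file name embeds the interpreter's cache tag, e.g. "cpython-XY".
    print(sys.implementation.cache_tag)
```

Because the tag differs per interpreter version, importing the same module with several Python versions leaves several cache files in `__pycache__/`, and only those generated at install time would be known to anyone.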

Proper solution for that one at a glance is realistically manager awareness of it also.
Comment 12 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2010-08-06 12:38:37 UTC
At the time this bug was reported, there were three things going on.

1) Some packages installed bytecode in ${D} (eg. dev-lang/python)
2) Some packages disabled bytecode generation until runtime (eg. sys-apps/portage) [iirc. if not portage, then there *are* other examples]
3) Some packages installed bytecode in pkg_postinst (eg. most other packages)

For 3) most upstream build systems had done bytecode generation during the compile phase, but many Gentoo Developers explicitly disabled this and moved it to postinst. As stated, going from no bytecode in VDB to bytecode in VDB will cause file collisions for all users, which will be bad.

Regardless of the actual decision here, can it be made formal, a QA "rule", and documented on devmanual so this situation doesn't happen again? Mainly so that the tree is consistent (see points 1-3).
Comment 13 Brian Harring (RETIRED) gentoo-dev 2010-08-07 13:59:32 UTC
(In reply to comment #12)
> Regardless of the actual decision here, can it be made formal, a QA "rule", and
> documented on devmanual so this situation doesn't happen again? Mainly so that
> the tree is consistent (see points 1-3).

That's pretty much my complaint; this change means that it's a QA violation in portage's eyes to *ever* install .py[co], always has to be generated in a postinst.

Realistically this could use a round of discussion on the dev ml- whatever the end result, that gets codified in policy docs/devmanual, and appropriate QA check is added.  That's the proper route for making distro wide changes- thus those steps should be started if this change is desired.
Comment 14 Arfrever Frehtes Taifersar Arahesis (RETIRED) gentoo-dev 2010-08-07 16:43:58 UTC
The previous code meant that there was a policy of not installing .py[co] files into /usr/share. I think that it doesn't make sense to have different policies for .py[co] files in /usr/share and /usr/lib/python*.

I think that support for PEP 3147 is something more suitable in an eclass than in package managers. Many other eclasses generate some files in pkg_postinst() and remove them in pkg_postrm().
Comment 15 Brian Harring (RETIRED) gentoo-dev 2010-08-07 17:16:37 UTC
(In reply to comment #14)
> The previous code meant that there was a policy of not installing .py[co] files
> into /usr/share. I think that it doesn't make sense to have different policies
> for .py[co] files in /usr/share and /usr/lib/python*.

Banning .pyc in /usr/share, where multiple python versions look, is different from /usr/lib/python{the-specific-major-fricking-version}

As such, there is a very good reason to treat the areas differently.

> I think that support for PEP 3147 is something more suitable in an eclass than
> in package managers. Many other eclasses generate some files in pkg_postinst()
> and remove them in pkg_postrm().

Eclasses are dumb however- they cannot know the specific mode of installation, if this pkg is being generated for distribution to multiple nodes (meaning cache it once), etc.  The orphan issue is well known in addition.

As for other eclasses doing something, "just because others are doing something doesn't mean it's right" comes to mind.  This is in addition ignoring that EAPIs change- up until EAPI 3 it wasn't possible to trust mtime so eclasses had to hack around it.

Either way, again, you tried forcing a distro wide policy via backdooring a QA check.  You know the proper steps to take, take them if you want this.
Comment 16 Brian Harring (RETIRED) gentoo-dev 2010-08-07 18:40:57 UTC
@QA, devmanual's yours as far as I know, which would mean y'all actually make decisions of the sort like "gentoo-x86 ebuilds must not install .py[co]"... yours to comment.
Comment 17 Zac Medico gentoo-dev 2010-08-22 22:38:55 UTC
I've removed all the python-oriented checks for now:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=2151bbe5d1699b950d72ce7d0ba4363691a19478
Comment 18 Fabian Groffen gentoo-dev 2011-12-15 18:39:21 UTC
no specific need for Prefix to be here any more
Comment 19 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2018-02-04 12:58:02 UTC
The current Python team policy is to always include those files.