Bug 705606 - net-proxy/http-replicator-4.0_alpha2-r7 - add support for python3
Summary: net-proxy/http-replicator-4.0_alpha2-r7 - add support for python3
Status: RESOLVED WONTFIX
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Deadline: 2020-03-18
Assignee: No maintainer - Look at https://wiki.gentoo.org/wiki/Project:Proxy_Maintainers if you want to take care of it
URL:
Whiteboard:
Keywords: PMASKED
Depends on:
Blocks:
 
Reported: 2020-01-16 22:13 UTC by Einstok Fair
Modified: 2020-08-02 08:45 UTC
CC List: 8 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Preliminary ebuild for xedakini's version (http-replicator-4.0_alpha4.ebuild,3.79 KB, text/plain)
2020-08-01 18:34 UTC, Matthew Ogilvie
setsid patch for above ebuild (setsid-4.0_alpha4.patch,872 bytes, patch)
2020-08-01 18:35 UTC, Matthew Ogilvie
some sort of systemd conf file for above ebuild (http-replicator.service.conf,164 bytes, text/plain)
2020-08-01 18:35 UTC, Matthew Ogilvie
repcacheman wrapper shell script for above ebuild (http-replicator-3.0-callrepcacheman-0.1,86 bytes, application/x-shellscript)
2020-08-01 18:37 UTC, Matthew Ogilvie
openrc init script for above ebuild (http-replicator-4.0_alpha2-r3.init,644 bytes, text/plain)
2020-08-01 18:38 UTC, Matthew Ogilvie
openrc conf script for above ebuild (http-replicator-4.0_alpha2-r2.conf,1.41 KB, text/plain)
2020-08-01 18:39 UTC, Matthew Ogilvie
fetcher script documentation for above ebuild (fetcher,1.97 KB, application/x-shellscript)
2020-08-01 18:43 UTC, Matthew Ogilvie

Description Einstok Fair 2020-01-16 22:13:00 UTC
[ebuild  N     ] net-proxy/http-replicator-4.0_alpha2-r7::gentoo  PYTHON_TARGETS="python2_7" 27 KiB

Python 2.7 is no longer supported; its end of life was 2020-01-01. See:
https://devguide.python.org/#status-of-python-branches
Comment 1 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2020-02-13 09:31:43 UTC
Last Update: 2013-12-29

I wouldn't hold my hopes high.
Comment 2 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2020-02-13 09:32:13 UTC
...and last release in 2008.
Comment 3 Larry the Git Cow gentoo-dev 2020-02-17 08:45:06 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=70f507bf53a1cfecc98a23f94e81173a3d964c60

commit 70f507bf53a1cfecc98a23f94e81173a3d964c60
Author:     Michał Górny <mgorny@gentoo.org>
AuthorDate: 2020-02-17 08:37:30 +0000
Commit:     Michał Górny <mgorny@gentoo.org>
CommitDate: 2020-02-17 08:37:30 +0000

    package.mask: Last rite net-proxy/http-replicator
    
    Bug: https://bugs.gentoo.org/705606
    Signed-off-by: Michał Górny <mgorny@gentoo.org>

 profiles/package.mask | 6 ++++++
 1 file changed, 6 insertions(+)
Comment 4 PetaMem R&D 2020-02-27 17:51:30 UTC
I'd consider it quite a good practice if a replacement was mentioned before removal.

http-replicator-3.0-r8 serves us well here in an internal network (on a Gentoo-VM acting as local mirror for other Gentoo VMs) and I think it has spared the official Gentoo mirrors quite some traffic.

It's understandable if unmaintained old cruft gets removed, but if there is no equivalent replacement (or such isn't mentioned), old cruft is still better than nothing.
Comment 5 Matthew Ogilvie 2020-02-28 05:54:16 UTC
FYI: Links to more information:

I updated a months-old post about several aspects of http-replicator:
https://forums.gentoo.org/viewtopic-t-1101822-highlight-httpreplicator.html
(It now describes how I suspect that repcacheman is likely to stop
working before http-replicator itself, in addition to a lot of other
pre-existing information.)

There is a current discussion about alternatives in:
https://forums.gentoo.org/viewtopic-t-1097768-highlight-httpreplicator.html
(But I haven't seen any ideas I like.)

The original announcement and discussion thread from when http-replicator
was first added to the tree is still somewhat relevant, although it is
very long:
https://forums.gentoo.org/viewtopic-t-173226.html
Comment 6 xedakini 2020-03-03 07:32:35 UTC
0) it appears that the metadata for net-proxy/http-replicator is out of date: it shows the homepage as being sourceforge.net, but it appears that https://github.com/gertjanvanzwieten/replicator is newer
1) that doesn't help though, as the github repository is also quite old (most recent commit in 2013)


I've created a fork at https://github.com/xedakini/replicator .  I will attempt to migrate the code to Python 3, and see how many Gentoo bugs for http-replicator I can address.  I will also plan on converting repcacheman to Python 3.

I'm not sure how long-term I can commit to maintaining this fork of http-replicator, but if my plan is successful then perhaps it can at least have a bit of reprieve.

In order to give me some time for coding and testing, could I please get a bump of, say, an additional 60 days on the Removal clock?
Comment 7 Einstok Fair 2020-03-19 07:44:07 UTC
Awesome resolution, which solves all problems.
Comment 8 xedakini 2020-04-15 22:45:30 UTC
For those who are interested, I have made the necessary changes and am now dogfooding the updated code.  Those who want to try it out can download https://github.com/xedakini/replicator/archive/4.0alpha2+python3.tar.gz
and play with it.
Comment 9 Quincy 2020-07-16 06:16:12 UTC
Sadly, my http-replicator stopped working, but luckily I found this bug and xedakini's updated version (thank you verrry much).
I made some crude rearrangements in the ebuild to "properly" (test-)install the tarball mentioned in comment 8.
This version works - meaning it downloads and caches files as the python2 version did! I'm looking forward to new versions (there have been several commits since then), but even the old one saved my day :-)
Comment 10 Andreas Sturmlechner gentoo-dev 2020-07-20 19:10:49 UTC
(In reply to PetaMem R&D from comment #4)
> It's understandable if unmaintained old cruft gets removed, but if there is
> no equivalent replacement (or such isn't mentioned), old cruft is still
> better than nothing.

That assumes that old cruft causes no maintenance burden, but quite the contrary: at the time of the last rites, when it had no replacement, keeping it meant effectively asking Portage developers to keep maintaining an EOL py2 branch ad infinitum.

Now, the last rites spawned some work on a fork, as often happens, which is nice; but with still zero maintainers, who exactly are you expecting to do the packaging and future maintenance?

Still, nothing is stopping anyone from picking up maintenance of this package *now*, applying for write access to the GURU overlay, and adding it there.
Comment 11 Matthew Ogilvie 2020-08-01 18:34:03 UTC
Created attachment 652132 [details]
Preliminary ebuild for xedakini's version

I finally got around to hacking together a preliminary ebuild for
xedakini's new version [thanks for the new version!].  It seems
to basically work in a small test environment, but I noticed
some limitations, room for improvement, and notes below.
Maybe some volunteers could further refine and/or maintain it?
[For the moment I'm still primarily using a stale install of
version 3.0, without being able to use version 3's repcacheman.]

This ebuild (and other files I'll attach momentarily) is updated
from the last (removed) 4.0-alpha2 ebuild.

Notes, concerns, and weaknesses:

  - The --ip argument seems to have turned into an alias for --bind,
    can't be specified multiple times, and has subtly changed so it
    no longer really provides access control for which peer IP
    addresses/subnets are allowed to connect to http-replicator.
     - Actually it isn't obvious to me how it is passed into aiohttp
       stuff at all, but testing /etc/conf.d/http-replicator and
       netstat shows it is honored for specifying the (single) bind
       address.
     - You probably want to configure your install to specify
       "--ip ::" (or maybe 0.0.0.0) for it to actually serve its
       purpose on a LAN at all.  (The default "::1" is basically
       only accessible to localhost.)  Access control (if desired)
       apparently requires messing with kernel firewall
       rules instead.  (More general, but also more difficult and
       fragile when managed elsewhere.)
  - I don't see any sign of the "--alias" option.
  - Bug 705642: Despite the above, the systemd .conf file appears
    to be designed around the old v3 interface for --ip and --alias.  Is
    that conf file used at all?
     - I don't use systemd, but I kept that part from the
       previously-removed alpha2 ebuild, although alpha2 does not
       look like it supports these options the way they are used
       by the conf file, either.
  - (FYI) I think IPv6 just works, and the --ipv6 command line option
    doesn't seem to exist any more.  Probably reasonable.
  - More generally, the whole default installed
    /etc/conf.d/http-replicator.conf file probably needs to be
    carefully combed through, including the comments.

  - I threw together the setsid patch; it seems to be necessary
    when using openrc with start-stop-daemon to start the service.
    Something like it probably ought to be submitted upstream, but I
    haven't tried to do so.

  - The changelog mentions a known race condition, but doesn't go
    into details.

  - In some of my tests, portage tried to download various ".layout.conf*"
    files through http-replicator, although my most recent tests don't
    seem to do that.
     - Maybe related to some of my experiments with
       /etc/portage/mirrors to (try to) use http-replicator more
       widely than the http_proxy variable allows.
     - I worry that this could be messed up if it tries to use a stale
       cached layout file or something...

  - (tangent for portage): It is awkward to strip off the ".__download__"
    suffix that portage appends to the FILE variable (for use with the
    X-unique-cache-name custom header when invoking FETCHCOMMAND),
    especially since every client ought to be configured to include
    the header.  (Generally you want "fetcher" or an equivalent on each
    client machine.) It would be nice if portage itself arranged to
    include this custom header by default, instead...

  - (minor) Generally, http-replicator doesn't seem to report some
    kinds of errors very well.  For example:
     - I more or less had to guess about the setsid() thing.
     - Also, passing a wildcard to --ip gives no indication of a problem
       at all when starting the replicator; you have to check netstat
       and/or /var/log/http-replicator.log.
     - (minor) The last version 3.0 ebuild had a patch to point you at
       repcacheman if the cache directory doesn't exist; now 4.0-alpha4
       just errors out with a missing directory error.

  - (minor) The package URL seems to be for downloading a tarball based
    on a tag, which might not be as binary/hash-stable as an actual
    release tarball.  Particularly if github regenerates it
    whenever requested, and/or github's zlib is ever upgraded and
    generates slightly different compressed output.  A "real" release
    would be better.

  - (minor) Version number 5 instead?  It looks like many substantial
    internal design changes were made (again), which seems like it
    warrants rolling a major version number (if not renaming
    the package completely), not just an alpha patch number.

  - (minor) The aiohttp dependency pulls in 12 (!) additional packages
    with a total approaching 6 MB of downloads.

  - (fixed) Alpha2's ebuild-printed instructions mentioned all-caps
    HTTP_PROXY, but that seems to be ignored on my machine.  I seem
    to need to use lower-case "http_proxy" instead, so I adjusted
    the instructions.

I don't think anyone except maybe xedakini has really audited
version 4.0-alpha4 thoroughly at all (certainly not me), although
superficially it appears it may have fixed the most obvious
deficiencies and security concerns.  (But it also eliminated
the --ip access control capability, etc.)

Signed-off-by: Matthew Ogilvie <mmogilvi+gnto@zoho.com>

---

As an alternative, it might be possible to resurrect the
version 3.0 ebuild instead (much older and more widely audited and
tested, but requires python 2, no IPv6, etc), and just have it
install the updated repcacheman.py from version 4.0-alpha4, but:

  (a) Mixing python version requirements in one package would be
      kind of an ugly hack.
  (b) I'm not sure how to "properly" depend on multiple versions of
      python from one ebuild where different installed scripts
      require different python versions.
  (c) If https://wiki.gentoo.org/wiki/Project:Python/Implementations
      is accurate, then python 2 will be dropped completely very
      early in 2021, contrary to this quote from the
      2020-02-07-python-2-7-eol news item: "We are going to continue maintaining and patching
      the interpreter for as long as it is feasible, most likely even
      after all Python 2 packages are gone from Gentoo."
Comment 12 Matthew Ogilvie 2020-08-01 18:35:16 UTC
Created attachment 652134 [details, diff]
setsid patch for above ebuild
Comment 13 Matthew Ogilvie 2020-08-01 18:35:57 UTC
Created attachment 652136 [details]
some sort of systemd conf file for above ebuild
Comment 14 Matthew Ogilvie 2020-08-01 18:37:12 UTC
Created attachment 652138 [details]
repcacheman wrapper shell script for above ebuild
Comment 15 Matthew Ogilvie 2020-08-01 18:38:23 UTC
Created attachment 652140 [details]
openrc init script for above ebuild
Comment 16 Matthew Ogilvie 2020-08-01 18:39:01 UTC
Created attachment 652142 [details]
openrc conf script for above ebuild
Comment 17 Matthew Ogilvie 2020-08-01 18:43:01 UTC
Created attachment 652144 [details]
fetcher script documentation for above ebuild

This is really intended for client machines that are configured to use an http-replicator server (from any same or different machine), but it is currently installed as "documentation" by the ebuild.  See also the readme.gentoo installed by the ebuild.

Comment 18 xedakini 2020-08-02 08:45:00 UTC
(In reply to Matthew Ogilvie from comment #11)

Quite the comment; thanks for the feedback.

I'd prefer to address bugs/misfeatures/requests about the replicator within the github system, and have this Gentoo bug focus more on the Gentoo integration (ebuilds, start-up scripts, config files), to the extent that works for you.  That said...

>   - The --ip argument seems to have turned into an alias for --bind,
>     can't be specified multiple times, and subtly changes so it
>     no longer really provides access control for which peer IP

My bad.  I added "--ip" support as a quickie "fix" at the last minute, misunderstanding what it used to do.  Reading the v3 code more carefully I can see what it should be doing, and will fix that in the next (alpha) release.
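For illustration only: a minimal sketch of the kind of peer filter v3's --ip provided, written with Python's standard ipaddress module. It assumes CIDR-style allow-list entries (v3 accepted its own wildcard patterns, which are not reproduced here), the helper names parse_allowed and peer_allowed are hypothetical, and it does not show how replicator's aiohttp code would obtain the peer address.

```python
import ipaddress


def parse_allowed(specs):
    """Turn allow-list entries such as "192.168.1.0/24" or "::1" into networks.

    Hypothetical helper: CIDR notation is an assumption of this sketch,
    not the syntax v3's --ip option used.
    """
    return [ipaddress.ip_network(spec, strict=False) for spec in specs]


def peer_allowed(peer, allowed):
    """Return True if the peer address falls inside any allowed network."""
    addr = ipaddress.ip_address(peer)
    # A dual-stack socket bound to "::" typically reports IPv4 clients as
    # IPv4-mapped IPv6 addresses (::ffff:a.b.c.d); unwrap them so that
    # IPv4 allow-list entries still match.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    # Membership tests across IP versions simply evaluate to False.
    return any(addr in net for net in allowed)


allowed = parse_allowed(["192.168.1.0/24", "::1"])
print(peer_allowed("192.168.1.17", allowed))        # True
print(peer_allowed("::ffff:192.168.1.5", allowed))  # True
print(peer_allowed("10.0.0.1", allowed))            # False
```

Keeping the filter separate from the bind address is the point here: the listener can bind "::" while the allow-list still decides which peers get served.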

>      - You probably want to configure your install to specify
>        "--ip ::" (or maybe 0.0.0.0) for it to actually serve its
>        purpose on a LAN at all.

I chose "::1" as a "this should be a safe default without any firewall rules" setting, with the intent that the integrator and/or user would likely want to override it on the command-line.  I'm open to discussion about changing this default.

>   - I don't see any sign of the "--alias" option.

It wasn't quite a quickie, and I wasn't going to have the spare time needed to work on replicator for a while, so once I stabilized the aiohttp code I pushed out what I had so that anyone interested (and aware of the repository) could start testing.  I do plan on adding that in my next coding sprint.

>   - Bug 705642: Despite the above, the systemd .conf file appears
>     to be designed around the old v3 interface for --ip and --alias.

In the extras/ directory I offer a base-line systemd .service file.  It is pretty basic, but I offer it as a starting point for any distro which wants to use it.  Change requests are welcome.

>   - I threw together the setsid patch; it seems to be necessary
>     when using openrc with start-stop-daemon to start the service.
>     Something like it probably ought to be submitted upstream,

I'll incorporate it in the next release.
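A minimal sketch of the try/except approach mentioned in the TL;DR at the end of this comment, not the actual patch. It rests on documented setsid(2) behavior: the call fails with EPERM, which Python raises as PermissionError, when the caller is already a process group leader. That is presumably the state start-stop-daemon leaves the daemon in after detaching it. The helper name detach_from_terminal is hypothetical.

```python
import os


def detach_from_terminal():
    """Try to start a new session; tolerate already being a group leader.

    Hypothetical helper, not replicator's actual code. Returns True if a
    new session was created, False if the process was already a process
    group leader (for example because its supervisor called setsid()
    before exec'ing it).
    """
    try:
        os.setsid()
    except PermissionError:
        # EPERM: already detached by whoever started us; nothing to do.
        return False
    return True
```

In a daemon's startup path this would be called once; a False return just means the process was already detached, so it need not be treated as an error.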

>   - The changelog mentions a known race condition, but doesn't go
>     into details.

If a reader task or a writer task completes, I was _sometimes_ (thus the assumption it is a race) seeing other tasks for the same cache entry get cancelled.  The four lines marked "hack" in replicator/Cache.py seem to (mostly?) compensate for the problem, but I think they are just papering over some other problem which I have so far failed to understand correctly.
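This is not a diagnosis of replicator/Cache.py, only one well-known asyncio behavior that produces a similar symptom. When several tasks await the same shared Task directly and one of those waiters is cancelled (for example because its client disconnected), the cancellation propagates into the shared Task, so every other waiter fails too. asyncio.shield() gives each waiter its own outer future and stops that propagation. All function names are hypothetical stand-ins.

```python
import asyncio


async def fetch_entry():
    # Stand-in for the task that downloads one cache entry (hypothetical).
    await asyncio.sleep(0.2)
    return b"payload"


async def reader(shared, use_shield):
    # Stand-in for a client request waiting on the shared download.
    if use_shield:
        # shield() gives this waiter its own outer future, so cancelling
        # the waiter does not cancel the shared task.
        return await asyncio.shield(shared)
    # Awaiting the task directly: cancelling this waiter also cancels
    # the shared task, which then fails every other waiter.
    return await shared


async def demo(use_shield):
    shared = asyncio.ensure_future(fetch_entry())
    r1 = asyncio.ensure_future(reader(shared, use_shield))
    r2 = asyncio.ensure_future(reader(shared, use_shield))
    await asyncio.sleep(0.01)   # let both readers start waiting
    r1.cancel()                 # e.g. one client disconnects
    results = await asyncio.gather(r1, r2, return_exceptions=True)
    return shared.cancelled(), results


for use_shield in (False, True):
    cancelled, results = asyncio.run(demo(use_shield))
    print(use_shield, cancelled, [type(r).__name__ for r in results])
```

With these timings, the unshielded run should report the shared task as cancelled and both readers failing with CancelledError, while the shielded run should leave the shared task intact and let the second reader receive the payload.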

>   - (tangent for portage): It is awkward to strip off the ".__download__"
>     suffix that portage appends to the FILE variable (for use with the
>     X-unique-cache-name custom header when invoking FETCHCOMMAND),

I've been noticing that too.  As a work-around my development tree strips any ".__download__" suffix when normalizing the cache path, but it would be better if portage did not include that suffix when setting the X-unique-cache-name header.
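A sketch of that normalization, assuming the header value is derived from the path Portage substitutes for ${DISTDIR}/${FILE} in FETCHCOMMAND. The helper name cache_name_for is hypothetical; it keeps only the final path component and drops a trailing ".__download__", so the same distfile maps to the same cache entry whether or not the client stripped the suffix itself.

```python
import posixpath

# Suffix that Portage appends to FILE during a download, per this bug thread.
DOWNLOAD_SUFFIX = ".__download__"


def cache_name_for(distfile):
    """Derive a cache name from a (possibly suffixed) distfile path.

    Hypothetical helper: only a trailing suffix is removed, so names that
    merely contain ".__download__" elsewhere are left untouched.
    """
    name = posixpath.basename(distfile)
    if name.endswith(DOWNLOAD_SUFFIX):
        name = name[: -len(DOWNLOAD_SUFFIX)]
    return name


print(cache_name_for("/var/cache/distfiles/foo-1.0.tar.gz.__download__"))
# foo-1.0.tar.gz
```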

>   - (minor) Generally, http-replicator doesn't seem to report some
>     kinds of errors very well.

I'm not shocked.  Bug reports highlighting these are welcome.

>      - I more or less had to guess about the setsid() thing.

What would have been a more helpful message?  (I'm presuming you were confronted with an ugly Python error in the logs?)

>      - Also a wildcard with --ip thing gives no indication at all
>        when starting the replicator;

Well, as you pointed out, --ip is not even trying to do the right thing,
so even if it did have better error handling, it would have been diagnosing the wrong thing...

> you have to check netstat
>        and/or /var/log/http-replicator.log

Well, it is still alpha code, and in particular has recently had major code surgery done.  Monitoring the log file for errors is going to be highly recommended for now.  (Running with at least one "-v" is also recommended, and bumping up to two (-vv) can be helpful if you think you might be encountering problems.)
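A hypothetical sketch of how a repeatable -v flag is commonly mapped to Python logging levels with argparse's "count" action. The thresholds below (none: warnings, -v: info, -vv: debug) are an assumption, not taken from replicator's source, and the helper name log_level_from_args is made up for illustration.

```python
import argparse
import logging


def log_level_from_args(argv):
    """Map repeated -v flags to a logging level (hypothetical sketch)."""
    parser = argparse.ArgumentParser()
    # "-vv" and "-v -v" both increment the counter to 2.
    parser.add_argument("-v", dest="verbose", action="count", default=0)
    args, _unknown = parser.parse_known_args(argv)
    if args.verbose >= 2:
        return logging.DEBUG
    if args.verbose == 1:
        return logging.INFO
    return logging.WARNING


logging.basicConfig(level=log_level_from_args(["-vv"]))
```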

>   - (minor) The package URL seems to be for downloading a tarball based
>     on a tag, which might not be as binary/hash-stable as an actual
>     release tarball.  Particularly if github regenerates it
>     whenever requested, and/or github's zlib is ever upgraded and
>     generates slightly different compressed output.  A "real" release
>     would be better.

Okay... I'll see if there's a way to do that on GitHub (I thought I was doing it the recommended way, but you raise good points there).
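Matthew's concern follows from how Gentoo Manifests pin distfiles. This sketch (hypothetical helper manifest_dist_line; real Manifests are normally generated with "ebuild <file> manifest" or similar tooling) builds a DIST entry in the current "DIST <name> <size> BLAKE2B <hex> SHA512 <hex>" layout. If GitHub ever regenerates a tag archive with even slightly different compressed bytes, both digests change and the fetch fails verification.

```python
import hashlib
import os


def manifest_dist_line(path, chunk_size=1 << 16):
    """Build a Manifest-style DIST entry for a downloaded tarball.

    Any change in the file's bytes, such as a re-compressed tag archive,
    changes the size and/or both digests.
    """
    blake = hashlib.blake2b()   # 512-bit digest by default
    sha = hashlib.sha512()
    size = 0
    with open(path, "rb") as fh:
        # Hash in chunks so large tarballs need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            blake.update(chunk)
            sha.update(chunk)
            size += len(chunk)
    name = os.path.basename(path)
    return f"DIST {name} {size} BLAKE2B {blake.hexdigest()} SHA512 {sha.hexdigest()}"
```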

>   - (minor) Version number 5 instead?  It looks like many substantial
>     internal design changes were made (again), which seems like it
>     warrants rolling a major version number 

Hmmm... maybe.  My thinking was that "alpha is unstable and subject to major internal changes, so I'll just treat this as more of the same", but the changes made this year (including change of maintainer) probably do deserve a more obvious designation in the versioning...

>   - (minor) The aiohttp dependency pulls in 12 (!) additional packages
>     with a total approaching 6 MB of downloads.

Well, I'm not going back to ad-hoc HTTP code.  Aiohttp is well-maintained and handles a lot of corner cases which http-replicator never would pay attention to on its own.  You're probably not going to be running the replicator on an embedded system, and 6 MB isn't much for a typical server (or laptop).

> I don't think anyone except maybe xedakini has really audited
> version 4.0-alpha4 thoroughly at all

And even I do not consider the code production-ready yet.  I _am_ using it in my day-to-day portage use, and am watching that it behaves properly, but at minimum I need to understand the race condition that I see when I disable my hack before I can start to feel confident about bringing it out of "alpha" status.



TL;DR:
  * thanks for the feedback
  * --ip (as filter, not --bind synonym) and --alias are noted as important for next release
  * setsid() call will be wrapped in try/except block
  * I'm only responding here to issues about the replicator code itself; I am ignoring Gentoo integration issues (e.g., the .ebuild)
  * while the code is known to be suboptimal, it should mostly be usable --- please test and report issues
  * I'd prefer issues with the replicator python code be raised in github, but I will continue to monitor and respond to comments raised in this Gentoo bug