Bug 79288 - expose internal dependency tree for external tools
Summary: expose internal dependency tree for external tools
Status: RESOLVED DUPLICATE of bug 136932
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Enhancement/Feature Requests
Hardware: All All
Importance: High enhancement
Assignee: Portage team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-01-23 22:46 UTC by Richard Benjamin Voigt
Modified: 2008-03-02 16:23 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments

Description Richard Benjamin Voigt 2005-01-23 22:46:43 UTC
Most of the goals on the portage development roadmap (http://www.gentoo.org/proj/en/portage/index.xml) could be better solved by external tools (add to gentoolkit, or portage-massive), except for one problem -- dependency calculation.

For example, the general problem of build optimization requires allocating resources: network, disk, CPU, RAM, portage-database-locks.  Portage exposes the functionality to control these resources on a given ebuild, through:
($BLAHPATH = `equery which $BLAH`)

ebuild $BLAHPATH fetch -- demands network
ebuild $BLAHPATH unpack -- demands disk
ebuild $BLAHPATH compile -- demands CPU, RAM, and disk unless using tmpfs
ebuild $BLAHPATH package -- demands disk
emerge -k $BLAH -- demands disk, exclusive lock on databases
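
As a rough illustration of how an external scheduler might use the list above, the following Python sketch records which resources each step demands and builds the corresponding command lines. The names PHASE_RESOURCES, MERGE_RESOURCES, plan_steps and can_run_together are hypothetical, the resource labels are only tags chosen for this sketch, and treating only the database lock as exclusive is an assumption. Nothing is executed here; the commands are only assembled.

```python
# Phase -> resources, transcribed from the list above.
# The resource names are just labels chosen for this sketch.
PHASE_RESOURCES = {
    "fetch": {"network"},
    "unpack": {"disk"},
    "compile": {"cpu", "ram", "disk"},
    "package": {"disk"},
}
# "emerge -k" needs the disk and an exclusive lock on the databases.
MERGE_RESOURCES = {"disk", "db-lock"}


def plan_steps(ebuild_path, atom):
    """Return (argv, resources) pairs for one package, in execution order."""
    steps = [
        (["ebuild", ebuild_path, phase], resources)
        for phase, resources in PHASE_RESOURCES.items()
    ]
    steps.append((["emerge", "-k", atom], MERGE_RESOURCES))
    return steps


def can_run_together(step_a, step_b, exclusive=("db-lock",)):
    """Two steps may overlap unless both need a resource treated as exclusive."""
    shared = step_a[1] & step_b[1]
    return not (shared & set(exclusive))


if __name__ == "__main__":
    # "/path/to/blah.ebuild" stands in for the output of `equery which blah`.
    for argv, resources in plan_steps("/path/to/blah.ebuild", "blah"):
        print(" ".join(argv), "->", sorted(resources))
```

What such a table cannot provide is the order in which different packages must go through these steps.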


So an external program can easily manage the distribution of work, except for one problem.  It can't know in what order to build things.  Many utils have independently developed ways of finding the dependency graph, but they will break whenever portage changes.  Portage already calculates all this information, so please expose it for others' use.

I propose a new option "--depgraph" (implies pretend) which dumps the following to stdout:
(1) A list of all ebuilds required to be updated, i.e. the output from emerge -pv BLAH.  These are vertices in the dependency graph.  Each one is given a unique ID (preferably numeric).
(2) A list of directed edges among the ebuilds showing dependencies.  Here is my suggestion, but use something closer to portage's internal structure if this is inconvenient:
(EBUILDID, DEPENDENCYID)
or
(EBUILDID, DEP1ID, DEP2ID, DEP3ID, ...)

Example output:
emerge -e --depgraph gcc-config

These are the packages that I would merge, in reverse order:

Calculating dependencies ...done!
* VERTICES
@1: gcc-config-v.v.v
@2: portage-v.v.v
@3: python-v.v.v
@4: expat-v.v.v
@5: debianutils-v.v.v
@6: bzip2-v.v.v
etc, etc, etc
@10: glibc-v.v.v
* EDGES-LONG
(1, 2)
(1, 10)
(2, 3)
(2, 5)
(2, 10)
(3, 4)
(3, 10)
(4, 10)
(5, 6)
(5, 10)
(6, 10)
etc., etc., etc.
* EDGES-CONDENSED
(1, 2, 10, etc., etc.)
(2, 3, 5, 10, etc)
(3, 4, 10, etc)
(4, 10, etc)
(5, 6, etc)
etc., etc., etc.
* DEPENDENCY GRAPH COMPLETE
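
To show how little code a consumer of such output would need, here is a minimal Python sketch of a parser. It assumes the format exactly as proposed above, which is only a suggestion in this report and not something emerge produces; parse_depgraph and SAMPLE are hypothetical names, and lines that do not match (such as the "etc." placeholders) are simply skipped.

```python
import re

VERTEX_RE = re.compile(r"@(\d+):\s*(\S+)")
EDGE_RE = re.compile(r"\(([\d,\s]+)\)")


def parse_depgraph(text):
    """Parse the proposed output into ({id: name}, {(ebuild_id, dep_id), ...})."""
    vertices = {}
    edges = set()
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("* "):
            # Section marker, e.g. "* VERTICES" or "* EDGES-LONG".
            section = line[2:]
            continue
        if section == "VERTICES":
            match = VERTEX_RE.fullmatch(line)
            if match:
                vertices[int(match.group(1))] = match.group(2)
        elif section in ("EDGES-LONG", "EDGES-CONDENSED"):
            match = EDGE_RE.fullmatch(line)
            if not match:
                continue  # placeholder or malformed line
            ids = [int(part) for part in match.group(1).split(",") if part.strip()]
            if len(ids) >= 2:
                # The first ID is the ebuild, the rest are its dependencies,
                # so both edge notations reduce to the same set of pairs.
                edges.update((ids[0], dep) for dep in ids[1:])
    return vertices, edges


SAMPLE = """\
Calculating dependencies ...done!
* VERTICES
@1: gcc-config-v.v.v
@2: portage-v.v.v
@10: glibc-v.v.v
* EDGES-LONG
(1, 2)
(1, 10)
(2, 10)
* EDGES-CONDENSED
(1, 2, 10)
(2, 10)
* DEPENDENCY GRAPH COMPLETE
"""

if __name__ == "__main__":
    names, deps = parse_depgraph(SAMPLE)
    for ebuild, dep in sorted(deps):
        print(names[ebuild], "requires", names[dep])
```

Because the long and condensed notations carry the same information, emitting only one of them would be enough for a parser like this.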

Now, any other program can rebuild the graph, add its own heuristics (example: compile glibc before gcc and then glibc again), optimize resource allocation, and call ebuild and emerge -Ok to do the work.
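
For instance, once the (EBUILDID, DEPENDENCYID) pairs are available, a valid build order falls out of a topological sort. The sketch below uses Python's standard graphlib module (Python 3.9 or later) on the edges from the example output; build_order and EDGES are hypothetical names, and a real tool would still have to decide how to handle circular dependencies, which graphlib reports as a CycleError.

```python
from graphlib import CycleError, TopologicalSorter

# (ebuild, dependency) pairs taken from the EDGES-LONG example above.
EDGES = [
    (1, 2), (1, 10), (2, 3), (2, 5), (2, 10), (3, 4),
    (3, 10), (4, 10), (5, 6), (5, 10), (6, 10),
]


def build_order(edges):
    """Return vertex IDs ordered so every dependency precedes its dependents."""
    graph = {}
    for ebuild, dep in edges:
        graph.setdefault(ebuild, set()).add(dep)
        graph.setdefault(dep, set())  # make sure leaf packages are present
    # TopologicalSorter expects {node: predecessors}, which is exactly
    # {ebuild: dependencies}.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    try:
        print(build_order(EDGES))
    except CycleError as err:
        print("circular dependency:", err.args[1])
```

With these edges, glibc (ID 10) comes out first and gcc-config (ID 1) last; the relative order of unrelated packages in between is not significant.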

EXTRA REQUESTS:

Include dependencies outward from system or world if specified on the command-line (assign it ID 0 perhaps).  This helps people writing pruneworld scripts.

NOTES:

I'm guessing that all this information is already collected inside portage, and so all that is needed for implementation is checking an additional command-line parameter, and the printing code for the output itself.  This will stop other people from writing mangled versions of dependency-calculation, and stop portage from growing into a huge beast because of features some people would never use.
Benefits to developers: division of labor
Benefits to users: more developers, better GUIs (like porthole and portagemaster), clustering sooner, and a slim ultra-reliable portage to fix their system when the GUI tools go bad.

My apologies to whoever has to read this.  I'm sure you will come up with better ways to accomplish this, but I wanted to share the idea.

Reproducible: Always
Steps to Reproduce:
Run emerge -tp (for example emerge -tep world)
Actual Results:  
Dumps the items in reverse order of merge.  Each package is listed below the
last package requiring it.  This format is not easily machine parseable, and
doesn't provide any information regarding whether multiple ebuilds require a
single dependency.

Expected Results:  
Provide an option to export the dependency graph in machine parseable format.
A graph requires the following:
List of vertices
List of directed edges
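
As an illustration of what the edge list adds over the tree view, the following Python sketch answers the question the tree output cannot: which ebuilds require a given dependency, directly or indirectly. The IDs and edges are those of the example output in this report; reverse_deps and all_dependents are hypothetical helper names.

```python
from collections import defaultdict

# (ebuild, dependency) pairs from the example output in this report.
EDGES = [
    (1, 2), (1, 10), (2, 3), (2, 5), (2, 10), (3, 4),
    (3, 10), (4, 10), (5, 6), (5, 10), (6, 10),
]


def reverse_deps(edges):
    """Map each dependency to the set of ebuilds that require it directly."""
    required_by = defaultdict(set)
    for ebuild, dep in edges:
        required_by[dep].add(ebuild)
    return required_by


def all_dependents(required_by, target):
    """Return every ebuild that requires target, directly or transitively."""
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in required_by.get(node, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    required_by = reverse_deps(EDGES)
    print("direct dependents of 10:", sorted(required_by[10]))
    print("everything that needs 6:", sorted(all_dependents(required_by, 6)))
```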
Comment 1 Jason Stubbs (RETIRED) gentoo-dev 2005-07-28 07:25:31 UTC
Putting a hold on feature requests for portage as they are drowning out the 
bugs. Most of these features should be available in the next major version of 
portage. But for the time being, they are just drowning out the major bugs and 
delaying the next version's progress. 
 
Any bugs that contain patches and any bugs for etc-update or dispatch-conf can 
be reopened. Sorry, I'm just not good enough with bugzilla. ;) 
Comment 2 Richard Benjamin Voigt 2006-10-04 19:39:06 UTC
I see that equery depgraph now exists, but it is, as described, a dependency tree.

There is still no information about multiple packages depending on a single earlier one.  The dependency graph isn't a tree when the same package appears multiple times, and I know portage/equery does something to handle this to make sure all >= and <= requirements of all packages are simultaneously met.

Let me try to motivate the requirement:
Say the gcc team (which I am not part of) wanted to make sure that a particular set of core packages (toolchain, apache, apache2, tcl, ruby, xorg, kde, gnome, for example) built correctly with a proposed patch and a particular set of USE flags, and had a large group of Gentoo systems (perhaps chrooted) at their disposal to run this test.  The best way to do this test would be to first issue one system package to each host to rebuild and make binary packages that are then merged to all hosts.  Then each host is using an identical toolchain of the version needing to be tested.  Now the entire set of test packages needs to be built, probably including rebuilding the toolchain using the new compiler version.  It then becomes important to identify all the packages that depend on, say, libXt, in order to schedule them correctly.

This would be much more powerful than distcc because (1) parallelism considering independent packages is much higher than parallelism in a single package and (2) configure scripts would be run in parallel as well.
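
A small Python sketch of the scheduling idea: given the dependency edges, packages can be grouped into "waves" in which every member is independent of the others and could be handed to a different host. It uses the standard graphlib module (Python 3.9 or later); the edges are the ones from the example in the description, and parallel_waves is a hypothetical name. Actually dispatching the builds to hosts is left out.

```python
from graphlib import TopologicalSorter

# (ebuild, dependency) pairs from the example in the description.
EDGES = [
    (1, 2), (1, 10), (2, 3), (2, 5), (2, 10), (3, 4),
    (3, 10), (4, 10), (5, 6), (5, 10), (6, 10),
]


def parallel_waves(edges):
    """Group packages into waves whose members do not depend on each other."""
    graph = {}
    for ebuild, dep in edges:
        graph.setdefault(ebuild, set()).add(dep)
        graph.setdefault(dep, set())
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorter.get_ready()  # everything whose dependencies are done
        waves.append(sorted(ready))
        sorter.done(*ready)  # pretend the whole wave finished building
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(parallel_waves(EDGES), start=1):
        print("wave", number, "->", wave)
```

For the example edges this yields five waves: [10], [4, 6], [3, 5], [2], [1]. A real scheduler would not wait for a whole wave to finish but would mark each package done as soon as its build completes.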
Comment 3 Zac Medico gentoo-dev 2006-10-04 20:01:07 UTC
(In reply to comment #2)
> There is still no information about multiple packages depending on a single
> earlier one.  The dependency graph isn't a tree when the same package appears
> multiple times, and I know portage/equery does something to handle this to make
> sure all >= and <= requirements of all packages are simultaneously met.

All that I know about equery's depgraph is that it doesn't work correctly.  Until very recently, portage (emerge) has always built an incomplete depgraph.  The latest version of portage at the moment (2.1.2_pre2-r3) is the first release that's ever been able to build a complete depgraph and handle circular RDEPEND correctly.

If you're interested in analysis of dependencies in the portage tree, you may also want to look at two competing projects: paludis and pkgcore.  As far as I know, both of those expose everything that you will need as libraries (paludis is C++ and pkgcore is Python).
Comment 4 Paul Varner (RETIRED) gentoo-dev 2006-10-05 10:37:41 UTC
equery's depends and depgraph functions are broken. For users just looking for more accurate dependency information, I am recommending app-portage/udept.

I do plan on completely rewriting the depends and depgraph functionality, but work is interfering at the moment with tackling that size of a change.

In the meantime, if anyone wants to hack/work on the gentoolkit modules, the source is available at http://viewcvs.gentoo.org/viewcvs.py/gentoolkit/trunk/src/gentoolkit/

I will gratefully review any new bugs from people hacking on those modules.
Comment 5 Jason Stubbs (RETIRED) gentoo-dev 2006-10-06 10:00:50 UTC
emerge's depgraph code would have to be nearing a stage where it can be split out into portage_dep or the like, no?
Comment 6 Zac Medico gentoo-dev 2006-10-06 13:53:00 UTC
(In reply to comment #5)
> emerge's depgraph code would have to be nearing a stage where it can be split
> out into portage_dep or the like, no?

When we expose the depgraph in the api, we'll be stuck with maintaining backward compatibility.  We may want to look into solving things like bug 1343 and bug 141118 before we expose it.
Comment 7 Marius Mauch (RETIRED) gentoo-dev 2008-03-02 16:23:47 UTC

*** This bug has been marked as a duplicate of bug 136932 ***