/usr/portage/licenses/ contains files that are no longer in use.

Reproducible: Always

Steps to Reproduce:
1. emerge sync
2. find /usr/portage -name '*.ebuild' -print0 | xargs -0 awk 'BEGIN { RS="\"" } /LICENSE=/ { getline; print }' | tr ' |()' '\n' | sort -u | comm -13 - <( ls /usr/portage/licenses/ )

Actual Results:
ARC CL-PDF D&R DIVX DUMB HP MONK Matrox PANDA-gGo SMAIL SNNS-4.2 XC aspell-nl bigelow-holmes-urw-gmbh-luxi blender creativecommons-by-nc-sa-1.0 icc-5.0 ioncube_loaders nokia nomachine nxcomp phpdbg realone realplayer8 realsdk root-license sun-bcla-sos vhfPL vpython

Expected Results:
<no output>
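The last stage of that pipeline relies on `comm -13`, which prints only the lines unique to its second input and expects both inputs to be sorted the same way. A minimal Python sketch of that merge, assuming two already-sorted lists of names (comm_only_second is a hypothetical name, not part of any tool):

```python
def comm_only_second(first, second):
    """Mimic `comm -13 FIRST SECOND`: return lines found only in `second`.

    Both lists must be sorted in the same order, just as comm requires.
    Each line in `first` can cancel at most one matching line in `second`.
    """
    result = []
    i = 0
    for line in second:
        # Skip entries of `first` that sort before the current line.
        while i < len(first) and first[i] < line:
            i += 1
        if i < len(first) and first[i] == line:
            i += 1  # present in both inputs, so comm would suppress it
        else:
            result.append(line)  # unique to the second input
    return result


# Referenced licenses (sorted) vs. the licenses directory listing (sorted).
print(comm_only_second(["BL", "GPL-2"], ["ARC", "BL", "GPL-2"]))  # ['ARC']
```

Because this is a single forward merge, an unsorted input silently produces wrong results, which is why the pipeline runs sort -u before comm.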
Your one-liner incorrectly assumes that ebuilds never inherit their LICENSE value from an eclass and that the LICENSE value is never split across multiple lines. Here's a Perl-style Python one-liner that uses portage directly:

$ python -c 'import os, portage, portage_util; all_licenses = os.listdir(portage.settings["PORTDIR"]+"/licenses"); ebuild_licenses = portage.unique_array(portage.flatten([portage.db["/"] ["porttree"].dbapi.aux_get(y, ["LICENSE"])[0].split() for y in portage.flatten([portage.db["/"]["porttree"].dbapi.xmatch("match-all",x) for x in portage.db["/"]["porttree"].dbapi.cp_all()])])); all_licenses.sort(); print "\n".join(l for l in all_licenses if l not in ebuild_licenses)'
ARC BL CL-PDF D&R DIVX DUMB HP MONK PANDA-gGo SMAIL XC aspell-nl blender creativecommons-by-nc-sa-1.0 icc-5.0 nxcomp phpdbg realone realplayer8 realsdk root-license sun-bcla-sos vhfPL vpython
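To illustrate the two cases: a text-based extractor can be made to cope with a LICENSE value split across lines, but it still cannot see a value inherited from an eclass, because that value never appears in the ebuild file. A rough sketch, assuming the value is double-quoted and assigned at the start of a line (licenses_in_ebuild is a hypothetical helper, not portage API):

```python
import re

# LICENSE="..." at the start of a line (leading whitespace allowed).
# [^"]* also matches newlines, so a value continued over several lines
# is captured in one piece.
LICENSE_RE = re.compile(r'^\s*LICENSE="([^"]*)"', re.MULTILINE)


def licenses_in_ebuild(text):
    """Return the whitespace-separated LICENSE tokens assigned in `text`."""
    tokens = []
    for value in LICENSE_RE.findall(text):
        tokens.extend(value.split())
    return tokens


multi_line = 'LICENSE="GPL-2\n\tBSD"\nSLOT="0"\n'
eclass_only = 'inherit hypothetical-eclass\nSLOT="0"\n'
print(licenses_in_ebuild(multi_line))   # ['GPL-2', 'BSD']
print(licenses_in_ebuild(eclass_only))  # []
```

An ebuild that only inherits its LICENSE yields an empty list here, which is why querying portage's metadata through aux_get, as the one-liner does, is the more reliable route.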
And the || ( ) operators need to be stripped out. For example, BL shouldn't be in that list, but it gets picked up because it never appears alone. Rev 3:

python -c 'import os, portage, portage_util; all_licenses = os.listdir(portage.settings["PORTDIR"]+"/licenses"); all_licenses.sort(); ebuild_licenses = portage.unique_array(" ".join( [ z.replace("|", " ").replace("(", " ").replace(")", " ") for z in portage.flatten( [ portage.db["/"]["porttree"].dbapi.aux_get(y, ["LICENSE"])[0].split() for y in portage.flatten( [ portage.db["/"]["porttree"].dbapi.xmatch( "match-all",x) for x in portage.db["/"]["porttree"].dbapi.cp_all() ] ) ] ) ] ).split()); print "\n".join(l for l in all_licenses if l not in ebuild_licenses)'
ARC CL-PDF D&R DIVX DUMB HP MONK PANDA-gGo SMAIL XC aspell-nl blender creativecommons-by-nc-sa-1.0 icc-5.0 nxcomp phpdbg realone realplayer8 realsdk root-license sun-bcla-sos vhfPL vpython
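The Rev 3 logic, restated as a small Python 3 sketch for readability: the operator characters are blanked out before splitting, and the referenced names are collected into a set. The function names are hypothetical, and the inputs are assumed to be the licenses directory listing and the LICENSE strings already fetched (for example via aux_get).

```python
def license_names(value):
    """Split a LICENSE string into names, blanking out the || ( ) operators."""
    for ch in "|()":
        value = value.replace(ch, " ")
    return value.split()


def unused_licenses(available, license_values):
    """Return, sorted, the names in `available` that no LICENSE value uses."""
    referenced = set()
    for value in license_values:
        referenced.update(license_names(value))
    return sorted(name for name in available if name not in referenced)


# BL only ever appears inside an || group, yet it is still counted as used.
print(unused_licenses(["ARC", "BL", "GPL-2", "MIT"],
                      ["|| ( BL GPL-2 )", "MIT"]))  # ['ARC']
```

A set makes each membership check constant-time on average, unlike the list returned by unique_array. If a value contains a USE-conditional group such as `doc? ( ... )`, the `doc?` token survives the split, but it never matches a file in the licenses directory, so it does not change the result.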
If any package writes "(", ")" or "||" with no spaces between them and the license names, that is broken as well, but it probably deserves a separate bug.
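A malformed value of that kind can be detected on its own: in a well-formed LICENSE string, the only whitespace-separated tokens that contain |, ( or ) are the operators themselves. A hypothetical checker along those lines (not part of portage or repoman):

```python
OPERATORS = {"||", "(", ")"}


def malformed_tokens(value):
    """Return tokens in which ||, ( or ) is glued to a license name."""
    return [tok for tok in value.split()
            if tok not in OPERATORS and any(ch in tok for ch in "|()")]


print(malformed_tokens("|| ( a b ) || ( c d )"))  # []
print(malformed_tokens("||(a b)||(c d)"))         # ['||(a', 'b)||(c', 'd)']
```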
>>> def stripstuff ( z ):
...     return z.replace("|", " ").replace("(", " ").replace(")", " ").split()
...
>>> stripstuff("|| ( a b ) || ( c d )")
['a', 'b', 'c', 'd']
>>> stripstuff("||(a b)||(c d)")
['a', 'b', 'c', 'd']

I don't understand the problem.
I'm not saying that my code failed in those cases. I'm saying that it wasn't wrong to assume that those cases don't exist because they are malformed.
Ah, thank you. Bug 114948 created to address the malformed syntax.
ARC BKL CL-PDF D&R DIVX DUMB HP LPL-v1.02 MONK PANDA-gGo SISSL-1.1 SMAIL XC aspell-nl blender creativecommons-by-nc-sa-1.0 icc-5.0 nxcomp pclcomp phpdbg realone realplayer8 realsdk root-license sun-bcla-sos sun-wsdp-bin vhfPL vpython

That was the newest list. I sanity-checked it, and nothing in the tree uses any of these, so I removed them all. Thanks.
Recycling... Adobe-X, CID, DEC, DEC-2, IBM-X, NVIDIA-X, NetBSD, SGI, UCB-LBL, XC-2, bigelow-holmes-urw-gmbh-luxi, christopher-g-demetriou, docbook, national-semiconductor, nokia, sun-bcla-j2eeeditor, sun-bcla-javac, sun-javac, sun-resolver, tektronix, the-open-group, todd-c-miller, truecrypt-2.0, x-truetype, xfree86-1.0
Removed all sun-* and docbook.
All remaining licenses have been removed already. Closing.