
Bug 456892

Summary: Missing formal specification of ebuild naming policy
Product: Documentation
Reporter: Walter <walter>
Component: [OLD] Portage Documentation
Assignee: Package Manager Specification <pms>
Status: RESOLVED INVALID    
Severity: normal    
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---

Description Walter 2013-02-12 04:30:58 UTC
The description at http://www.gentoo.org/proj/en/devrel/handbook/handbook.xml?part=2&chap=1#doc_chap3 is a good start, however a versioned, formal form would be superior.

I hit this while developing a container-based build system that needs to validate inbound package version numbers.

Wound up writing a sorta-there-but-not-really regex.

Reproducible: Always

Steps to Reproduce:
1. Want formal spec

Actual Results:  
Spec is missing.

Expected Results:  
Spec comes to daddy.
Comment 1 Zac Medico gentoo-dev 2013-02-12 04:36:36 UTC
Everything should be in app-doc/pms.
Comment 2 Ulrich Müller gentoo-dev 2013-02-12 06:59:06 UTC
Is this bug about the formal spec (PMS), or about policy (devmanual)? The latter imposes additional restrictions, like "uppercase characters are strongly discouraged" or "no integer part of the version may be longer than 18 digits".

PMS: http://dev.gentoo.org/~ulm/pms/5/pms.html#x1-160003
devmanual: http://devmanual.gentoo.org/ebuild-writing/file-format/index.html
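
The two devmanual restrictions quoted above can be checked mechanically. Below is a minimal sketch in POSIX shell. The helper name check_devmanual_policy is hypothetical, and it covers only these two rules, not the full devmanual policy or the PMS syntax:

```sh
# Hypothetical helper: checks only the two devmanual restrictions quoted
# in this comment. It does not validate the overall syntax.
check_devmanual_policy() {
    # "uppercase characters are strongly discouraged"
    if printf '%s\n' "$1" | grep -q '[[:upper:]]'; then
        echo "discouraged: uppercase characters"
        return 1
    fi
    # "no integer part of the version may be longer than 18 digits",
    # i.e. the string must not contain a run of 19 or more digits.
    if printf '%s\n' "$1" | grep -Eq '[0-9]{19}'; then
        echo "rejected: integer part longer than 18 digits"
        return 1
    fi
    echo "ok"
}
```

For example, `check_devmanual_policy 1.5.0_beta3` prints "ok", while a version containing a 19-digit component is rejected.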
Comment 3 Walter 2013-02-12 23:57:15 UTC
As a general long-term gentoo user, I wasn't even aware the app-doc/pms project existed. Maybe someone should look at putting that in the aforementioned handbook link?

If the two differ, then that's ... well, it's probably best viewed as another separate bug, unless it's documented and explained clearly.

I seek only a formal spec, where I can go "is this a valid X?" (where X is package atom, version string, etc.) and system goes yay or nay. That is all.

Right now, it sounds like there are at least two standards (and the book version suggests some packages in portage actually don't conform, so make that three or more standards).

Apparently, none of these have a formal, machine-parseable spec I can point at and go "yes, that's the truth version X, to which things should conform". That's all I need. This bug is about the fact that bit's missing, not the other problems.
Comment 4 Walter 2013-02-13 00:48:01 UTC
Related food for thought (just to give you some idea of the approach): I just tested several variants of the version string and found that SRC_URI is suddenly interpreted differently from the package version string when the '-r0' suffix is used. (That is, I can name my tarball with the same version, but not if I am using -r#, and maybe, untested as yet, _pre.)

This sort of thing should optimally be explicitly formally, machine-readably defined somewhere, not just in text.
Comment 5 Jaroslav Rakhmatoullin 2013-04-29 05:38:41 UTC
It seems formal enough. If you are left with [a-zA-Z0-9\-]+/[a-zA-Z0-9\-]+ after checking with that sed script, then the stripped version could be valid. This is at least enough to check that there is a folder (and also the .ebuild version), e.g.:

[[ -d /usr/portage/`stripVersion dev-php/PEAR-MDB2_Driver_mysql-1.5.0_beta3` ]] && echo could be still valid

With some effort, I can probably think of a couple of examples where this would
shave off parts of the package name, but last I checked (whole tree), there
were no such packages. So perhaps what you're missing is a clear, syntactically
unambiguous representation of the relatively rigorous naming specification (convention)?
 
Here is how I interpret the naming convention:

# Naming policy
# http://www.gentoo.org/proj/en/devrel/handbook/handbook.xml?part=2&chap=1
#
# atom. pkg ver _suf
#  
# pkg. the package name, which should only contain lowercase letters,
# the digits 0-9, and any number of single hyphen (-), underscore (_)
# or plus (+) characters.

# ver. The version is normally made up of two or three (or more) numbers
# separated by periods, such as 1.2 or 4.5.2, and may have a single letter
# immediately following the last digit; e.g., 1.4b or 2.6h.
# The package version is joined to the package name with a hyphen

# _suf{#}.  #.#.# < _alpha < _beta < _pre < _rc < (no suffix) < _p.

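# Print the atom with its comparison operator, version, suffix and
# revision removed, leaving category/package.
stripVersion() {
    local package
    package=$(printf '%s\n' "$1" | sed -E '
    s/_(alpha|beta|pre|rc|p)?([0-9]*)?$//;
    s/-r([0-9]*)?$//;
    s/_(alpha|beta|pre|rc|p)?([0-9]*)?$//;
    s/-[0-9.]*([a-z]*)?$//;
    s/^(=)?([<>])?(=)?//
    ')
    printf '%s\n' "$package"
}

# With a file argument, strip every atom listed in it (one per line);
# otherwise treat the argument itself as a single atom.
if [ -f "$1" ]; then
    while IFS= read -r atom; do
        stripVersion "$atom"
    done < "$1"
else
    stripVersion "$1"
fi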
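
The suffix ordering noted in the comments above (_alpha < _beta < _pre < _rc < (no suffix) < _p) can also be expressed as code. Below is a rough sketch in POSIX shell. The names suffix_rank and compare_suffix are hypothetical, the input is assumed to be a bare version string (no package name), and only the kind of suffix is compared, not the numeric parts:

```sh
# Hypothetical helper: map the suffix of a bare version string to a rank
# so that _alpha < _beta < _pre < _rc < (no suffix) < _p.
suffix_rank() {
    v=${1%-r*}    # drop a trailing revision such as -r3, if present
    case "$v" in
        *_alpha|*_alpha[0-9]*) echo 0 ;;
        *_beta|*_beta[0-9]*)   echo 1 ;;
        *_pre|*_pre[0-9]*)     echo 2 ;;
        *_rc|*_rc[0-9]*)       echo 3 ;;
        *_p|*_p[0-9]*)         echo 5 ;;
        *)                     echo 4 ;;   # no suffix
    esac
}

# Compare two versions by suffix kind only; prints <, > or =.
compare_suffix() {
    a=$(suffix_rank "$1")
    b=$(suffix_rank "$2")
    if [ "$a" -lt "$b" ]; then
        echo "<"
    elif [ "$a" -gt "$b" ]; then
        echo ">"
    else
        echo "="
    fi
}
```

For example, `compare_suffix 1.0_rc1 1.0` prints "<" and `compare_suffix 1.0_p1 1.0` prints ">". A full comparison would also have to order the numeric components and the suffix numbers, which this sketch leaves out.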
Comment 6 Walter 2013-04-29 06:52:12 UTC
Your interpretation seems great but it's still just that - an interpretation - of one of three sources of truth identified (PMS docs, portage codebase, actual tree). Without a version number.

It's close though.

What would be really ideal would be taking the package name format interpretation you have made, converting it so that it was defined declaratively (eg. using IETF standard ABNF from https://tools.ietf.org/html/rfc5234) and publishing the result as the standard.

That way its interpretation is fixed and it can be viably used for code generation in arbitrary languages.

In addition, versioning would make changes clear and insta-documentable (and on that note: for the love of god, please use github instead of some weird private gentoo devs only repo. Right now the friction involved in contributing changes to some areas of gentoo is ridiculous. It would be a great chance to set a good example with this standards effort!)

Thanks for your further consideration of this real issue. It's areas like this that can be very hard to address with open source efforts... but total domination is within reach!
Comment 7 Ulrich Müller gentoo-dev 2013-04-29 07:21:27 UTC
The formal spec is here:
http://dev.gentoo.org/~ulm/pms/head/pms.html#x1-160003

Nothing to fix here, therefore closing.
Comment 8 Walter 2013-04-29 08:31:27 UTC
That page is neither machine parseable nor concise. It describes simultaneously both algorithms (in multiple) and the identifier itself. It is very measurably not a good, versioned, machine-useful specification for the identifier.

But if you choose to ignore its flaws, ignore its flaws. Sometimes this community is so hard to contribute to.
Comment 9 Walter 2013-04-29 08:47:23 UTC
Restatement of problem: "I seek only a formal spec, where I can go "is this a valid X?" (where X is package atom, version string, etc.) and system goes yay or nay. That is all."

Problem NOT solved, problem VALID.
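
To illustrate the kind of yay/nay check meant here, below is a minimal sketch for version strings only, in POSIX shell. It follows the interpretation given in Comment 5 (numbers separated by periods, an optional single letter, an optional suffix, an optional -r revision), not the PMS text. The fragment names and is_valid_version are hypothetical, and allowing at most one suffix is an assumption:

```sh
# Named fragments, combined into one anchored extended regular expression.
NUMBERS='[0-9]+(\.[0-9]+)*'                 # 1, 1.2, 4.5.2, ...
LETTER='[a-z]?'                             # optional single trailing letter
SUFFIX='(_(alpha|beta|pre|rc|p)[0-9]*)?'    # at most one suffix (assumption)
REVISION='(-r[0-9]+)?'                      # optional revision, e.g. -r3
VERSION_RE="^${NUMBERS}${LETTER}${SUFFIX}${REVISION}\$"

# Hypothetical validator: prints "yay" if the argument matches, else "nay".
is_valid_version() {
    if printf '%s\n' "$1" | grep -Eq "$VERSION_RE"; then
        echo yay
    else
        echo nay
    fi
}
```

For example, `is_valid_version 1.5.0_beta3` prints "yay" and `is_valid_version 1.2_gamma` prints "nay". Keeping each fragment as a named variable is only a stand-in for a declarative grammar; it says nothing about which source of truth the fragments should be taken from.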
Comment 10 Ulrich Müller gentoo-dev 2013-04-29 09:06:51 UTC
This boils down to the general question of whether PMS should use EBNF when describing syntax. That needs to be discussed on the mailing lists, not in a bug report.

But in fact, it _was_ discussed on gentoo-dev several times, as early as 2008.
Comment 11 Walter 2013-04-29 09:10:07 UTC
It boils down to not having a formal (== rigidly defined, machine parseable) spec for the most basic element of the system even after over a decade of development.  As a normal Gentoo user I had no idea about PMS. I do not think this bug should be considered PMS-centric. However, PMS could be the right way to solve it.  Unfortunately, it looks like you are saying PMS has chosen a non machine parseable solution written in elevated technical English by a series of Germans. Honestly, if I was a second language speaker I'd just give up.  As it happens I'm part German so I can poke fun all I like :)

Really, whatever is going on right now with whatever Gentoo projects, and whatever people feel about the matter having been touched upon in the past, this bug is valid and should be resolved. It should remain open until it is resolved. It is not resolved.
Comment 12 Ciaran McCreesh 2013-04-29 12:05:13 UTC
There's a reason you don't find many machine-parseable specs out there...
Comment 13 Walter 2013-04-30 01:41:41 UTC
I will have a crack at converting the interpretation here to ABNF and post the results.
Comment 14 Walter 2013-07-07 06:58:22 UTC
I have invested some time and implemented an ABNF spec for these items. Right now there is a bug where capitalization is screwy because I haven't bothered to go through and change every string literal like "A" to %xx  ... but you get the gist.

https://github.com/globalcitizen/gentoo-pms-abnf

It would be nice if this was picked up and used for testing and formal specification.

Sample output:
[repo-name]
'0_Xywgh_V6Zve_Kx-F_f_W9Piv3COJ_tpgf_-U-_Adu__5k2cno-__-a__7__-sw4-IGI-8_-_j-P1H-h_67_6Ta------'	'1iZ__piy'	'7-U_Ioml9-----''_R9juGk5_1_GJ780F6J_L'	'5U_KW_tSPyHi_Q'	'c_Z2xoi'	'KS_4k_Gg3O_I__f6lz'	'a-------------'	'b'	'g'	'5cQ-j-_-_A37_92_CQ_dW_1__rP_0__6VK_8_4-Am_-dH36-5_-_9__-av-R_JN-_-_2_-my-u_h-----------'	'o-_-------'	'v-7_pQ4x_v_o_Q_2s_-rp93_TqjlH1-u86j-_Jw5-Z-V-eP0X-_-_-__-7_SYc4-_o-Kn_ya1-_-_Gd-__m-_-9e_R-_-__3-_B_85---------'	'M--------'	'U_c6GcA4q1aZ_pd0--------'	'_gd823H6A9_qg__OW--------------------'	'0-ASb_CUWu7_BHEV_z-8f-EK-_n_y-__-M-O-_Y-guN-mJ-bL-_-_sI3-M-15KG6_k-U-aBF-4-ZX-'	'fb4_-kHe1_H__tG0569_8U-'	'0'	'vV_2__-t_X6cRC_4E1x_7_u_3t-9-v-__-j-8-wq-uNs-__50-0-Y-cr-_op-_UG-n-E-x_I-U7_-Y_-_3-Q_0-'	'_'	'B-P--------------'

[version]
'4.5213078_BETA6'	'78352461905'	'7a_RC'	'750931842695931_beta6'	'2_alpha'	'4.9278350613965.0J'	'01394275.86088784.3.751.043.4.7.0.5.15.47.95m'	'138954267061395348I'	'256390478180932078.5.70.678.66.6.57.2502.89.19138326.20.8.12.4.998.024757A_alpha'	'3.750269814709922216.0.23.88.7.0.7.5770.27.99.9.10.5g'	'91862734057.532931339.9.56057.42.1.39.23.2.37.5.3.9359.751.089493'	'09258317644.1.49608709589952725.8063.78.5639.2.01.302.5.4.60.5.232224.0.158334.98.59.08.5'	'8.1.03624.5.978.3.73526.1.663E_pre'	'35721604897738.4.668328019.07.1.148.88.05.821.99.1.7170.384.2_beta7-r3'	'2.49751.8.630.9X_alpha'	'321540968732219878.6679720715.9.5.9109S''0U'	'2.19307.6.8.543528.943.1.5.124.5.74.99.5.05.679.67.72.47.4D_BETA'	'210637549862619203881.3.5135198750.9.9.60764.27.767124.67.01.7.0.2.2.4.02.8.40.387.754c'	'294160583738309782164.59185721.4.98.8.2.1.86254.518.2.124.905.843_beta0-r3'	'23869574013.0.009.835.2.5.9173.8.6.2638.4_beta'	'7.5964018322.5.2.12.2648.12890572.27.89616.7.7.94.3.6.1.0.36.7.53991.0391.366C_BETA'
Comment 15 Walter 2013-09-23 07:19:57 UTC
Same link now includes code to generate regex from the ABNF.