Description
charles17
2017-07-24 08:23:52 UTC
The reason is that those templates from https://gitweb.gentoo.org/proj/devmanual.git/tree/devbook.xsl#n27 are not suitable if section titles or subsection titles are not unique.
Created attachment 574416 [details, diff]
combined-parent-anchors.patch
Here's a proof-of-concept patch to devbook.xsl that makes the IDs/anchors less likely to collide, by prepending the parent section's name to each subsection name. Instead of five subsections with id="helpers", we now have
* id="eapi=2-helpers"
* id="eapi=4-helpers"
* id="eapi=5-helpers"
* id="eapi=6-helpers"
* id="eapi=7-helpers"
I haven't tested it on any other page. It probably breaks in some cases, and it definitely breaks existing links to some subsections. But it shows that this can probably be fixed.
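For illustration only, the combining rule described above can be sketched in Python. This is not the XSLT from the attached patch: the function names and the section titles ("EAPI=2" and so on) are assumptions, chosen so that the output matches the ids listed above.

```python
import re


def slugify(title: str) -> str:
    """Trim a title, lowercase it, and turn whitespace runs into single dashes."""
    return re.sub(r"\s+", "-", title.strip().lower())


def combined_anchor(parent_title: str, title: str) -> str:
    """Prefix a subsection's id with its parent section's id (hypothetical helper)."""
    return f"{slugify(parent_title)}-{slugify(title)}"


# Assumed section titles; every section has a subsection titled "Helpers".
sections = ["EAPI=2", "EAPI=4", "EAPI=5", "EAPI=6", "EAPI=7"]

plain = [slugify("Helpers") for _ in sections]
combined = [combined_anchor(section, "Helpers") for section in sections]

print(sorted(set(plain)))  # one shared id for all five subsections
print(combined)            # five distinct ids
```

The trade-off is the one already noted: any existing link to #helpers stops resolving once the ids change.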
Created attachment 574552 [details, diff]
0001-devbook.xsl-disambiguate-auto-generated-header-ident.patch
Procrastinating real hard today. Here's a much better patch that uses the full path into the hierarchy as the identifier. It also adds a tidy-html5 check to the Makefile to ensure that this doesn't get screwed up again. Finally, I fix some other random issues that tidy-html5 pointed out.
Created attachment 574554 [details, diff]
0002-devbook.xsl-strip-leading-trailing-whitespace-from-h.patch
Created attachment 574556 [details, diff]
0003-Makefile-add-new-app-text-tidy-html5-sanity-check.patch
Created attachment 574558 [details, diff]
0004-keywording-change-ALLARCHES-subsection-to-a-section.patch
Created attachment 574560 [details, diff]
0005-appendices-contributing-devbook-guide-fix-missing-ur.patch
This has still only been lightly tested, but the fact that tidy-html5 now runs clean is reassuring. And three of the five commits are independently correct:
* Trimming leading/trailing whitespace from anchors
* Fixing the ALLARCHES header
* Fixing the GuideXML <uri> elements
The addition to the Makefile is generally harmless, but maybe there's a way to improve the ugly way I'm running tidy-html5. The only thing that needs real scrutiny is the overall effect of the new identifier-generation rules.

(In reply to Michael Orlitzky from comment #3)
> Created attachment 574552 [details, diff]
> 0001-devbook.xsl-disambiguate-auto-generated-header-ident.patch
>
> Procrastinating real hard today. Here's a much better patch that uses the
> full path into the hierarchy as the identifier. It also adds a tidy-html5
> check to the Makefile to ensure that this doesn't get screwed up again.
> Finally, I fix some other random issues that tidy-html5 pointed out.

Why would we want to do this? The path into the hierarchy is already present in the URL path, so it doesn't have to be repeated in the fragment identifier. Unless you like really long and redundant URLs like this:
https://devmanual.gentoo.org/archs/amd64/index.html#arch-specific-notes----amd64/em64t--multilib-on-amd64--the-multilib-strict-feature

In other words, in order to have unique ids, it is enough to start at the section level.

(In reply to Michael Orlitzky from comment #7)
> Created attachment 574560 [details, diff]
> 0005-appendices-contributing-devbook-guide-fix-missing-ur.patch

This should really be fixed in devbook.xsl. IMHO the example from the devbook-guide "<uri>https://forums.gentoo.org</uri> is my favorite web site." is perfectly valid and used to work in GuideXML.

(In reply to Ulrich Müller from comment #9)
> Why would we want to do this? The path into the hierarchy is already present
> in the URL path, so it doesn't have to be repeated in the fragment
> identifier. Unless you like really long and redundant URLs like this:
> https://devmanual.gentoo.org/archs/amd64/index.html#arch-specific-notes----amd64/em64t--multilib-on-amd64--the-multilib-strict-feature
>
> In other words, in order to have unique ids, it is enough to start at the
> section level.

Agreed that the URLs are horrendous. I can try that once your XML pull request is merged, and once the URI thing is sorted out. I was mainly convincing myself that this issue could actually be solved, here. Now it's just a matter of solving it in the best way possible.

> (In reply to Michael Orlitzky from comment #7)
> > Created attachment 574560 [details, diff]
> > 0005-appendices-contributing-devbook-guide-fix-missing-ur.patch
>
> This should really be fixed in devbook.xsl. IMHO the example from the
> devbook-guide "<uri>https://forums.gentoo.org</uri> is my favorite web
> site." is perfectly valid and used to work in GuideXML.

Ok, done on bug 702180, but only lightly tested.

(In reply to Michael Orlitzky from comment #10)
ping

(In reply to Ulrich Müller from comment #11)
> (In reply to Michael Orlitzky from comment #10)
> > ping

It's still on my radar; after the XML validation I was waiting on the javascript search stuff. I won't delete the bugzilla email from my inbox until I've updated the patches.

Created attachment 600638 [details, diff]
0001-devbook.xsl-disambiguate-auto-generated-header-ident.patch
Created attachment 600640 [details, diff]
0002-devbook.xsl-strip-leading-trailing-whitespace-from-h.patch
Created attachment 600642 [details, diff]
0003-Makefile-add-new-app-text-tidy-html5-sanity-check.patch
(In reply to Ulrich Müller from comment #9)
> Why would we want to do this? The path into the hierarchy is already present
> in the URL path, so it doesn't have to be repeated in the fragment
> identifier. Unless you like really long and redundant URLs like this:
> https://devmanual.gentoo.org/archs/amd64/index.html#arch-specific-notes----amd64/em64t--multilib-on-amd64--the-multilib-strict-feature
>
> In other words, in order to have unique ids, it is enough to start at the
> section level.

The new DTD says that multiple chapters are allowed on a single page, which I think makes it necessary to include the chapter information in the anchor as well. For example,

  <guide>
    <chapter>
      <title>Chapter 1</title>
      <section>
        <title>Introduction</title>
      </section>
    </chapter>
    <chapter>
      <title>Chapter 2</title>
      <section>
        <title>Introduction</title>
      </section>
    </chapter>
  </guide>

would otherwise produce identical anchors for the two sections.

(In reply to Michael Orlitzky from comment #16)
> The new DTD says that multiple chapters are allowed on a single page, which
> I think makes it necessary to include the chapter information in the anchor
> as well.

It is enough to start at the section level, because all chapters are in separate files. If the DTD says anything different then it should be fixed.

(In reply to Ulrich Müller from comment #17)
> If the DTD says anything different then it should be fixed.

Done:
https://gitweb.gentoo.org/proj/devmanual.git/commit/?id=aa26b473992dd1df1417153b0b03cae6cfe28f8a

Created attachment 600654 [details, diff]
0001-devbook.xsl-disambiguate-auto-generated-header-ident.patch
Created attachment 600658 [details, diff]
0002-devbook.xsl-strip-leading-trailing-whitespace-from-h.patch
Created attachment 600662 [details, diff]
0003-Makefile-add-new-app-text-tidy-html5-sanity-check.patch
Ok, starting from the section now.
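As a rough sketch of what "starting from the section" means, the Python below builds each anchor from the title path beginning at the section level, ignoring the chapter. It is not the XSLT from the attached patch. The "--" separator, the helper names, the sample chapter, and the assumption that subsections nest directly inside sections are all illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Element names treated as heading levels below the chapter (assumed).
LEVELS = ("section", "subsection", "subsubsection")


def slugify(title):
    """Trim a title, lowercase it, and turn whitespace runs into single dashes."""
    return re.sub(r"\s+", "-", title.strip().lower())


def section_anchors(chapter):
    """Yield one anchor per (sub)section, built from the title path below the chapter."""
    def walk(node, prefix):
        for child in node:
            if child.tag in LEVELS:
                path = prefix + [slugify(child.findtext("title", default=""))]
                yield "--".join(path)
                yield from walk(child, path)

    yield from walk(chapter, [])


# Hypothetical chapter with two identically titled subsections.
CHAPTER = """
<chapter>
  <title>EAPI Usage</title>
  <section>
    <title>EAPI=2</title>
    <subsection><title>Helpers</title></subsection>
  </section>
  <section>
    <title>EAPI=4</title>
    <subsection><title>Helpers</title></subsection>
  </section>
</chapter>
"""

anchors = list(section_anchors(ET.fromstring(CHAPTER)))
print(anchors)
```

Because the chapter title never enters the path, this only yields unique ids when each chapter is rendered to its own page, which is the condition agreed on above.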
(In reply to Michael Orlitzky from comment #19)
> Created attachment 600654 [details, diff]
> 0001-devbook.xsl-disambiguate-auto-generated-header-ident.patch

Note to self: Convert tabs to spaces before merging. (I've unified indentation style to 2 spaces in https://gitweb.gentoo.org/proj/devmanual.git/commit/?id=e9cfcb2d945a8379624467ef6f85fb7968db47fe, so we really shouldn't reintroduce tabs.)

Any idea how to automatically update all internal cross references (and rewrapping to 80 chars if necessary)?

The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/proj/devmanual.git/commit/?id=2330779776b8ed53ec91629db2d2d56b65e64eb7

commit 2330779776b8ed53ec91629db2d2d56b65e64eb7
Author:     Michael Orlitzky <mjo@gentoo.org>
AuthorDate: 2019-04-28 16:24:51 +0000
Commit:     Ulrich Müller <ulm@gentoo.org>
CommitDate: 2020-01-02 12:43:21 +0000

    Makefile: add new app-text/tidy-html5 sanity check.

    This new PHONY "make tidy" target runs the tidy-html5 program, using a
    new .tidyrc file, to ensure that the HTML we have generated is free
    from certain problems. In particular, it should complain if bug 626032
    ever resurfaces and there are duplicate identifiers in some document.

    Closes: https://bugs.gentoo.org/626032
    Signed-off-by: Michael Orlitzky <mjo@gentoo.org>
    [Command line options instead of .tidyrc file. Don't fail on first error.]
    Signed-off-by: Ulrich Müller <ulm@gentoo.org>

 Makefile | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Additionally, it has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/devmanual.git/commit/?id=b5bfc69fab686a49e6fbf54aceae2bb12885dc5a

commit b5bfc69fab686a49e6fbf54aceae2bb12885dc5a
Author:     Michael Orlitzky <mjo@gentoo.org>
AuthorDate: 2019-04-28 16:51:57 +0000
Commit:     Ulrich Müller <ulm@gentoo.org>
CommitDate: 2020-01-02 12:41:01 +0000

    devbook.xsl: strip leading/trailing whitespace from header identifiers.

    We were already replacing spaces *within* these identifiers, so we
    don't have to worry about that.

    Bug: https://bugs.gentoo.org/626032
    Signed-off-by: Michael Orlitzky <mjo@gentoo.org>
    Signed-off-by: Ulrich Müller <ulm@gentoo.org>

 devbook.xsl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Patches 2/3 and 3/3 merged.

Reopening, because patch 3/3 doesn't resolve this (and I missed the "Closes" tag).

(In reply to Ulrich Müller from comment #23)
> Any idea how to automatically update all internal cross references (and
> rewrapping to 80 chars if necessary)?

I have an idea, but you're not going to like it. The W3C's link-checker (https://github.com/w3c/link-checker) can tell if an anchor link points to a dead-end. You'd have to install it locally (it's not packaged for Gentoo), hack it to run on file:// URLs, and kill the one-second minimum delay... but then you can grep out all of the broken anchors pretty quickly.

There are other broken links, though. It probably makes more sense to go through and fix them all afterwards than it does to selectively ignore a bunch of stuff that is actually broken. The online checker (https://validator.w3.org/checklink) could be used for that.

I've started looking at the internal links. IIUC, for a link to the "Suitable Download Hosts" subsection in the mirrors chapter, one would have to use the following as a link attribute now:
"::general-concepts/mirrors#Automatic Mirroring--Suitable Download Hosts"

I can't say that I find this syntax intuitive.

Maybe we should just make section titles unique within each chapter. It affects only ebuild-writing/functions/src_compile (where arguably the examples could be in a list instead of subsections, or EAPI 0 could just be dropped) and ebuild-writing/eapi (where we could simply say "EAPI 2 Helpers" etc.).
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/proj/devmanual.git/commit/?id=1343712b8137aa73e45df0a249f8bda303ed328a

commit 1343712b8137aa73e45df0a249f8bda303ed328a
Author:     Ulrich Müller <ulm@gentoo.org>
AuthorDate: 2020-01-04 18:21:32 +0000
Commit:     Ulrich Müller <ulm@gentoo.org>
CommitDate: 2020-01-04 18:21:32 +0000

    Make (sub-)section titles unique within each chapter.

    Section, subsection, etc. titles are used to construct ID attributes,
    which must be unique within each chapter. One approach to disambiguate
    these identifiers would be to use the complete section hierarchy for
    constructing them (see bug 626032). However, that would break external
    references (and for example, links from bugzilla or from mailing list
    archives cannot be updated).

    Use a less intrusive approach instead and make the titles of the few
    ambiguous subsections unique, or convert them to a list.

    Closes: https://bugs.gentoo.org/626032
    Signed-off-by: Ulrich Müller <ulm@gentoo.org>

 ebuild-writing/eapi/text.xml                  | 42 +++++++++++++--------------
 ebuild-writing/functions/src_compile/text.xml | 36 ++++++++++++-----------
 2 files changed, 40 insertions(+), 38 deletions(-)
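The uniqueness requirement stated in this commit message can also be checked independently of tidy-html5. The sketch below is only an illustration of that check using Python's standard html.parser. It is not what the Makefile target runs, and the HTML snippets and id values are made up.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the value of every id attribute seen in a document."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def duplicate_ids(html):
    """Return the sorted list of id values that occur more than once."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return sorted(i for i, count in Counter(collector.ids).items() if count > 1)


# Hypothetical rendered chapter before and after making the titles unique.
BEFORE = (
    '<h2 id="eapi-2">EAPI 2</h2><h3 id="helpers">Helpers</h3>'
    '<h2 id="eapi-4">EAPI 4</h2><h3 id="helpers">Helpers</h3>'
)
AFTER = (
    '<h2 id="eapi-2">EAPI 2</h2><h3 id="eapi-2-helpers">EAPI 2 Helpers</h3>'
    '<h2 id="eapi-4">EAPI 4</h2><h3 id="eapi-4-helpers">EAPI 4 Helpers</h3>'
)

print(duplicate_ids(BEFORE))
print(duplicate_ids(AFTER))
```

Renaming the titles keeps the id scheme itself unchanged, so external links to unaffected sections continue to resolve.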