Bug 133978 - openjade does not process more than the first 8192 bytes of a file
Summary: openjade does not process more than the first 8192 bytes of a file
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High normal
Assignee: Text-Markup Team (OBSOLETE)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-05-21 13:40 UTC by Gregor Mückl
Modified: 2006-07-05 02:03 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Example SGML input and text output as processed with sgml2txt (manual.tar.gz,5.29 KB, application/x-tgz)
2006-07-04 02:09 UTC, Gregor Mückl

Description Gregor Mückl 2006-05-21 13:40:33 UTC
The openjade SGML processor is broken and has been for quite a while (at least 3 months by now). I have seen this happen on 3 independent installations of Gentoo, all x86, both stable and testing.

It stops reading the input file after *exactly* 8192 bytes (8k). Everything that comes after that mark in the input file is not processed. 

Versions of openjade included in other distributions work fine with the same input.
Comment 1 Leonardo Boshell (RETIRED) gentoo-dev 2006-05-22 23:24:59 UTC
Could you provide more information?

Any test case? Steps to reproduce?
What is affected by this?
How did you come up with the 8k figure?

If you are seeing a problem when processing SGML files, the cause could be from a number of sources (opensp, openjade, DTDs and more), but without even the output from a program it's impossible to know what you're talking about.
Comment 2 Gregor Mückl 2006-05-23 00:42:54 UTC
How did I get that 8k figure? By looking up the last character of the output in the source file. The output ends in mid-word *immediately* after the 8192nd byte of the input file. Check it with a hex editor of your choice. It does not depend on the file contents. This is no feat any stylesheet or DTD can accomplish. I have checked that very thoroughly.

Steps to reproduce: run sgml2txt on an arbitrary SGML DocBook file. The errors that this truncation causes in the openjade output are so severe that the TeX code produced by openjade when running sgml2pdf cannot be typeset by LaTeX.

I've looked at the SGMLTools source and verified that the bug simply *cannot* be in SGMLTools. The only processing stage all backends have in common is openjade. So this is the program with the erroneous behaviour.

My test files are processed correctly by SGMLTools/OpenJade from other distributions. So these can safely be assumed to be correct.
Comment 3 Leonardo Boshell (RETIRED) gentoo-dev 2006-05-23 15:13:36 UTC
Thanks for the feedback; however, it still isn't clear to me exactly what kind of problem you're experiencing.

Openjade is used regularly in processing manuals for packages like libgnomeprint, and it works as intended in those cases. I would like to try the steps you mention, but sgml2txt is intended for documents following the linuxdoc DTD, not exactly docbook files. Maybe you mean sgmltools?

Please attach the sample file you're using, and the complete output from the command you're running to process the file so we can have a better idea as to what exactly is going wrong.

Thanks.
Comment 4 Leonardo Boshell (RETIRED) gentoo-dev 2006-07-03 15:37:19 UTC
No further feedback from the reporter.
Comment 5 Gregor Mückl 2006-07-04 02:09:16 UTC
Created attachment 90833
Example SGML input and text output as processed with sgml2txt

Attached are two files: the XML input used and the text output produced by sgml2txt. sgml2pdf and other tools end the text at exactly the same position.

Please also try the following:

1. Open manual.xml with a text editor, look for the last word that was written to the txt file, note its offset in the file.
2. Remove (add) text from (to) the manual.xml file, rerun sgml2txt, repeat step 1. Note that the offset has not changed.
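The offset check in the two steps above can also be scripted. The following is only a minimal sketch, not part of sgmltools-lite or openjade: the function names and the output file name manual.txt are assumptions made for illustration. It takes the last whitespace-separated word of the text output and lists every byte offset in the input at which that word ends, so one can see whether 8192 is among them.

```python
from pathlib import Path


def last_word(output: bytes) -> bytes:
    """Return the last whitespace-separated word of the output, or b'' if none."""
    words = output.split()
    return words[-1] if words else b""


def end_offsets(source: bytes, fragment: bytes) -> list[int]:
    """Return the byte offsets in source just past each occurrence of fragment."""
    if not fragment:
        return []
    offsets = []
    start = source.find(fragment)
    while start != -1:
        offsets.append(start + len(fragment))
        # Continue the search one byte further to catch overlapping matches.
        start = source.find(fragment, start + 1)
    return offsets


def report(source_path: str, output_path: str) -> list[int]:
    """Read both files as raw bytes and list where the output's last word ends."""
    source = Path(source_path).read_bytes()
    output = Path(output_path).read_bytes()
    return end_offsets(source, last_word(output))
```

Calling report("manual.xml", "manual.txt") before and after editing the input should, if the truncation is as described, keep returning a list that contains 8192. The sketch assumes the last word appears byte-for-byte in the input, which may not hold if the stylesheet rewrites the text.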

As I already said, the very same file works perfectly on other distributions, esp. on Kubuntu 6.06. However, on any of a bunch of different Gentoo boxes I have access to, this does not work and has not been working for months, although all these Gentoo machines are updated regularly.
Comment 6 Gregor Mückl 2006-07-04 02:09:52 UTC
Reopened because it still exists.
Comment 7 Leonardo Boshell (RETIRED) gentoo-dev 2006-07-04 16:45:54 UTC
According to the attachment you provided, I see there's some kind of confusion as to which DocBook toolchain should be used.

The file manual.xml is an XML document, following the DocBook 4.3 DTD according to the DOCTYPE directive (which shouldn't be commented). In order to produce a txt file from that, I'd suggest you use 'xmlto'. You'll have to make sure you have the DocBook 4.3 XML DTD too:

   emerge ~docbook-xml-dtd-4.3 xmlto

Then run:

  xmlto txt manual.xml

As I mentioned in comment #3, sgml2txt is a tool intended to process linuxdoc documents only. The DTD is somewhat similar to DocBook, so it's easy to confuse the tools, and the generic name 'sgml2txt' doesn't help (furthermore, sgmltools-lite used to provide a wrapper with the same name, which was a bug, and makes all of this even more confusing).
Comment 8 Gregor Mückl 2006-07-05 02:03:51 UTC
FYI, the sgml2txt in question is from sgmltools-lite in all cases, and Gentoo is the only distribution where the sgmltools-lite suite will not process the file correctly. sgmltools-lite on other distributions, esp. Ubuntu, processes the file as given correctly.

I think you are thinking along the wrong lines here: I am trying to report something that is obviously a misbehaving program, whereas you tell me I am using the wrong tools. Please check the file as given and report what output sgmltools-lite produces. Thank you.