Summary: | app-text/pdfshuffler : will not allow me to export files which it has opened into a .pdf. once I click 'save' the box just hangs indefinitely. | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | greenbean127 |
Component: | Current packages | Assignee: | No maintainer - Look at https://wiki.gentoo.org/wiki/Project:Proxy_Maintainers if you want to take care of it <maintainer-needed> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | ||
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
greenbean127
2011-08-24 02:37:02 UTC
Any debug messages in the terminal?

(In reply to comment #1)
> Any debug messages in the terminal ?

Yes, there are. I should have provided an example in the bug post, sorry. Here you go:

    Traceback (most recent call last):
      File "/usr/bin/pdfshuffler", line 417, in choose_export_pdf_name
        self.export_to_file(file_out)
      File "/usr/bin/pdfshuffler", line 438, in export_to_file
        pdfdoc_inp = PdfFileReader(file(pdfdoc.copyname, 'rb'))
      File "/usr/lib64/python2.7/site-packages/pyPdf/pdf.py", line 374, in __init__
        self.read(stream)
      File "/usr/lib64/python2.7/site-packages/pyPdf/pdf.py", line 751, in read
        offset, generation = line[:16].split(" ")
    ValueError: too many values to unpack

I was able to find this bug reproduced on both the Red Hat and Debian bug reporting sites. On the Debian site, someone pointed out that the bug only seems to happen with pdfs created by simple-scan. I was able to reproduce this on my system. If I open multiple pdfs that were not created by simple-scan, then I am able to export the pages and create a new .pdf document by clicking 'export' and then 'save'. It should be noted that the pdf files I used with success in pdfshuffler were not created on my system at all. I was able to reproduce the problem again in pdfshuffler by trying to merge and export pdf files that I did create myself using simple-scan. Here is the output from the terminal:

    Traceback (most recent call last):
      File "/usr/bin/pdfshuffler", line 417, in choose_export_pdf_name
        self.export_to_file(file_out)
      File "/usr/bin/pdfshuffler", line 438, in export_to_file
        pdfdoc_inp = PdfFileReader(file(pdfdoc.copyname, 'rb'))
      File "/usr/lib64/python2.7/site-packages/pyPdf/pdf.py", line 374, in __init__
        self.read(stream)
      File "/usr/lib64/python2.7/site-packages/pyPdf/pdf.py", line 751, in read
        offset, generation = line[:16].split(" ")
    ValueError: too many values to unpack

I hope this helps. Thanks again :)

The problem certainly has to do with .pdf files created by simple-scan.
I just scanned 2 documents using xsane, imported them into pdfshuffler and manipulated them. I was able to save the two documents as a single document as expected. There appears to be something in the way simple-scan creates pdfs that pdfshuffler does not like. I am not sure how to proceed from here, as this is my first bug filing. Thanks.

Well, you may argue that it's pyPDF that's broken, but actually Simple Scan produces corrupted pdfs.
It took some googling, but the result was:

PDF Reference, section 3.4.3
...
Following this line are the cross-reference entries themselves, one per line. Each entry is exactly 20 bytes long, including the end-of-line marker. The format of an in-use entry is as follows:

    nnnnnnnnnn ggggg n eol

where
- nnnnnnnnnn is a 10-digit byte offset
- ggggg is a 5-digit generation number
- n is a literal keyword identifying this as an in-use entry
- eol is a 2-character end-of-line sequence

...
The cross-reference entry for a free object has essentially the same format, except that the keyword is f instead of n and the interpretation of the first item is different:

    nnnnnnnnnn ggggg f eol

where
- nnnnnnnnnn is the 10-digit object number of the next free object
- ggggg is a 5-digit generation number
- f is a literal keyword identifying this as a free entry
- eol is a 2-character end-of-line sequence

...
If the file’s end-of-line marker is a single character (either a carriage return or a line feed), it is preceded by a single space; if the marker is 2 characters (both a carriage return and a line feed), it is not preceded by a space.

Well, many pdf files seem to forget the space if they use just '\n', but what Simple Scan puts in is:

    "%010zu 0000 n\n".printf (offset)

so not only does it miss the space (pyPDF has a workaround for this - actually, given the above entry, it's not really correct), but the generation number is also one digit short - this is what causes the failure.
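To make the failure concrete, here is a minimal Python sketch that reuses only the unpacking expression shown in the traceback (`line[:16].split(" ")`). The function name `parse_xref_entry` and the sample offset `15` are made up for illustration; this is not pyPDF's actual code, just a reproduction of that one line under those assumptions.

```python
def parse_xref_entry(line):
    """Mimic the unpacking from the traceback (pyPdf/pdf.py, line 751).

    Hypothetical helper for illustration only, not part of pyPdf.
    """
    # The first 16 characters of a well-formed entry are
    # "nnnnnnnnnn ggggg": exactly two space-separated fields.
    offset, generation = line[:16].split(" ")
    return int(offset), int(generation)


# Entry laid out as the PDF Reference describes it: 10-digit offset,
# 5-digit generation number, keyword, 2-character end-of-line (20 bytes).
valid_entry = "0000000015 00000 n \n"

# Entry in the shape produced by the format string "%010zu 0000 n\n":
# the generation number has only 4 digits.
broken_entry = "0000000015 0000 n\n"

print(parse_xref_entry(valid_entry))

try:
    parse_xref_entry(broken_entry)
except ValueError as exc:
    # The 16-character slice is "0000000015 0000 " and its trailing space
    # yields a third, empty field, so unpacking into two names fails.
    print("ValueError:", exc)
```

With the 4-digit generation number, the 16-character slice ends in a space, so `split(" ")` returns three fields instead of two. That matches the `ValueError: too many values to unpack` in the traceback; Python 3 words the same error with an additional "(expected 2)".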
(In reply to comment #5)
> Well, you may argue that it's pyPDF that's broken, but actually Simple Scan
> produces corrupted pdfs.
> It took some googling, but the result was:
> PDF Reference, section 3.4.3
> ...
> Following this line are the cross-reference entries themselves, one per line.
> Each entry is exactly 20 bytes long, including the end-of-line marker.
> The format of an in-use entry is as follows:
> nnnnnnnnnn ggggg n eol
>
> where
> nnnnnnnnnn is a 10-digit byte offset
> ggggg is a 5-digit generation number
> n is a literal keyword identifying this as an in-use entry
> eol is a 2-character end-of-line sequence
>
> ...
> The cross-reference entry for a free object has essentially the same format,
> except that the keyword is f instead of n and the interpretation of the first
> item is different:
> nnnnnnnnnn ggggg f eol
>
> where
> nnnnnnnnnn is the 10-digit object number of the next free object
> ggggg is a 5-digit generation number
> f is a literal keyword identifying this as a free entry
> eol is a 2-character end-of-line sequence
> ...
> If the file’s end-of-line marker is a single character
> (either a carriage return or a line feed), it is preceded by a single
> space; if the
> marker is 2 characters (both a carriage return and a line feed), it is not
> preceded
> by a space.
>
> Well, many pdf files seem to forget the space, if they use just '\n',
> but what Simple Scan puts in is:
> "%010zu 0000 n\n".printf (offset)
> so, not only it misses the space (pyPDF has a workaround for this - actually,
> given the above entry, it's not really correct),
> but the generation number is one digit short - this is what causes the failure.

That's interesting. Thanks for your help. This is my first time participating in the bug process. Should I file a bug with pyPDF then, or is this bug sufficient? I am sure we would want the issue resolved, right? Thanks again for all your help. It's been a good learning experience.

Well, it's a bit complicated.
IMHO:
- pdfshuffler might use a "try...except..." block around the 'pdfdoc_inp = PdfFileReader(file(pdfdoc.copyname, 'rb'))' line to drop broken files out of the queue
- pyPDF should do nothing - it's not supposed to handle arbitrarily corrupted files
- simple-scan needs to be fixed upstream - I suspect https://bugs.launchpad.net/simple-scan/+bug/662144 is an old report regarding this very problem

Due to the number of packages involved, I'm just CCing instead of assigning.

I've managed to reach the author of simple-scan. Its new release will no longer produce broken pdfs and will have an option to fix the broken files generated by older versions. After all, the fix for both the app and the files is trivial.

I guess this is solved now with newer simple-scan versions...
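For reference, the "try...except..." idea suggested for pdfshuffler earlier in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: `load_readable_documents`, `open_reader` and `fake_reader` are hypothetical names, not pdfshuffler or pyPDF code, and the sketch catches only `ValueError` because that is the exception shown in the traceback.

```python
def load_readable_documents(paths, open_reader):
    """Open each path with open_reader and skip files that fail to parse.

    open_reader is a hypothetical stand-in for something like
    lambda p: PdfFileReader(open(p, 'rb')).
    Returns (readers, skipped), where skipped holds (path, reason) pairs.
    """
    readers, skipped = [], []
    for path in paths:
        try:
            readers.append(open_reader(path))
        except ValueError as exc:
            # Drop the broken file from the queue instead of hanging the
            # dialog, and remember why so the user can be told.
            skipped.append((path, str(exc)))
    return readers, skipped


def fake_reader(path):
    # Illustration-only reader: pretends that files named after simple-scan
    # carry the malformed cross-reference table.
    if "simple-scan" in path:
        raise ValueError("too many values to unpack")
    return "reader for " + path


readers, skipped = load_readable_documents(
    ["xsane.pdf", "simple-scan.pdf"], fake_reader
)
print(readers)
print(skipped)
```

Collecting the skipped files rather than silently ignoring them would let the application report which inputs were dropped. A real implementation might need to catch other exception types as well.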