When using the -b flag to shash to create checksum digests for binary files rather than text files (which is significant when verifying the checksums on DOSish systems), shash creates the digest as expected. It fails, however, to verify those checksums, complaining that it could not find the source files, which is wrong because the files exist.

Reproducible: Always

Steps to Reproduce:
1. echo "this is a binary file" > bb
2. shash -b bb > bb.DIGEST
3. shash -cV bb.DIGEST

Actual Results:
*bb: No such file.
0 of 0 file(s) failed MD5 check.

Expected Results:
bb: OK
0 of 1 file(s) failed MD5 check.

I tracked the bug with gdb to somewhere around line 650 in file shash.c, where parsing of the binary file indicator ("*") is attempted, but I was not successful in fixing it.
I was using the following version of the program: shash v.0.9.9 (i686-pc-linux-gnu) (as emerged on 2007-06-18 with an up-to-date stable system)
The program incorrectly parses the "*" from the digest file as part of the file name, rather than as the "binary" indicator that it is. It then searches for "*filename", which it cannot find; thus triggering the error.
The problem is that shash uses sscanf to match the lines. Matching against "%s %s\n" first makes the second %s include the * in the file name, because the " " in the format does not match exactly two spaces, but any amount of whitespace, including none. Swapping the order, so that the pattern with * is tried before the one without, makes shash parse the lines for binary files correctly, but we end up with an empty file name for regular files. That issue looks like a bug in sscanf: the man page mentions that * is special, but it shouldn't be special at the position it occupies in the format string.
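To illustrate the sscanf behaviour described above, here is a minimal sketch that tries the binary form first and checks the return value of each call. It is not taken from shash.c or from any patch: the function name parse_digest_line and the buffer sizes (65 and 256) are assumptions made for the example. A '*' that does not directly follow '%' is an ordinary literal; when it fails to match, sscanf stops, returns the number of assignments made so far, and leaves the remaining arguments untouched, which would explain an empty file name if the return value is not checked.

```c
#include <stdio.h>

/* Hypothetical helper, not from shash.c: parse one digest line of the form
 * "<hash>  <name>" (text) or "<hash> *<name>" (binary).
 * Returns 1 for a binary entry, 0 for a text entry, -1 if nothing parses.
 * The buffer sizes are assumptions chosen for this sketch. */
static int parse_digest_line(const char *line, char hash[65], char name[256])
{
    /* Binary form first. The '*' is a literal here because it does not
     * follow '%'. If it does not match, sscanf returns 1 and never writes
     * to name, so only a return value of 2 counts as success. */
    if (sscanf(line, "%64s *%255s", hash, name) == 2)
        return 1;

    /* Text form. The blank in the format matches any amount of whitespace,
     * including none, so it cannot tell one space from two. */
    if (sscanf(line, "%64s %255s", hash, name) == 2)
        return 0;

    return -1;
}
```

With this ordering, "abc123 *bb" is reported as binary with the name "bb", and "abc123  bb" falls through to the text form with the same name. A text-mode file whose name really starts with '*' would be misread as binary, so this is only a sketch of the idea.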
Created attachment 122469
shash-0.2.6-binary-files.patch

Patch to explicitly check the number of matched directives.
Thanks Sven - the patch works like a charm!
Unfortunately, it brings up another issue. When I studied the man page for sscanf, I found this in the section for %s:

"Matches a sequence of non-white-space characters [...] The input string stops at white space or at the maximum field width, whichever occurs first."

In other words: shash will never be able to parse DIGEST files containing whitespace in file names... and indeed,

$ touch "text with ws" >"ws tt"
$ touch "binary with ws" >"ws bb"
$ shash -b "ws bb" >"ws bb.DIGEST"
$ shash "ws tt" >"ws tt.DIGEST"
$ shash -cV "ws tt"
0 of 0 file(s) failed MD5 check.
$ shash -cV "ws bb"
0 of 0 file(s) failed MD5 check.

shows that it won't.

Shall I file another bug report for that?

But as I understand it, this is an essential bug in mis-using sscanf, so it might be more appropriate to send a report to upstream, rather than trying to fix the bug as a patch.

Except if there is an easy way to make %s consume whitespace as well.
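On the question of making %s consume whitespace: %s itself cannot, but the standard scanset conversion %[^\n] reads everything up to the end of the line, blanks included. The following is a rough sketch under that assumption, not the attached patch and not upstream code. The name parse_digest_line_ws and the buffer sizes are made up for the example.

```c
#include <stdio.h>

/* Hypothetical helper, not from shash.c: like a plain "%s %s" parse, but the
 * file name may contain blanks. Returns 1 for a binary entry ("*" prefix),
 * 0 for a text entry, -1 if the line does not parse.
 * The buffer sizes are assumptions chosen for this sketch. */
static int parse_digest_line_ws(const char *line, char hash[65], char name[256])
{
    /* %255[^\n] accepts every character except newline, so the name runs to
     * the end of the line instead of stopping at the first blank. */
    if (sscanf(line, "%64s *%255[^\n]", hash, name) == 2)
        return 1;
    if (sscanf(line, "%64s %255[^\n]", hash, name) == 2)
        return 0;
    return -1;
}
```

For a line such as "abc123 *ws bb" this yields the name "ws bb" rather than "ws". Because the blank in the format still skips all whitespace between the hash and the name, file names that begin with whitespace would lose their leading blanks. Handling those would need a stricter parser than sscanf.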
I was a bit lazy with the last comment. The output actually is:

$ shash -cV "ws bb.DIGEST"
ws: No such file.
0 of 0 file(s) failed MD5 check.
$ shash -cV "ws tt.DIGEST"
ws: No such file.
0 of 0 file(s) failed MD5 check.
(In reply to comment #6)
> Shall I file another bug report for that?
>
> But as I understand it, this is an essential bug in mis-using sscanf, so it
> might be more appropriate to send a report to upstream, rather than trying to
> fix the bug as a patch.
>
> Except if there is an easy way to make %s consume whitespace as well.

Upstream seems to be quite dead; 0.2.6 was released about six years ago. And file names with whitespace are uncommon, so I think we can ignore that bug for now.