Bug 571750 - sys-apps/busybox: busybox-1.24.1-unzip.patch: 0x12 control character confuses file(1) in QA script
Status: RESOLVED CANTFIX
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: Normal QA
Assignee: Embedded Gentoo Team
URL: https://qa-reports.gentoo.org/output/...
Whiteboard:
Keywords:
Depends on:
Blocks: binaries-in-git
Reported: 2016-01-13 14:17 UTC by Ulrich Müller
Modified: 2016-01-14 17:11 UTC

See Also:
Package list:
Runtime testing required: ---


Description Ulrich Müller gentoo-dev 2016-01-13 14:17:05 UTC
As reported, see URL:
sys-apps/busybox/files/busybox-1.24.1-unzip.patch: application/octet-stream; charset=binary (size=3903)
sys-apps/busybox/files/busybox-1.24.1-unzip-regression.patch: application/octet-stream; charset=binary (size=4383)

In particular, line 89 of busybox-1.24.1-unzip.patch contains binary garbage.

Policy reference:
https://devmanual.gentoo.org/general-concepts/tree/index.html#what-belongs-in-the-tree%3F
Comment 1 SpanKY gentoo-dev 2016-01-13 15:38:33 UTC
read the files and you'll see they're obviously correct
Comment 2 Ulrich Müller gentoo-dev 2016-01-13 21:01:32 UTC
This is not about the files being "correct". The point is that file(1) reports them as non-text files:

$ file -i busybox-1.24.1-unzip*.patch
busybox-1.24.1-unzip.patch:            application/octet-stream; charset=binary
busybox-1.24.1-unzip-regression.patch: application/octet-stream; charset=binary

This is because they contain lines like:
+  inflating: ]3j½r«IK-%Ix

It is pointless to argue whether this is a false positive of the QA script. However, it clutters the script's output, and the fix appears to be trivial: quote the binary chars with $'' in the offending line:

+  inflating: "$']3j\xc2\xbdr\xc2\xabI\x1b\x12K-%Ix'"
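To illustrate the quoting suggested above, here is a minimal Python sketch (not part of the patch or of the QA script). It decodes the escaped string and checks that the escaped form is plain printable ASCII while the decoded bytes still contain ESC and DC2. It assumes that only `\xHH` escapes occur, which bash's $'' and Python's `unicode_escape` codec interpret the same way; all variable names are illustrative.

```python
# Body of the proposed $'...' string, kept as literal backslash escapes.
quoted = r"]3j\xc2\xbdr\xc2\xabI\x1b\x12K-%Ix"

# Assumption: only \xHH escapes occur, so unicode_escape followed by latin-1
# maps each escape back to the single byte it names.
raw = quoted.encode("ascii").decode("unicode_escape").encode("latin-1")

# The escaped form contains only printable ASCII characters.
quoted_is_printable_ascii = all(0x20 <= ord(c) < 0x7F for c in quoted)

# The decoded bytes are valid UTF-8 and still carry ESC (0x1b) and DC2 (0x12).
decoded_text = raw.decode("utf-8")
has_esc_dc2 = b"\x1b\x12" in raw

print(quoted_is_printable_ascii, has_esc_dc2, repr(decoded_text))
```

The escaped line therefore carries the same byte sequence for the shell to expand, but the patch file itself would no longer contain raw control characters.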
Comment 3 SpanKY gentoo-dev 2016-01-14 00:05:39 UTC
(In reply to Ulrich Müller from comment #2)

you are the one that attempted to reference a policy that does not apply -- these files are not binary images as in png/etc...  these are perfectly valid patches.  `git format-patch` produced them and `patch` has no problem applying them.  they are perfectly valid UTF-8 encoded.  this is not "binary garbage".

what you're now attempting to do, which is completely new, is reject any file in the tree that `file -i` detects as binary.  it flagged these because the patches contain ^[^R, i.e. 0x1b 0x12 (ESC DC2).

i'm not going to make bogus changes using non-portable shell logic in upstream code bases to appease a broken check.  this runs against common sense.
Comment 4 Ulrich Müller gentoo-dev 2016-01-14 03:51:52 UTC
The UTF-8 encoding isn't the problem, but the strange control characters make file(1) believe that this is binary. ESC would even be fine, but it barfs on DC2.

Obviously file (or libmagic) must use some heuristic or another, and it chooses to reject any control chars other than 0x07-0x0d (BEL, BS, HT, LF, VT, FF, CR) and 0x1b (ESC). And I don't think that working around this in the QA script would be a good idea, because it could miss real binaries then.
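The heuristic described above can be sketched in a few lines of Python. This is only an approximation built from the description in this comment, not libmagic's actual code: the allowed set and the function names are assumptions, and only control bytes below 0x20 are considered.

```python
# Control bytes the heuristic accepts as text, per the description above:
# 0x07-0x0d (BEL, BS, HT, LF, VT, FF, CR) and 0x1b (ESC).
TEXT_CONTROL_BYTES = frozenset(range(0x07, 0x0E)) | {0x1B}


def offending_control_bytes(data: bytes) -> list:
    """Return (offset, byte) pairs for control bytes outside the allowed set."""
    return [
        (offset, byte)
        for offset, byte in enumerate(data)
        if byte < 0x20 and byte not in TEXT_CONTROL_BYTES
    ]


def looks_like_text(data: bytes) -> bool:
    """True if no disallowed control byte occurs in the data."""
    return not offending_control_bytes(data)


# The offending patch line, with ESC (0x1b) and DC2 (0x12) written explicitly.
sample = "+  inflating: ]3j\u00bdr\u00abI\x1b\x12K-%Ix\n".encode("utf-8")

print(looks_like_text(sample))
print(offending_control_bytes(sample))
```

Under these assumptions only the DC2 byte is reported; ESC and the UTF-8 multibyte sequences pass, which matches the observation that ESC alone would have been fine.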
Comment 5 SpanKY gentoo-dev 2016-01-14 12:44:43 UTC
i'm not changing the files to satisfy false positives.  this is asinine.
Comment 6 Ulrich Müller gentoo-dev 2016-01-14 13:42:21 UTC
Yeah, I guess for any such heuristic approach we must live with the fact that it cannot be perfect and will therefore have false positives and false negatives.
Comment 7 SpanKY gentoo-dev 2016-01-14 16:54:03 UTC
does the system not have a rolling whitelist or something so the output isn't cluttered w/noise ?
Comment 8 Ulrich Müller gentoo-dev 2016-01-14 17:11:31 UTC
(In reply to SpanKY from comment #7)
> does the system not have a rolling whitelist or something so the output
> isn't cluttered w/noise ?

That wasn't necessary so far. Current list is only 8 files: 3 files which are binaries without any doubt, 3 empty files, and the 2 files from this bug.