Bug 500374 - find-binary-files.txt at http://qa-reports.gentoo.org/ produces invalid entries, such as for x11-terms/hanterm/files/Hanterm.gentoo
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal QA
Assignee: Gentoo Quality Assurance Team
Blocks: binaries-in-git
Reported: 2014-02-05 13:36 UTC by Michael Palimaka (kensington)
Modified: 2014-02-09 13:50 UTC

Description Michael Palimaka (kensington) gentoo-dev 2014-02-05 13:36:05 UTC
binary file: /usr/portage/x11-terms/hanterm/files/Hanterm.gentoo

Binary files like pngs should be distributed via mirrors. CVS (and in the
future, git) don't handle binary files well.
Comment 1 Samuli Suominen (RETIRED) gentoo-dev 2014-02-05 13:59:52 UTC
not a binary...

http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/x11-terms/hanterm/files/Hanterm.gentoo?view=markup

however it's reported falsely here...

http://qa-reports.gentoo.org/output/find-binary-files.txt

seems like the script that's generating find-binary-files.txt should be fixed?
Comment 2 Michael Palimaka (kensington) gentoo-dev 2014-02-05 14:01:56 UTC
ah, that is text but I don't have the correct character set available?
Comment 3 Samuli Suominen (RETIRED) gentoo-dev 2014-02-05 14:37:11 UTC
I don't know enough of CJK to verify, but I guess it's UTF-16 or something?
I guess we could then argue that anything non-UTF-8 should be mirrored too, and then the checking script would be... 'right'
Comment 4 Ulrich Müller gentoo-dev 2014-02-05 16:03:53 UTC
This is a text file in some variant of Extended Unix Code, see http://en.wikipedia.org/wiki/Extended_Unix_Code

Opening the file in Emacs (with EUC-KR encoding) gives this for the first few lines:

   !!!!! 3.1.6에서 추가된 옵션

   ! 한글 자판 종류. 기본값은 2.
   ! 2 : 두벌식  391 또는 3FINAL : 세벌식 최종  3 또는 390 : 세벌식 390
   !Hanterm*hangulKeyboard: 3FINAL

   ! 한글 코드 표시. 기본값은 true.
   !Hanterm*showCodeStatus: false

   ! 한글 자판 종류 표시할 것인지. 기본값은 true.
   !Hanterm*showHanKbdLayout: false

Closing.
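
The observation in comment 4 can be reproduced with a small sketch. It assumes Python's standard `euc_kr` codec and a hypothetical helper, `describe_encoding`, that tries a short list of candidate codecs in order. A successful decode is only a hint, not proof, because many byte sequences are valid under more than one legacy encoding.

```python
def describe_encoding(data: bytes) -> str:
    """Return the first candidate codec that decodes data without error.

    The candidate list is an assumption made for this illustration; it is
    not taken from the qa-scripts repository.
    """
    for codec in ("ascii", "utf-8", "euc_kr"):
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        return codec
    return "unknown"


# Korean text encoded as EUC-KR is neither valid ASCII nor valid UTF-8,
# so a check that only knows those two encodings may call it binary.
sample = "한글 자판 종류".encode("euc_kr")
print(describe_encoding(sample))
```

A file like Hanterm.gentoo is therefore plain text to an editor that knows the encoding, while a naive ASCII or UTF-8 test rejects it.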
Comment 5 Samuli Suominen (RETIRED) gentoo-dev 2014-02-05 16:22:25 UTC
(In reply to Ulrich Müller from comment #4)
> Closing.

You think it's good idea to do nothing and have no bug open for the false report in http://qa-reports.gentoo.org/output/find-binary-files.txt ?
I don't, otherwise we'd be collecting duplicates forever
Comment 6 Ulrich Müller gentoo-dev 2014-02-05 16:35:13 UTC
(In reply to Samuli Suominen from comment #5)
> You think it's good idea to do nothing and have no bug open for the false
> report in http://qa-reports.gentoo.org/output/find-binary-files.txt ?

This bug's summary was about Hanterm.gentoo being a binary file, and that certainly was invalid.

> I don't, otherwise we'd be collecting duplicates forever

The script is here:
http://git.overlays.gentoo.org/gitweb/?p=proj/qa-scripts.git;a=blob;f=find-binary-files.pl;h=0d32c9e1b3fd2004a297c71f6df210674c0b4785;hb=HEAD

However, I wonder why x11-terms/hanterm/files/Hanterm.gentoo suddenly shows up. It definitely wasn't there last time I had checked the report, and neither the file nor the find-binary-files.pl script have changed for a long time.

@Infra: Was there an update of perl on the machine running the script?
Comment 7 Christian Ruppert (idl0r) gentoo-dev 2014-02-05 19:06:24 UTC
(In reply to Ulrich Müller from comment #6)
> (In reply to Samuli Suominen from comment #5)
> > You think it's good idea to do nothing and have no bug open for the false
> > report in http://qa-reports.gentoo.org/output/find-binary-files.txt ?
> 
> This bug's summary was about Hanterm.gentoo being a binary file, and that
> certainly was invalid.
> 
> > I don't, otherwise we'd be collecting duplicates forever
> 
> The script is here:
> http://git.overlays.gentoo.org/gitweb/?p=proj/qa-scripts.git;a=blob;f=find-binary-files.pl;h=0d32c9e1b3fd2004a297c71f6df210674c0b4785;hb=HEAD
> 
> However, I wonder why x11-terms/hanterm/files/Hanterm.gentoo suddenly shows
> up. It definitely wasn't there last time I had checked the report, and
> neither the file nor the find-binary-files.pl script have changed for a long
> time.
> 
> @Infra: Was there an update of perl on the machine running the script?

Yes, we upgraded to 5.16.3 recently.
Comment 8 Ulrich Müller gentoo-dev 2014-02-06 08:04:34 UTC
I wonder if we shouldn't change the heuristic to something based on file/magic. Scanning the tree with "file -i" gives this:

      1 application/octet-stream; charset=binary
      2 application/vnd.iccprofile; charset=utf-8
      1 application/vnd.ms-fontobject; charset=us-ascii
      2 application/x-elc; charset=us-ascii
  17023 application/xml; charset=us-ascii
    578 application/xml; charset=utf-8
      7 binary; charset=binary
      4 image/svg+xml; charset=us-ascii
     20 image/x-xpmi; charset=us-ascii
      2 inode/x-empty; charset=binary
     19 text/html; charset=us-ascii
      5 text/plain; charset=iso-8859-1
      1 text/plain; charset=unknown-8bit
  67125 text/plain; charset=us-ascii
  11582 text/plain; charset=utf-8
      1 text/troff; charset=iso-8859-1
     32 text/troff; charset=us-ascii
     14 text/x-c++; charset=us-ascii
      1 text/x-c++; charset=utf-8
     94 text/x-c; charset=us-ascii
      4 text/x-c; charset=utf-8
     52 text/x-diff; charset=iso-8859-1
      3 text/x-diff; charset=unknown-8bit
  15771 text/x-diff; charset=us-ascii
    280 text/x-diff; charset=utf-8
      6 text/x-fortran; charset=us-ascii
    242 text/x-lisp; charset=us-ascii
      2 text/x-lisp; charset=utf-8
      9 text/x-m4; charset=us-ascii
    117 text/x-makefile; charset=us-ascii
      3 text/x-makefile; charset=utf-8
     64 text/x-pascal; charset=us-ascii
     12 text/x-pascal; charset=utf-8
     11 text/x-perl; charset=us-ascii
      2 text/x-php; charset=us-ascii
      1 text/x-po; charset=utf-8
     32 text/x-python; charset=us-ascii
      1 text/x-python; charset=utf-8
      2 text/x-ruby; charset=us-ascii
      1 text/x-ruby; charset=utf-8
      3 text/x-shellscript; charset=iso-8859-1
    487 text/x-shellscript; charset=us-ascii
      6 text/x-shellscript; charset=utf-8

text/* are good, also application/* with us-ascii or utf-8 as charset.
Whether we want to tolerate xpm and svg images in the tree can be discussed.
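
A minimal sketch of the rule proposed above, written in Python rather than as the actual script. The function names `is_acceptable` and `flag_binaries` are hypothetical, and the input is assumed to be `path: type; charset=...` lines as printed by `file -i`. Images and empty files are left flagged here, since the comment leaves that question open.

```python
def is_acceptable(mime_line: str) -> bool:
    """Apply the rule from the comment to one `file -i` type string.

    text/* is always accepted; application/* is accepted only when the
    charset is us-ascii or utf-8. Everything else is treated as binary.
    """
    mime_type, _, params = mime_line.partition(";")
    mime_type = mime_type.strip()
    charset = params.partition("charset=")[2].strip()
    if mime_type.startswith("text/"):
        return True
    if mime_type.startswith("application/") and charset in ("us-ascii", "utf-8"):
        return True
    return False


def flag_binaries(file_output_lines):
    """Return the paths whose reported type fails the rule.

    Each input line is assumed to look like "path: type; charset=name".
    """
    flagged = []
    for line in file_output_lines:
        path, sep, mime_line = line.rpartition(": ")
        if sep and not is_acceptable(mime_line):
            flagged.append(path)
    return flagged


# Illustrative input: the paths come from this bug, the type strings are
# assumed for the example and were not taken from a real run.
lines = [
    "x11-terms/hanterm/files/Hanterm.gentoo: text/plain; charset=unknown-8bit",
    "sys-apps/sed/files/unix2dos: application/octet-stream; charset=binary",
]
print(flag_binaries(lines))
```

Under this rule a text file in an unrecognised 8-bit charset still counts as text, which is what removes the CJK false positive.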
Comment 9 Ulrich Müller gentoo-dev 2014-02-06 08:35:12 UTC
Further filtering the file -i output for patches of GNU Info files (which should be easy to implement) would leave us with the following binary files being reported:

   app-misc/linux-logo/files/gentoo.logo
   dev-util/deskzilla/files/deskzilla_gentoo.license
   sci-biology/readseq/files/19930201-impl-dec.patch
   sys-apps/sed/files/unix2dos

Not bad, it doesn't include the CJK false positives any more.
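
The comment does not say how the Info-file filter would work, so the following is only one possible sketch under a stated assumption: a patch is treated as an Info patch when every file named in its `+++ ` header lines ends in `.info` or `.info-N`. The function name `touches_only_info_files` is hypothetical, and the check is rough (an added line that itself starts with `++ ` would be misread as a header).

```python
import re

# Assumed naming convention for GNU Info output: foo.info, foo.info-1, ...
INFO_NAME = re.compile(rb"\.info(-\d+)?$")


def touches_only_info_files(patch: bytes) -> bool:
    """Return True if every "+++ " header in the patch names an Info file."""
    targets = []
    for line in patch.splitlines():
        if line.startswith(b"+++ "):
            # Header looks like "+++ path<TAB>timestamp"; keep only the path.
            targets.append(line[4:].split(b"\t", 1)[0].strip())
    return bool(targets) and all(INFO_NAME.search(t) for t in targets)


patch = b"--- a/doc/foo.info\t2014\n+++ b/doc/foo.info\t2014\n@@ -1 +1 @@\n-x\n+y\n"
print(touches_only_info_files(patch))
```

Working on bytes rather than decoded text matters here, because Info files contain control characters that can make the patch look binary to a type detector.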
Comment 10 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2014-02-08 15:15:05 UTC
just do it.
Comment 11 Ulrich Müller gentoo-dev 2014-02-08 16:33:25 UTC
(In reply to Jeremy Olexa (darkside) from comment #10)
> just do it.

Committed to qa-scripts repo:
http://git.overlays.gentoo.org/gitweb/?p=proj/qa-scripts.git;a=commit;h=9b2496f2a230e8f29feda04e141ccd574858cb24

@infra: Can you please update cron to run find-binary-files.sh instead of find-binary-files.pl?
Comment 12 Ulrich Müller gentoo-dev 2014-02-09 13:50:48 UTC
All set up. Closing.