Binary file: /usr/portage/x11-terms/hanterm/files/Hanterm.gentoo Binary files such as PNGs should be distributed via mirrors; CVS (and, in the future, git) doesn't handle binary files well.
Not a binary, see http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/x11-terms/hanterm/files/Hanterm.gentoo?view=markup However, it is falsely reported here: http://qa-reports.gentoo.org/output/find-binary-files.txt Shouldn't the script that generates find-binary-files.txt be fixed?
ah, that is text but I don't have the correct character set available?
I don't know enough about CJK to verify, but I guess it's UTF-16 or something? We could then argue that anything that isn't UTF-8 should be mirrored too, and then the checking script would be... 'right'.
This is a text file in a variant of Extended Unix Code, see http://en.wikipedia.org/wiki/Extended_Unix_Code
Opening the file in Emacs (with EUC-KR encoding) gives this for the first few lines (Korean comments translated):
!!!!! Options added in 3.1.6
! Hangul keyboard layout. Default is 2.
! 2: Dubeolsik, 391 or 3FINAL: Sebeolsik Final, 3 or 390: Sebeolsik 390
!Hanterm*hangulKeyboard: 3FINAL
! Show Hangul code status. Default is true.
!Hanterm*showCodeStatus: false
! Whether to show the Hangul keyboard layout. Default is true.
!Hanterm*showHanKbdLayout: false
Closing.
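A quick way to see why a byte-oriented check could mistake this file for binary is to take one Korean comment line from the file and encode it as EUC-KR. The result is not valid UTF-8, and most of its bytes have the high bit set. The sketch below uses Python's built-in `euc_kr` codec. The helper names are made up for illustration, and this is not part of any QA script.

```python
# Sketch: compare one Korean comment line from Hanterm.gentoo as it would
# be stored in EUC-KR versus UTF-8. Helper names are hypothetical.

LINE = "! 한글 자판 종류. 기본값은 2."  # "Hangul keyboard layout. Default is 2."


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def high_bit_ratio(data: bytes) -> float:
    """Fraction of bytes with the high bit set (values 0x80-0xFF)."""
    return sum(1 for b in data if b >= 0x80) / len(data) if data else 0.0


euc_kr = LINE.encode("euc_kr")
utf8 = LINE.encode("utf-8")

# The EUC-KR bytes round-trip back to the original text.
assert euc_kr.decode("euc_kr") == LINE

for name, data in (("EUC-KR", euc_kr), ("UTF-8", utf8)):
    print(f"{name}: {len(data)} bytes, valid UTF-8: {is_valid_utf8(data)}, "
          f"high-bit ratio: {high_bit_ratio(data):.2f}")
```

Both encodings have a large share of 8-bit bytes, so counting such bytes alone cannot tell them apart. The difference is that only the UTF-8 version decodes as UTF-8. A heuristic that special-cases valid UTF-8 and otherwise counts 8-bit bytes will therefore treat the EUC-KR file as binary.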
(In reply to Ulrich Müller from comment #4)
> Closing.
Do you think it's a good idea to do nothing and have no bug open for the false report in http://qa-reports.gentoo.org/output/find-binary-files.txt ? I don't; otherwise we'd be collecting duplicates forever.
(In reply to Samuli Suominen from comment #5)
> You think it's good idea to do nothing and have no bug open for the false
> report in http://qa-reports.gentoo.org/output/find-binary-files.txt ?
This bug's summary was about Hanterm.gentoo being a binary file, and that was certainly invalid.
> I don't, otherwise we'd be collecting duplicates forever
The script is here: http://git.overlays.gentoo.org/gitweb/?p=proj/qa-scripts.git;a=blob;f=find-binary-files.pl;h=0d32c9e1b3fd2004a297c71f6df210674c0b4785;hb=HEAD
However, I wonder why x11-terms/hanterm/files/Hanterm.gentoo suddenly shows up. It definitely wasn't there the last time I checked the report, and neither the file nor the find-binary-files.pl script has changed for a long time.
@Infra: Was there an update of perl on the machine running the script?
(In reply to Ulrich Müller from comment #6)
> @Infra: Was there an update of perl on the machine running the script?
Yes, we upgraded to 5.16.3 recently.
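We have not checked what find-binary-files.pl actually tests. One plausible mechanism is Perl's -B file test, which perlfunc describes roughly as follows. The first block of the file is examined. If it is valid UTF-8 containing non-ASCII characters, it is text. Otherwise, a zero byte, or more than a third "odd" bytes (stray control codes or bytes with the high bit set), makes it binary. The sketch below approximates that rule in Python. The 512-byte block size and the set of tolerated control characters are assumptions, and this is not Perl's implementation.

```python
import codecs

# Sketch: approximate Perl's -B ("looks binary") file test as perlfunc
# describes it. BLOCK_SIZE and ALLOWED_CONTROLS are assumptions, not
# copied from Perl's source.
BLOCK_SIZE = 512
ALLOWED_CONTROLS = {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}  # \b \t \n \f \r ESC


def is_utf8_prefix(block: bytes) -> bool:
    """True if block is valid UTF-8, tolerating a multi-byte character
    cut off at the end of the block."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(block, final=False)
    except UnicodeDecodeError:
        return False
    return True


def looks_binary(block: bytes) -> bool:
    block = block[:BLOCK_SIZE]
    if not block:
        return False  # sketch choice: treat empty input as text
    if b"\x00" in block:
        return True  # any zero byte: binary
    if any(b >= 0x80 for b in block) and is_utf8_prefix(block):
        return False  # valid UTF-8 containing non-ASCII: text
    odd = sum(
        1 for b in block
        if b >= 0x80 or (b < 0x20 and b not in ALLOWED_CONTROLS)
    )
    return odd * 3 > len(block)  # more than a third "odd" bytes: binary


def file_looks_binary(path: str) -> bool:
    with open(path, "rb") as fh:
        return looks_binary(fh.read(BLOCK_SIZE))
```

Under this approximation, an EUC-KR Korean comment line comes out as binary while the same text in UTF-8 does not. That would match the symptom in this bug. It does not establish that find-binary-files.pl really works this way, or what the Perl upgrade changed.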
I wonder whether we should change the heuristic to something based on file/magic. Scanning the tree with "file -i" gives this:
    1 application/octet-stream; charset=binary
    2 application/vnd.iccprofile; charset=utf-8
    1 application/vnd.ms-fontobject; charset=us-ascii
    2 application/x-elc; charset=us-ascii
17023 application/xml; charset=us-ascii
  578 application/xml; charset=utf-8
    7 binary; charset=binary
    4 image/svg+xml; charset=us-ascii
   20 image/x-xpmi; charset=us-ascii
    2 inode/x-empty; charset=binary
   19 text/html; charset=us-ascii
    5 text/plain; charset=iso-8859-1
    1 text/plain; charset=unknown-8bit
67125 text/plain; charset=us-ascii
11582 text/plain; charset=utf-8
    1 text/troff; charset=iso-8859-1
   32 text/troff; charset=us-ascii
   14 text/x-c++; charset=us-ascii
    1 text/x-c++; charset=utf-8
   94 text/x-c; charset=us-ascii
    4 text/x-c; charset=utf-8
   52 text/x-diff; charset=iso-8859-1
    3 text/x-diff; charset=unknown-8bit
15771 text/x-diff; charset=us-ascii
  280 text/x-diff; charset=utf-8
    6 text/x-fortran; charset=us-ascii
  242 text/x-lisp; charset=us-ascii
    2 text/x-lisp; charset=utf-8
    9 text/x-m4; charset=us-ascii
  117 text/x-makefile; charset=us-ascii
    3 text/x-makefile; charset=utf-8
   64 text/x-pascal; charset=us-ascii
   12 text/x-pascal; charset=utf-8
   11 text/x-perl; charset=us-ascii
    2 text/x-php; charset=us-ascii
    1 text/x-po; charset=utf-8
   32 text/x-python; charset=us-ascii
    1 text/x-python; charset=utf-8
    2 text/x-ruby; charset=us-ascii
    1 text/x-ruby; charset=utf-8
    3 text/x-shellscript; charset=iso-8859-1
  487 text/x-shellscript; charset=us-ascii
    6 text/x-shellscript; charset=utf-8
text/* types are fine, and so is application/* with us-ascii or utf-8 as the charset. Whether we want to tolerate XPM and SVG images in the tree is open for discussion.
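The rule proposed above (text/* is always acceptable; application/* only with a us-ascii or utf-8 charset) can be sketched as a filter over `file -i` output lines of the form `path: type/subtype; charset=...`. This is a hypothetical illustration with made-up function names and paths, not the script that was eventually committed. Taken literally, the rule still reports SVG/XPM images and empty files (inode/x-empty).

```python
from collections.abc import Iterable, Iterator

# Hypothetical sketch of the proposed rule: accept text/*, and accept
# application/* only with a us-ascii or utf-8 charset.
ACCEPTED_APP_CHARSETS = {"us-ascii", "utf-8"}


def parse_file_i_line(line: str) -> tuple[str, str, str]:
    """Split one line of `file -i` output into (path, mime type, charset)."""
    path, _, info = line.rstrip("\n").rpartition(": ")
    mime, _, params = info.partition(";")
    charset = ""
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "charset":
            charset = value
    return path, mime.strip(), charset


def is_acceptable(mime: str, charset: str) -> bool:
    major = mime.partition("/")[0]
    if major == "text":
        return True
    return major == "application" and charset in ACCEPTED_APP_CHARSETS


def flagged(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (path, mime, charset) for every line the rule would report."""
    for line in lines:
        if not line.strip():
            continue
        path, mime, charset = parse_file_i_line(line)
        if not is_acceptable(mime, charset):
            yield path, mime, charset


if __name__ == "__main__":
    sample = [  # hypothetical paths, not taken from the tree
        "cat/pkg/files/notes.txt: text/plain; charset=unknown-8bit",
        "cat/pkg/files/logo.svg: image/svg+xml; charset=us-ascii",
        "cat/pkg/files/blob: application/octet-stream; charset=binary",
    ]
    for path, mime, charset in flagged(sample):
        print(f"{path}: {mime}; charset={charset}")
```

Note that a text/plain file with charset=unknown-8bit passes. This is what keeps non-UTF-8 CJK text files out of the report under this rule.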
Further filtering the file -i output for patches of GNU Info files (which should be easy to implement) would leave only the following files reported as binary:
app-misc/linux-logo/files/gentoo.logo
dev-util/deskzilla/files/deskzilla_gentoo.license
sci-biology/readseq/files/19930201-impl-dec.patch
sys-apps/sed/files/unix2dos
Not bad; it no longer includes the CJK false positives.
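One way such a filter might look, assuming that what makes file(1) report patches of GNU Info files as non-text is the control characters the Info format uses as separators. These are 0x1F before each node and 0x7F in tag tables. Under that assumption, a .patch/.diff file could be treated as text if, once those two bytes are allowed, it contains no other unexpected control bytes or NULs. The function name, suffixes, and byte sets are hypothetical, and this is not necessarily what was committed.

```python
# Hypothetical filter: a patch whose only "binary-looking" bytes are GNU Info
# separators is treated as text. Suffixes and byte sets are assumptions.
INFO_SEPARATORS = {0x1F, 0x7F}  # Info node separator, tag-table delimiter
TEXT_CONTROLS = {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}  # \b \t \n \f \r ESC
PATCH_SUFFIXES = (".patch", ".diff")


def is_info_patch(path: str, data: bytes) -> bool:
    """True if `path` looks like a patch and `data` contains Info separators
    but no NULs or other unexpected control bytes."""
    if not path.endswith(PATCH_SUFFIXES):
        return False
    if not any(b in INFO_SEPARATORS for b in data):
        return False  # nothing Info-specific; leave it to the normal check
    return all(
        b >= 0x20 or b in TEXT_CONTROLS or b in INFO_SEPARATORS
        for b in data
    )
```

A check like this would only run on files the MIME-based rule already reported, to drop them from the list. Bytes of 0x80 and above are allowed so that 8-bit patch text is not penalized.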
just do it.
(In reply to Jeremy Olexa (darkside) from comment #10)
> just do it.
Committed to the qa-scripts repo: http://git.overlays.gentoo.org/gitweb/?p=proj/qa-scripts.git;a=commit;h=9b2496f2a230e8f29feda04e141ccd574858cb24
@infra: Can you please update cron to run find-binary-files.sh instead of find-binary-files.pl?
All set up. Closing.