Bug 275244 - app-arch/unzip-5.52-r2 lacks iconv support
Summary: app-arch/unzip-5.52-r2 lacks iconv support
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High enhancement with 1 vote
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-06-24 05:00 UTC by Arseny Solokha
Modified: 2010-12-23 17:57 UTC
CC List: 9 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Proposed patch for the ebuild (unzip-5.52-r3.patch,925 bytes, patch)
2009-06-24 05:01 UTC, Arseny Solokha
dev-libs/natspec-0.2.5.ebuild (natspec-0.2.5.ebuild,953 bytes, text/plain)
2009-10-30 08:02 UTC, Alexandre Rostovtsev (RETIRED)
unzip-6.0-alt-natspec.patch (unzip-6.0-alt-natspec.patch,12.84 KB, patch)
2009-10-30 08:04 UTC, Alexandre Rostovtsev (RETIRED)
new unzip-6.0-r1.ebuild (unzip-6.0-r1.ebuild,2.05 KB, text/plain)
2009-10-30 08:09 UTC, Alexandre Rostovtsev (RETIRED)
dev-libs/natspec-0.2.5.ebuild (natspec-0.2.5.ebuild,961 bytes, text/plain)
2009-10-30 08:16 UTC, Alexandre Rostovtsev (RETIRED)
unzip-6.0-alt-natspec.patch (unzip-6.0-alt-natspec.patch,13.68 KB, patch)
2010-12-20 09:13 UTC, Jiri Tyr

Description Arseny Solokha 2009-06-24 05:00:35 UTC
The ZIP format doesn't provide any standardized encoding for the names of archived files by default. That is, one can compress files in one locale and decompress them on a machine where a different encoding is set, and the names of the decompressed files will be broken.
A patch that adds support for recoding the names of extracted files has been available for a very long time (http://sisyphus.ru/cgi-bin/srpm.pl/Sisyphus/unzip/getpatch/3). Unfortunately it works with Cyrillic encodings only, but adding support for any desired encoding seems to be trivial.
There is also a slightly improved version of this patch, available in the crg overlay (http://gentoo-overlays.zugaina.org/crg/portage/app-arch/unzip/files//unzip-5.50-iconv-v1.2-utf8.patch). There's no reason to use the outdated version of the patch, of course.
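
The mismatch described above can be illustrated with a minimal Python sketch. The file name and the choice of cp1251, KOI8-R and UTF-8 are assumptions made for illustration only; they are not taken from any particular archive.

```python
# Hypothetical file name; any non-ASCII name shows the same effect.
name = "Отчёт.txt"

# Bytes as an archiver on a cp1251 (Windows Cyrillic) machine would store them.
stored = name.encode("cp1251")

# A reader that assumes KOI8-R gets different characters from the same bytes.
seen_as_koi8 = stored.decode("koi8-r")

# A reader that assumes UTF-8 cannot decode these bytes at all.
try:
    stored.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(repr(seen_as_koi8), valid_utf8)
```

Only decoding with the encoding that was used when the archive was created gives the original name back, and the archive itself does not record which one that was.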

Reproducible: Always

Steps to Reproduce:
Comment 1 Arseny Solokha 2009-06-24 05:01:54 UTC
Created attachment 195622 [details, diff]
Proposed patch for the ebuild
Comment 2 Peter Volkov (RETIRED) gentoo-dev 2009-07-16 10:23:11 UTC
The patch you reference here is not used by altlinux and afaik it has some issues, discussed in the altlinux bugzilla. The correct way is to use the NATSPEC library[*], but packaging it correctly requires quite some work on the configure.in script (it has many automagic dependencies, so there is no way to disable building the documentation and python bindings, which are really redundant for such a basic package). I've started to work on it but had no time to finish.

* http://freesource.info/wiki/Lokalizacija/BibliotekaNATSPEC

I've dropped a very rough draft ebuild into my overlay (dev-libs/natspec).

Another problem: unzip-6.0 with some utf8 support is out, so both this patch and the natspec patch need to be reworked (anyone?).
Comment 3 SpanKY gentoo-dev 2009-08-14 07:21:20 UTC
unzip-6.0 should have unicode support now
Comment 4 Peter Volkov (RETIRED) gentoo-dev 2009-08-14 07:54:24 UTC
Yes, unzip has unicode support, but it's still unable to decode the names of files packed on Windows (and this is the problem here, since I have to open such files). OTOH altlinux has updated its patch for unzip-6.0, so the only blocker here is to fix the automagic dependencies in natspec and improve the build system there.
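
For context, the ZIP format marks UTF-8 names with general purpose flag bit 11 (0x800). The sketch below uses Python's zipfile module, not unzip itself, to show the case that already works: an archive whose writer sets that flag. The assumption here is that the problematic Windows-made archives leave the bit unset and store names in a legacy code page, so the reader has no way to tell which one.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8

name = "БИ/БИ_Демо_2009.pdf"  # a name quoted later in this report

# Write a one-entry archive in memory; zipfile sets the flag for non-ASCII names.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(name, b"dummy")

# Read it back and inspect the flag of the only entry.
with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]

declares_utf8 = bool(info.flag_bits & UTF8_FLAG)
print(declares_utf8, info.filename == name)
```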
Comment 5 SpanKY gentoo-dev 2009-08-16 18:15:10 UTC
let me phrase it this way ... i have no files that cause a problem for unzip, nor do i have an interest in fixing this, nor do i really understand the issues you reference.  so if you have a fix for unzip-6.0, feel free to update the ebuild in the tree.
Comment 6 Aleksandr Yakimov 2009-08-16 18:46:50 UTC
(In reply to comment #5)
> let me phrase it this way ... i have no files that cause a problem for unzip,
See http://www.fipi.ru/binaries/724/bio%20WinRAR.zip as a sample.
Comment 7 Alexandre Rostovtsev (RETIRED) gentoo-dev 2009-10-30 08:02:11 UTC
Created attachment 208700 [details]
dev-libs/natspec-0.2.5.ebuild

Ebuild for dev-libs/natspec-0.2.5 library (required for the ALT linux patch)
This version has a better configure script - python bindings are now optional.

Tested on ~amd64.
Comment 8 Alexandre Rostovtsev (RETIRED) gentoo-dev 2009-10-30 08:04:34 UTC
Created attachment 208701 [details, diff]
unzip-6.0-alt-natspec.patch

Patch for unzip-6.0 by ALT linux (available at http://sisyphus.ru/ru/srpm/Sisyphus/unzip/patches/0) to enable manually setting legacy filename encodings via the -O switch.
Comment 9 Alexandre Rostovtsev (RETIRED) gentoo-dev 2009-10-30 08:09:01 UTC
Created attachment 208702 [details]
new unzip-6.0-r1.ebuild

unzip-6.0 ebuild that uses the above patch.

I have successfully used the patched unzip to extract files from zip files whose contents' filenames are in the cp866 encoding (as a test case, have a look at all the .zip files at the bottom of http://www.lawinstitut.ru/archnum.aspx?lang=ru)
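
As a rough analogue of what a manually chosen encoding does, here is a Python sketch. It is not the ALT Linux patch: `make_legacy_zip` and `list_names` are hypothetical helpers written for this illustration. The first builds a tiny archive whose name is stored as raw cp866 bytes with the UTF-8 flag unset, and the second re-decodes such names with an encoding supplied by the caller.

```python
import io
import struct
import zipfile
import zlib

UTF8_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8


def make_legacy_zip(name_bytes, data):
    """Build a one-entry, uncompressed ZIP whose name is raw legacy bytes."""
    crc = zlib.crc32(data)
    size = len(data)
    dos_date = (1 << 5) | 1  # 1980-01-01 in DOS date format
    # Local file header (flags = 0, method = 0 "stored"), followed by the name.
    local = struct.pack("<4sHHHHHIIIHH", b"PK\x03\x04", 20, 0, 0, 0, dos_date,
                        crc, size, size, len(name_bytes), 0) + name_bytes
    # Central directory entry pointing at offset 0.
    central = struct.pack("<4sHHHHHHIIIHHHHHII", b"PK\x01\x02", 20, 20, 0, 0,
                          0, dos_date, crc, size, size, len(name_bytes),
                          0, 0, 0, 0, 0, 0) + name_bytes
    # End of central directory record.
    end = struct.pack("<4sHHHHIIH", b"PK\x05\x06", 0, 0, 1, 1,
                      len(central), len(local) + size, 0)
    return local + data + central + end


def list_names(archive_bytes, legacy_encoding):
    """Return entry names, re-decoding unflagged names with legacy_encoding."""
    names = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            name = info.filename
            if not info.flag_bits & UTF8_FLAG:
                # zipfile decoded the raw bytes as cp437; undo that first.
                name = name.encode("cp437").decode(legacy_encoding)
            names.append(name)
    return names


archive = make_legacy_zip("БИ_Демо_2009.pdf".encode("cp866"), b"dummy")
print(list_names(archive, "cp866"))
```

The encoding has to come from the user because nothing in such an archive states it, which is why a manual switch is needed at all.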

Can we please get this in portage?
Comment 10 Alexandre Rostovtsev (RETIRED) gentoo-dev 2009-10-30 08:16:40 UTC
Created attachment 208704 [details]
dev-libs/natspec-0.2.5.ebuild

Better ebuild for natspec (popt and tcl are only used during the build process, not in the installed libraries, so move them to DEPEND)
Comment 11 Jiri Tyr 2010-01-23 13:48:50 UTC
It works well for me. I have tested it with the cp852 encoding on an x86 machine. I would like to see this feature in the Portage tree.
Comment 12 SpanKY gentoo-dev 2010-01-24 21:50:03 UTC
libnatspec links against popt, so it needs it in RDEPEND

ive added that package to the tree, but even with the proposed patch, the sample zip in comment #6 still doesnt work for me

Archive:  bio WinRAR.zip
   creating: ????/
  inflating: ????/????_??????????_2009.pdf
  inflating: ????/????_????????_2009.pdf
  inflating: ????/????_????????????_2009.pdf
Comment 13 Jiri Tyr 2010-01-24 22:13:58 UTC
(In reply to comment #12)
> sample zip in comment #6 still doesnt work for me
> 
> Archive:  bio WinRAR.zip
>    creating: ????/
>   inflating: ????/????_??????????_2009.pdf
>   inflating: ????/????_????????_2009.pdf
>   inflating: ????/????_????????????_2009.pdf

I know, the list of files is wrong. But the files have the right names once they are unpacked. Try this:

$ zipnote file.zip | iconv -f cp852 -t utf8

Change the cp852 encoding to the one you need.
Comment 14 SpanKY gentoo-dev 2010-01-24 22:18:01 UTC
the file output is incorrect too:
-- üê
   |-- üê_æ»Ñµ¿Σ_2009.pdf
   |-- üê_äѼ«_2009.pdf
   `-- üê_è«ñ¿Σ_2009.pdf
but even if the files were correct, the output should have been correct.  i'm not inclined to add a patch that only fixes 20% of the problem.

i'm using a unicode locale here ...
Comment 15 Arseny Solokha 2010-01-24 22:36:52 UTC
(In reply to comment #14)
> the file output is incorrect too:
> -- üê
>    |-- üê_æ»Ñµ¿Σ_2009.pdf
>    |-- üê_äѼ«_2009.pdf
>    `-- üê_è«ñ¿Σ_2009.pdf

That is because you've applied the wrong encoding here. CP866 is the Cyrillic/Russian encoding used in DOS.
$ zipnote file.zip | iconv -f cp866 -t utf8
produces correct file names (БИ/, БИ/БИ_Кодиф_2009.pdf, БИ/БИ_Демо_2009.pdf, БИ/БИ_Специф_2009.pdf respectively; I hope Bugzilla and your browser display them correctly). I'm using en_US.UTF-8.
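
The same comparison can be sketched in Python. This assumes that the garbled listing quoted above is the stored bytes rendered as cp437, which is what the characters appear to be consistent with; the sketch recovers the raw bytes on that assumption and decodes them with the two encodings suggested in this thread.

```python
# One garbled name as quoted above (assumed to be raw bytes shown as cp437).
garbled = "üê_è«ñ¿Σ_2009.pdf"

# Recover the raw bytes that were stored in the archive.
raw = garbled.encode("cp437")

# Decode the same bytes with each candidate encoding from the thread.
candidates = {
    enc: raw.decode(enc, errors="replace") for enc in ("cp852", "cp866")
}

for enc, text in candidates.items():
    print(enc, text)
```

Only the cp866 result reads as a Russian file name, matching the names listed above.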
Comment 16 Aleksandr Yakimov 2010-04-25 10:29:44 UTC
(In reply to comment #14)
> but even if the files were correct, the output should have been correct.  i'm
> not inclined to add a patch that only fixes 20% of the problem.
This solves the main problem: the files have correct names after unpacking.

ALT uses other patches (Ark on KDE-4) for GUI output.
Comment 17 Aleksandr Yakimov 2010-12-19 17:28:09 UTC
(In reply to comment #14)
> the file output is incorrect too:
> -- üê
>    |-- üê_æ»Ñµ¿Σ_2009.pdf
>    |-- üê_äѼ«_2009.pdf
>    `-- üê_è«ñ¿Σ_2009.pdf
> but even if the files were correct, the output should have been correct.  i'm
> not inclined to add a patch that only fixes 20% of the problem.
> 
> i'm using a unicode locale here ...
> 
Improved patch: http://www.opennet.ru/soft/zip_rus/unzip60-natspec-mod.diff.gz (detailed article in Russian: http://www.opennet.ru/tips/info/2494.shtml). It gives clean output and correct filenames in KDE4 Ark.
Comment 18 Jiri Tyr 2010-12-20 09:13:06 UTC
Created attachment 257604 [details, diff]
unzip-6.0-alt-natspec.patch

Improved patch (source: http://www.opennet.ru/soft/zip_rus/unzip60-natspec-mod.diff.gz).
Comment 19 Peter Volkov (RETIRED) gentoo-dev 2010-12-22 16:53:30 UTC
Patches for zip and unzip have been applied in the tree.

Mike, zipnote was not covered by the patch from altlinux. The goal of that patch is to make zip encode non-ascii filenames in such a way that they can be read in Windows. That said, I've added natspec support for zipnote to our patchset too :)