69945 – unzip extract broken filename.

Status:	RESOLVED UPSTREAM

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	High normal
Assignee:	Gentoo's Team for Core System packages

URL:
Whiteboard:
Keywords:

Duplicates (1):	204257 (view as bug list)
Depends on:
Blocks:

Reported:	2004-11-03 08:01 UTC by Young-Ho Cha
Modified:	2008-01-04 10:12 UTC
CC List:	4 users

See Also:
Package list:
Runtime testing required:	---

Attachments
modified for x86_64 (unzip-5.52-locale.patch, 6.69 KB, patch) 2007-08-11 07:11 UTC, Young-deuk Hong
add epatch for local patch (unzip-5.52-r1.ebuild, 1.68 KB, text/plain) 2007-08-11 07:12 UTC, Young-deuk Hong

Comment 1 Young-Ho Cha 2004-11-03 08:01:40 UTC

When unzip extracts files, it treats filenames as 7-bit, so filenames with non-Latin-1 characters come out broken.

A zip archive created on MS Windows or another OS with a non-UTF-8 locale also extracts with broken filenames.

So I made a small patch that detects the filename charset from the locale.

With this patch applied, file-roller (the GNOME archiving tool) can show Korean filenames.

I think it works in other locales too.


Reproducible: Always
Steps to Reproduce:
1. Extract a zip archive with unzip.

Actual Results:  
ganadist@ganadist /tmp $ unzip -x /home/ganadist/Documents/hhs_img.zip
Archive:  /home/ganadist/Documents/hhs_img.zip
  inflating: 23++&#65533;-+&#65533;-&#65533;_&#65533;+&#65533;&#65533;-&#65533;&#65533;&#65533;&#65533;-ø&#65533;&#65533;+&#65533;-&#65533;-&#65533;.jpg
  inflating: pop_++&#65533;-+&#65533;-&#65533;-&#65533;.png
  inflating: pop_-&#65533;&#65533;&#65533;-&#65533;+-.png


Expected Results:  
ganadist@ganadist /tmp $ unzip -x /home/ganadist/Documents/hhs_img.zip
Archive:  /home/ganadist/Documents/hhs_img.zip
  inflating: 23체력측정_심폐지구력혈압측정중.jpg
  inflating: pop_체력측정중.png
  inflating: pop_카드접촉.png

screenshot:
http://ftp.mizi.com/~ganadist/file-roller-broken.png
The left picture shows the patched unzip, and the right picture shows the unpatched unzip.
patch:
http://ftp.mizi.com/~ganadist/unzip-locale.diff
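
To illustrate why names stored in a legacy charset turn into replacement characters on a UTF-8 system, here is a minimal sketch. It is not taken from unzip-locale.diff. The function name `looks_like_utf8` and the sample name "한글" are illustrative assumptions, and the sketch assumes the extracting system displays names as UTF-8, which the report does not state. The check is simplified: it validates lead and continuation bytes only and does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the n bytes at s have the shape of
 * UTF-8 (valid lead bytes followed by the right number of continuation
 * bytes), 0 otherwise. Simplified: overlong forms are not rejected. */
int looks_like_utf8(const unsigned char *s, size_t n)
{
    assert(s != NULL);
    size_t i = 0;
    while (i < n) {
        size_t extra;                               /* continuation bytes expected */
        if (s[i] < 0x80)                extra = 0;  /* plain ASCII */
        else if ((s[i] & 0xE0) == 0xC0) extra = 1;
        else if ((s[i] & 0xF0) == 0xE0) extra = 2;
        else if ((s[i] & 0xF8) == 0xF0) extra = 3;
        else return 0;                              /* not a valid lead byte */
        if (extra > n - i - 1) return 0;            /* truncated sequence */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80) return 0;
        i += extra + 1;
    }
    return 1;
}

/* The same two-syllable name "한글" in two encodings (illustrative data,
 * not bytes from the archive in this report). */
const unsigned char name_euckr[] = { 0xC7, 0xD1, 0xB1, 0xDB };             /* EUC-KR / CP949 */
const unsigned char name_utf8[]  = { 0xED, 0x95, 0x9C, 0xEA, 0xB8, 0x80 }; /* UTF-8 */
```

The EUC-KR bytes fail the check because 0xC7 announces a two-byte sequence but 0xD1 is not a continuation byte. A UTF-8 display has nothing valid to draw for them, so the name needs to be converted from the archive's charset before it is written out.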

Comment 2 Young-Ho Cha 2004-11-03 08:06:05 UTC

Ah, the "Results" sections came out broken :(

I took screenshots.

before patch:
http://ftp.mizi.com/~ganadist/unzip-unpatched.png

after patch:
http://ftp.mizi.com/~ganadist/unzip-patched.png

Comment 3 Gregorio Guidi (RETIRED) gentoo-dev 2004-11-03 08:20:16 UTC

Nice, have you submitted it to the unzip authors?

ftp://ftp.info-zip.org/pub/infozip/FAQ.html#zip-bugs
http://www.info-zip.org/zip-bug.html

Comment 4 Young-Ho Cha 2004-11-04 22:26:00 UTC

I reported it through the zip-bug form and received an answer.

----
Thank you!  We currently don't have a full-time UnZip maintainer, but
I have saved your patch and screenshots in my 6.0-patch-collection
directory, so at least they won't be lost.  (No clue when 6.0 might
be released, but probably not before the middle of next year.)

Comment 5 Alexander Simonov 2004-11-29 13:07:43 UTC

+if(!strncmp(lang, "ru", 2)) return "KOI8-R";
+if(!strncmp(lang, "uk", 2)) return "KOI8-U";
These lines are broken.
What if the Russian locale is ru_RU.UTF8?
I'm looking at your patch and will correct it for Cyrillic Unicode locales.

Comment 6 Alexander Simonov 2004-11-29 13:25:25 UTC

Oh!!! 
Sorry!!!
But what if the Russian codepage is cp1251?
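
One reading of the concern in comments 5 and 6 is that a two-letter language prefix cannot tell ru_RU.UTF8 apart from a ru_RU locale that uses KOI8-R or CP1251. A minimal sketch of one way around that, assuming locale names of the form `lang_TERRITORY.codeset@modifier`: read the codeset from the locale name when it is present, and guess from the language only as a fallback. `charset_for_locale` is a hypothetical helper, not the code in the patch. The "ru" and "uk" fallbacks follow the values mentioned in this bug; the "ko" entry and the final default are assumptions added for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: pick a charset name for a locale string such as
 * "ru_RU.UTF8" or "ko_KR". An explicit codeset (the part after '.',
 * up to an optional '@modifier') is copied into buf and returned.
 * Without one, fall back to a per-language guess. */
const char *charset_for_locale(const char *locale, char *buf, size_t buflen)
{
    assert(locale != NULL && buf != NULL && buflen > 0);

    const char *dot = strchr(locale, '.');
    if (dot != NULL && dot[1] != '\0' && dot[1] != '@') {
        size_t len = strcspn(dot + 1, "@");    /* stop at the modifier, if any */
        if (len >= buflen)
            len = buflen - 1;                  /* truncate rather than overflow */
        memcpy(buf, dot + 1, len);
        buf[len] = '\0';
        return buf;
    }

    /* No codeset in the name: guess from the language prefix. */
    if (strncmp(locale, "ru", 2) == 0) return "CP1251";
    if (strncmp(locale, "uk", 2) == 0) return "KOI8-U";
    if (strncmp(locale, "ko", 2) == 0) return "CP949";   /* assumption */
    return "ISO-8859-1";                                 /* assumption */
}
```

With this ordering, "ru_RU.UTF8" yields "UTF8" and only a bare "ru_RU" falls through to the CP1251 guess.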

Comment 7 Heinrich Wendel (RETIRED) gentoo-dev 2005-01-11 06:36:03 UTC

Any update on the patch, so I can include it?

Comment 8 Young-Ho Cha 2005-01-13 06:43:49 UTC

Updated the Russian charset from KOI8-R to CP1251.

You can get it from the same URL :)

Comment 9 SpanKY gentoo-dev 2005-08-15 21:12:57 UTC

Could you please update it for 5.52?

Comment 10 Young-Ho Cha 2006-10-07 22:18:26 UTC

(In reply to comment #8)
> could you please update it for 5.52 ?
> 

Updated the patch for the 5.52-r1 ebuild; it is at the same URL.

Comment 11 SpanKY gentoo-dev 2006-11-10 23:59:50 UTC

I don't think that's quite how you want to do it... I'm pretty sure you want to change the Ext_ASCII_TO_Native() macro instead of that "#if 0" stuff.

Also, unless I read the patch wrong, basing the zipfile input on $LANG doesn't make any sense...
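
To make the second point concrete: the encoding of a stored name is a property of the archive, not of the extracting user's $LANG. The ZIP format (PKWARE APPNOTE) defines general purpose bit 11 as a marker that a name is UTF-8. When that bit is clear the header gives no reliable hint, so some guess about the legacy charset is still needed. A minimal sketch of reading that bit from a central directory file header follows. It is not taken from either patch in this bug, and `zip_name_is_utf8` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZIP_CENTRAL_SIG 0x02014b50u  /* "PK\1\2", stored little-endian */
#define ZIP_FLAG_UTF8   0x0800u      /* general purpose bit 11 */

/* Hypothetical helper: hdr points at the start of a central directory
 * file header. Returns 1 if the UTF-8 name flag is set, 0 if it is
 * clear, and -1 if the buffer is too short or is not such a header. */
int zip_name_is_utf8(const unsigned char *hdr, size_t len)
{
    assert(hdr != NULL);
    if (len < 10)
        return -1;

    uint32_t sig = (uint32_t)hdr[0]
                 | ((uint32_t)hdr[1] << 8)
                 | ((uint32_t)hdr[2] << 16)
                 | ((uint32_t)hdr[3] << 24);
    if (sig != ZIP_CENTRAL_SIG)
        return -1;

    /* Layout: signature (4), version made by (2), version needed (2),
     * then the general purpose flags at offsets 8 and 9. */
    uint16_t flags = (uint16_t)(hdr[8] | (hdr[9] << 8));
    return (flags & ZIP_FLAG_UTF8) != 0;
}
```

Only when this returns 0 would a locale-based guess like the one in the patch come into play, and even then it remains a guess about the system that created the archive.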

Comment 12 David Chang 2007-06-13 17:30:54 UTC

Here's a working patch:

https://bugzilla.altlinux.org/attachment.cgi?id=1402

Comment 13 Young-deuk Hong 2007-08-11 07:10:36 UTC

If the machine is x86_64, this patch will not work.

On x86_64 (CHOST="x86_64-pc-linux-gnu"), the ebuild sets TARGET to linux_noasm.

I have posted an updated patch.

Comment 14 Young-deuk Hong 2007-08-11 07:11:56 UTC

Created attachment 127717
modified for x86_64

Comment 15 Young-deuk Hong 2007-08-11 07:12:55 UTC

Created attachment 127718
add epatch for local patch

Comment 16 Jakub Moc (RETIRED) gentoo-dev 2008-01-04 10:12:23 UTC

*** Bug 204257 has been marked as a duplicate of this bug. ***