Bug 136082 - glib G_FILENAME_ENCODING defaults to UTF-8
Summary: glib G_FILENAME_ENCODING defaults to UTF-8
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages
Hardware: All Linux
Importance: High major
Assignee: Gentoo Linux Gnome Desktop Team
URL:
Whiteboard:
Keywords:
Duplicates: 132966 196932
Depends on:
Blocks:
 
Reported: 2006-06-08 10:08 UTC by Oldrich Jedlicka
Modified: 2008-02-14 17:53 UTC
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Description Oldrich Jedlicka 2006-06-08 10:08:16 UTC
After the glib update, my filenames are no longer displayed correctly. I do not use UTF-8, but the ebuild sets the filesystem encoding to UTF-8 in /etc/env.d/50glib2.

This breaks all my GTK applications. I think it would be better to set it to the locale encoding with:

  G_FILENAME_ENCODING=@locale
Comment 1 Michael Mauch 2006-07-20 13:32:11 UTC
I had the same problem until I read this bug report.

G_FILENAME_ENCODING=@locale

solved it for me, too.
Comment 2 Saleem Abdulrasool (RETIRED) gentoo-dev 2006-08-16 21:07:31 UTC
*** Bug 132966 has been marked as a duplicate of this bug. ***
Comment 3 Leonardo Ferreira Fontenelle 2006-11-17 17:18:53 UTC
Solved for me too, in October. Isn't it fixed yet?
Comment 4 Leonardo Ferreira Fontenelle 2006-11-17 17:19:18 UTC
(In reply to comment #3)
> Solved for me too, in October. Isn't it fixed yet?
I mean *August*.

Comment 5 Sven 2007-06-13 21:27:04 UTC
Hi! This was driving me mad too!

Just today, I discovered why my GNOME apps show the filenames incorrectly, and I was SHOCKED that a Gentoo developer decided to set G_FILENAME_ENCODING to UTF-8.

It simply spoils every non-Unicode system out there, and only those who care report it.
Comment 6 Sven 2007-06-13 22:14:30 UTC
I took a look at the ebuilds.

So here's what I found:
        # Consider invalid UTF-8 filenames as locale-specific.
        # TODO :: Eventually get rid of G_BROKEN_FILENAMES
        dodir /etc/env.d
        echo "G_BROKEN_FILENAMES=1" > ${D}/etc/env.d/50glib2
        echo "G_FILENAME_ENCODING=UTF-8" >> ${D}/etc/env.d/50glib2

What does "Consider invalid UTF-8 filenames as locale-specific." mean?
Comment 7 Mart Raudsepp gentoo-dev 2007-06-14 01:07:11 UTC
(In reply to comment #5)
> Just today, I discovered why my GNOME apps show the filenames incorrectly, and I
> was SHOCKED that a Gentoo developer decided to set G_FILENAME_ENCODING to
> UTF-8.

Ok, and that's exactly what the default is as well in glib.
Setting G_FILENAME_ENCODING=UTF-8 is equal to not setting a G_FILENAME_ENCODING at all. You should be shocked at UTF-8 being the default upstream instead. Gentoo has it specifically set to UTF-8 to ensure people convert over to UTF-8 before we eventually remove the setting of it as a whole.

> It simply spoils every non-Unicode system out there, and only those who care
> report it.

So we should instead set it to KOI-8 for Russians (this is an example - I don't know others off the top of my head) and spoil the rest of the world, or what do you suggest?
Or should we set it to @locale, so that every single freshly installed system out there keeps saving files in an encoding that cannot be exchanged with the rest of the world without display issues?

>         # Consider invalid UTF-8 filenames as locale-specific.
>         echo "G_BROKEN_FILENAMES=1" > ${D}/etc/env.d/50glib2

> What does "Consider invalid UTF-8 filenames as locale-specific." mean?

See what the G_BROKEN_FILENAMES environment variable does in glib: http://developer.gnome.org/doc/API/2.0/glib/glib-running.html
It might be that, with both it and G_FILENAME_ENCODING set, UTF-8 trumps, but a filename that doesn't validate as UTF-8 is considered to be in the locale's charset. I have not tested this, but the comment seems to suggest it.
Comment 8 Mart Raudsepp gentoo-dev 2007-06-14 01:21:31 UTC
Others in Gnome herd - any comments on this?
My official stance would be to close this bug as INVALID or WONTFIX: UTF-8 is there on purpose.
Comment 9 Leonardo Ferreira Fontenelle 2007-06-14 01:47:32 UTC
(In reply to comment #7)
> Gentoo has it specifically set to UTF-8 to ensure people convert over to UTF-8
> before we eventually remove the setting of it as a whole.
> 
> Or should we set it to @locale, so that every single freshly installed system
> out there keeps saving files in an encoding that cannot be exchanged with the
> rest of the world without display issues?

I agree UTF-8 is better and I already use it.

But when I installed Gentoo and GNOME, I simply had no idea why names were displayed incorrectly. UTF-8/Unicode isn't (or at least wasn't) the default; there's even some documentation on how to _convert_ the system to UTF-8.

I'm sorry, but IMHO breaking people's systems isn't a very nice policy. At the very least, people should be warned in many places, in large letters, that they have to edit the file to get their filenames working properly if they decide not to use UTF-8.
Comment 10 Sven 2007-06-14 02:03:38 UTC
(In reply to comment #7)
> (In reply to comment #5)
> > Just today, I discovered why my GNOME apps show the filenames incorrectly, and I
> > was SHOCKED that a Gentoo developer decided to set G_FILENAME_ENCODING to
> > UTF-8.
> 
> Ok, and that's exactly what the default is as well in glib.
> Setting G_FILENAME_ENCODING=UTF-8 is equal to not setting a G_FILENAME_ENCODING
> at all. You should be shocked at UTF-8 being the default upstream instead.
> Gentoo has it specifically set to UTF-8 to ensure people convert over to UTF-8
> before we eventually remove the setting of it as a whole.

Hmm. With G_FILENAME_ENCODING unset, my glib 2.12.11 behaves as if G_FILENAME_ENCODING=@locale.
So I guess the upstream switch to UTF-8 happened in newer versions?

> > It simply spoils every non-Unicode system out there, and only those who care
> > report it.
> 
> So we should instead set it to KOI-8 for Russians (this is an example - I don't
> know others off the top of my head) and spoil the rest of the world, or what do
> you suggest?
> Or should we set it to @locale, so that every single freshly installed system
> out there keeps saving files in an encoding that cannot be exchanged with the
> rest of the world without display issues?

Actually, "no" to Russian and "yes" to @locale.

Are all applications making that radical move to UTF-8 these days? I can't believe that the major non-glib applications are doing that.

Actually, the "interoperation" between glib and non-glib applications will be harmed when using G_FILENAME_ENCODING=UTF-8.

So using the charset from the locale is alright. If Gentoo users are ready, they will move to a UTF-8 locale. Don't force us to use UTF-8! Even SuSE doesn't (yet).

If you are worried that Gentoo users will forget to set the locale:
well, include a file such as /etc/env.d/02locale with some content, maybe a UTF-8 locale. Is there an LC_* variable for setting the charset?

Anyway, the keyboard, the console, etc. are all NOT SET to UTF-8 by default, I believe, and users set the locale they want, if they want one.

> >         # Consider invalid UTF-8 filenames as locale-specific.
> >         echo "G_BROKEN_FILENAMES=1" > ${D}/etc/env.d/50glib2
> 
> > What does "Consider invalid UTF-8 filenames as locale-specific." mean?
> 
> See what the G_BROKEN_FILENAMES environment variable does in glib:
> http://developer.gnome.org/doc/API/2.0/glib/glib-running.html
> It might be that, with both it and G_FILENAME_ENCODING set, UTF-8 trumps, but
> a filename that doesn't validate as UTF-8 is considered to be in the locale's
> charset. I have not tested this, but the comment seems to suggest it.

For me, G_BROKEN_FILENAMES does not work. My filenames are in ISO-8859-1 and all my äöü show up as garbage on my GNOME desktop even though G_BROKEN_FILENAMES=1 and G_FILENAME_ENCODING=utf-8.

Unsetting G_FILENAME_ENCODING or setting it to @locale works.
Comment 11 Mart Raudsepp gentoo-dev 2007-06-14 02:39:17 UTC
(In reply to comment #10)

> Hmm. With G_FILENAME_ENCODING unset, my glib 2.12.11 behaves as if
> G_FILENAME_ENCODING=@locale.
> So I guess the upstream switch to UTF-8 happened in newer versions?

No. The default has always been UTF-8. The thing is that Gentoo currently sets G_BROKEN_FILENAMES as well, and that is essentially the same as setting G_FILENAME_ENCODING to @locale:

"G_BROKEN_FILENAMES.   If this environment variable is set, GLib assumes that filenames are in the locale encoding rather than in UTF-8. G_FILENAME_ENCODING takes priority over G_BROKEN_FILENAMES."

> > So we should instead set it to KOI-8 for Russians (this is an example - I don't
> > know others off the top of my head) and spoil the rest of the world, or what do
> > you suggest?
> > Or should we set it to @locale, so that every single freshly installed system
> > out there keeps saving files in an encoding that cannot be exchanged with the
> > rest of the world without display issues?
> 
> Actually, "no" to Russian and "yes" to @locale.
> 
> Are all applications making that radical move to UTF-8 these days? I can't
> believe that the major non-glib applications are doing that.

Not sure which applications you are referring to.
There are some explanations here:
http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html#id2692641

Based on the explanations there, maybe we should just set G_FILENAME_ENCODING to "UTF-8,@locale" to satisfy everyone? Then again, the description of G_FILENAME_ENCODING in the "Running GLib Applications" chapter of the manual says the first encoding in the comma-separated list is assumed, which makes me wonder what effect any subsequent encodings have.
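On that question: the GLib reference for g_get_filename_charsets() states that the first charset in the list is the filename encoding, and the following ones are only used when trying to produce a displayable form of an existing filename (see g_filename_display_name()). A minimal shell sketch of that fallback idea, assuming glibc's iconv(1), which exits non-zero on an invalid byte sequence; display_name is a hypothetical helper, not GLib's implementation:

```shell
#!/bin/sh
# display_name NAME CHARSET...
# Hypothetical helper: try to decode the raw filename NAME with each
# charset in turn and print the first successful conversion to UTF-8.
# Relies on iconv(1) exiting non-zero on an invalid byte sequence.
display_name() {
    name=$1
    shift
    for cs in "$@"; do
        if out=$(printf '%s' "$name" | iconv -f "$cs" -t UTF-8 2>/dev/null); then
            printf '%s\n' "$out"
            return 0
        fi
    done
    return 1  # no charset in the list could decode NAME
}

# "f\344rbe" is "färbe" in ISO-8859-1. It is not valid UTF-8, so the
# second charset in the list is the one that decodes it.
display_name "$(printf 'f\344rbe')" UTF-8 ISO-8859-1
```

ISO-8859-1 assigns a character to every byte value, so decoding with it never fails; in a list like this a single-byte charset only makes sense after UTF-8.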

> Actually, the "interoperation" between glib and non-glib applications will be
> harmed when using G_FILENAME_ENCODING=UTF-8.
> 
> So using the charset from the locale is alright. If Gentoo users are ready,
> they will move to a UTF-8 locale. Don't force us to use UTF-8! Even SuSE
> doesn't (yet).

I think it is a good idea to default to UTF-8, so that freshly installed setups don't create filenames in an encoding that has to be known separately; with UTF-8 the files are exchangeable among all UTF-8 systems, be they from Russia, Japan or China. If you have your files in a different encoding then you will see it and fix it up. If you don't, well, then you are tilting at windmills and have to edit the file in /etc/env.d to change it to the encoding you use or the locale you have.
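Spotting such files can be done mechanically: a name that is not valid UTF-8 is a candidate for conversion. A minimal sketch, assuming a POSIX shell with find(1) and glibc's iconv(1); list_non_utf8 is a hypothetical helper, and paths containing newlines are not handled. The names it reports could then be renamed with a tool such as convmv.

```shell
#!/bin/sh
# list_non_utf8 DIR
# Hypothetical helper: print every path under DIR whose bytes are not
# valid UTF-8 (e.g. names created under an ISO-8859-1 locale).
# Paths are printed raw; names containing newlines are not handled.
list_non_utf8() {
    find "$1" -print | while IFS= read -r path; do
        # Converting UTF-8 to UTF-8 fails exactly when the input is invalid.
        if ! printf '%s' "$path" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            printf '%s\n' "$path"
        fi
    done
}

# Example: list_non_utf8 ~/Documents
```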

> If you are worried that Gentoo users will forget to set the locale:
> well, include a file such as /etc/env.d/02locale with some content, maybe a
> UTF-8 locale. Is there an LC_* variable for setting the charset?

I don't think so. But see the output of "locale -k charmap". I don't know where it gets the mapping.
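For reference, the charset is not a separate variable but part of the LC_CTYPE locale name (for example en_US.UTF-8), and the effective LC_CTYPE is taken from LC_ALL, then LC_CTYPE, then LANG. A short sketch of inspecting it with the glibc locale(1) utility:

```shell
#!/bin/sh
# Charset of the current LC_CTYPE locale, e.g. "UTF-8" or "ISO-8859-1".
locale charmap

# The same value in keyword form, e.g. charmap="UTF-8".
locale -k charmap

# The charset follows the locale name: LC_ALL overrides LC_CTYPE, which
# overrides LANG. The C locale is always available and is plain ASCII
# (glibc reports it as ANSI_X3.4-1968).
LC_ALL=C locale charmap
```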

> Anyway, the keyboard, the console, etc. are all NOT SET to UTF-8 by default, I
> believe, and users set the locale they want, if they want one.

The sane approach is to have filenames NOT depend on the locale. I can't find a long explanation of why this is a good idea, but here are some links:
http://mail.gnome.org/archives/nautilus-list/2004-July/msg00163.html
http://developer.gnome.org/doc/API/2.2/glib/glib-Character-Set-Conversion.html#id2690287

> For me, G_BROKEN_FILENAMES does not work. My filenames are in ISO-8859-1 and
> all my äöü show up as garbage on my GNOME desktop even though
> G_BROKEN_FILENAMES=1 and G_FILENAME_ENCODING=utf-8.

As I said, G_FILENAME_ENCODING trumps G_BROKEN_FILENAMES if set. It doesn't matter much what G_BROKEN_FILENAMES is set to unless you unset G_FILENAME_ENCODING.

> Unsetting G_FILENAME_ENCODING or setting it to @locale works.

If you unset it, the G_BROKEN_FILENAMES setting kicks in. If that were unset as well, then the default is UTF-8.

We should probably stop setting G_BROKEN_FILENAMES, as it's deprecated in favour of @locale in G_FILENAME_ENCODING.
Comment 12 Leonardo Ferreira Fontenelle 2007-06-14 03:30:23 UTC
(In reply to comment #11)

The problem in this bug is that the user can't exchange files with himself. I really had to use wrapper scripts to open text files correctly, because some filenames were encoded in UTF-8 and some in ISO-8859-1, depending on which application created or edited them.

If you could make all software, from the kernel to every desktop environment, use this glib setting, then there would be no problem.

> If you have your files in a different encoding 
> then you will see it and fix it up.

It took me months until I found this bug report (by accident, IIRC) and discovered how to fix it. I had no clue about what to do, all I knew is that it was GNOME-related. I'm sorry if it's obvious for you, but it's not for me, and it won't be obvious to many, many users.
Comment 13 Sven 2007-06-14 03:45:26 UTC
So the current setting is REALLY broken.
As explained in the previous comment, G_FILENAME_ENCODING="UTF-8,@locale" would work MUCH better than the current setting.

Actually, it would work for me. I mean, I see (nearly) all old ISO-8859-1 filenames correctly. New files are created in UTF-8, though.

(Well, normal users don't understand the glib issue, and they will not look at their /etc/env.d/50glib2 to figure out where the UTF-8-encoded filenames in their terminal come from, so normal users will only be happy with a UTF-8 locale. Just to say it once again!)
Comment 14 Oldrich Jedlicka 2007-06-14 18:10:19 UTC
I'm impressed, somebody found this bug :-)

But the information presented here is not fully correct. GLib version 2.12.11 uses these rules on Unix (see glib/gconvert.c):

* If G_FILENAME_ENCODING is set, the comma-separated list it contains is taken as the list of character sets. The @locale string is replaced by the current locale's charset.
* If G_FILENAME_ENCODING is not set, but G_BROKEN_FILENAMES is set, it is the same as G_FILENAME_ENCODING=@locale
* If neither is set, it is the same as G_FILENAME_ENCODING=UTF-8,@locale (if @locale is not UTF-8)
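The three rules above can be restated as a small shell function. This is a sketch of the rules as listed in this comment, not a transcription of glib/gconvert.c; filename_charsets is a hypothetical name, and the locale charset is read from `locale charmap` unless it is passed as the first argument.

```shell
#!/bin/sh
# filename_charsets [LOCALE_CHARSET]
# Print, one per line, the filename charsets implied by the rules above.
# The first line is the charset used for new filenames.
filename_charsets() {
    loc=${1:-$(locale charmap)}
    if [ -n "${G_FILENAME_ENCODING+set}" ]; then
        # Rule 1: split the comma-separated list, expanding @locale.
        printf '%s\n' "$G_FILENAME_ENCODING" |
            awk -F, -v loc="$loc" '{ for (i = 1; i <= NF; i++) print ($i == "@locale" ? loc : $i) }'
    elif [ -n "${G_BROKEN_FILENAMES+set}" ]; then
        # Rule 2: same as G_FILENAME_ENCODING=@locale.
        printf '%s\n' "$loc"
    else
        # Rule 3: UTF-8 first, then the locale charset if it differs.
        printf '%s\n' UTF-8
        if [ "$loc" != UTF-8 ]; then
            printf '%s\n' "$loc"
        fi
    fi
}

# Gentoo's old /etc/env.d/50glib2 under an ISO-8859-1 locale:
( G_BROKEN_FILENAMES=1; G_FILENAME_ENCODING=UTF-8; filename_charsets ISO-8859-1 )
```

With the values from the old 50glib2, the sketch yields only UTF-8, which matches the reports in this thread: once G_FILENAME_ENCODING is set, G_BROKEN_FILENAMES has no effect and the locale charset is never consulted.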

So there is only one outcome of the Gentoo ebuilds: broken filenames for _everybody_ who doesn't use UTF-8. Please fix this by removing the env.d settings, or by adding @locale to them, as in the _default_ settings.

From the user's point of view, the logical choice would be to set G_FILENAME_ENCODING=@locale, because other applications "use" the locale settings to encode filenames (in practice, by doing no conversion at all). UTF-8 is good, but it is unreadable for non-UTF-8 (but locale-aware) applications. I think the fact that UTF-8 comes before @locale in the default setting is at least important enough to mention in the ebuild too.

So it would be really good if the ebuild did this:

1. Remove the settings from env.d.
2. Note in the ebuild that new filenames will be created in UTF-8, but GTK applications will still be able to read @locale-encoded filenames. If users want the old behavior, they should place G_FILENAME_ENCODING=@locale into env.d.
Comment 15 Rémi Cardona (RETIRED) gentoo-dev 2007-06-14 19:16:19 UTC
I vote for WONTFIX.

Gentoo has had a Unicode Guide for years. Everything you need to know is in there. Note that you can perfectly well have a UTF-8 filesystem (i.e. filenames) but keep whatever locale/encoding you want for file contents (though going fully Unicode is easier).

>>>  http://www.gentoo.org/doc/en/utf-8.xml  <<<

Red Hat/Fedora made the switch years ago, Ubuntu has been fully UTF-8 since Breezy, and even Windows has had full Unicode NTFS support since NT4 (with crappy codepage stuff on top...).
Comment 16 Gilles Dartiguelongue (RETIRED) gentoo-dev 2007-06-14 19:26:30 UTC
FTR, Debian is fully UTF-8 as well, and I think Gentoo has defaulted to UTF-8 for about 2 years (although I haven't installed Gentoo recently to check).
Comment 17 Mart Raudsepp gentoo-dev 2007-06-14 19:28:02 UTC
Fedora still sets G_BROKEN_FILENAMES=1. I believe the premise there is that they set the locale to something like en_GB.UTF-8, so filenames are STILL treated as UTF-8, as if the variable weren't there at all. That would need some verification, though.
Comment 18 Daniel Gryniewicz (RETIRED) gentoo-dev 2007-06-14 20:08:02 UTC
I vote to nuke them both and leave the default, UTF-8,@locale.

The big reason being: I see no justification for it being added in the first place during 2.9.x.

Unfortunately, just doing that won't help anyone, because the old 50glib2 won't get removed, because of config protection.  So maybe explicitly setting it to UTF8,@locale would be the correct solution.
Comment 19 Sven 2007-06-14 22:41:15 UTC
(In reply to comment #15)
> Red Hat/Fedora made the switch years ago, Ubuntu has been fully UTF-8 since Breezy,

I wonder what they set G_FILENAME_ENCODING to.

I guess they set it to G_FILENAME_ENCODING=@locale and simply steer users to UTF-8 locales by default. I only have access to an openSUSE box. Its default is:
G_FILENAME_ENCODING=@locale,UTF-8,ISO-8859-1,CP1252

Comment 20 Jakub Moc (RETIRED) gentoo-dev 2007-07-08 18:31:55 UTC
(In reply to comment #18)
> Unfortunately, just doing that won't help anyone, because the old 50glib2 won't
> get removed, because of config protection.

/etc/env.d is in CONFIG_PROTECT_MASK by default, so it will be removed unless someone modified it.
Comment 21 Daniel Gryniewicz (RETIRED) gentoo-dev 2007-07-08 21:17:46 UTC
You're right; I just tested it, and it works fine. I've committed the change to the overlay, so it will definitely be in 2.20. Gnomies: what do people think about doing it for 2.18? What about 2.16?
Comment 22 Rémi Cardona (RETIRED) gentoo-dev 2007-07-08 21:23:16 UTC
If I understand this correctly, removing env.d/50glib2 will make G_FILENAME_ENCODING effectively default to "UTF-8,@locale".

If that's the case, I'm ok with it :)
Comment 23 Rémi Cardona (RETIRED) gentoo-dev 2007-07-08 21:24:32 UTC
As for the current portage ebuilds, +1 for 2.18. I don't think it's worth bothering with 2.16.
Comment 24 Leonardo Ferreira Fontenelle 2007-07-09 01:19:25 UTC
I'm just a user, but if my opinion counts, let's get this in as soon as possible, given that, based on the previous discussion on this bug report, it should work.
Comment 25 Jakub Moc (RETIRED) gentoo-dev 2007-10-24 17:45:19 UTC
*** Bug 196932 has been marked as a duplicate of this bug. ***
Comment 26 Sebastian 2007-10-24 18:37:37 UTC
Hi all,

is this issue really worth such a long thread? I'd say unset G_BROKEN_FILENAMES and set G_FILENAME_ENCODING="@locale". Don't try to impose UTF-8 on the users. UTF-8 is nice, but some apps still have issues with it.
Plus, this should be resolved fast. It takes the (non-UTF-8) users too damn long to figure out why Nautilus and the like don't handle filenames correctly.

Regards
Sebastian
Comment 27 Oldrich Jedlicka 2008-02-14 17:53:36 UTC
The file /etc/env.d/50glib2 is no longer generated in 2.14.* (stable), so I think we can close this issue, since the ebuild is fixed. Thanks.