Bug 777399 - app-editors/vim: CJK utf-8 locales are not recognized as utf-8
Summary: app-editors/vim: CJK utf-8 locales are not recognized as utf-8
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Oskari Pirhonen
Reported: 2021-03-20 16:10 UTC by cangming.liu
Modified: 2023-02-10 04:28 UTC (History)

Description cangming.liu 2021-03-20 16:10:19 UTC
Steps to reproduce:

1. Set locale to zh_CN.UTF-8
2. Open Vim and edit a file that contains only ASCII characters

Expected behavior:
file encoding set to utf-8 (or empty, with utf-8 being the default)

Actual behavior:
file encoding set to euc-cn


My vimscript is rusty, but I believe the issue is caused by the if-block here:

https://github.com/gentoo/gentoo/blob/master/app-editors/vim-core/files/vimrc-r5#L54

In my case the locale is zh_CN.UTF-8, which causes the fileencodings option to be overridden with gb2312. After the if statements a few lines further down, the final fileencodings list ends up being:

ucs-bom,gb2312,utf-8,default

Personally, I haven't run into gb2312 for a very long time, and I think it's safe to drop for a UTF-8 locale. Even if we decide it's worth keeping, utf-8 should come before it in the list.
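
To illustrate why the order matters, here is a minimal Python sketch of the first-match behavior that Vim documents for 'fileencodings': entries are tried in order and the first one that converts without error wins. The function name detect_fileencoding is hypothetical, the ucs-bom check is reduced to a UTF-8 BOM test, "default" is treated as a plain fallback, and Python's codecs stand in for Vim's conversion, so this is a model under those assumptions rather than Vim's actual implementation.

```python
from typing import Sequence


def detect_fileencoding(data: bytes, candidates: Sequence[str]) -> str:
    """Return the first candidate that decodes `data` without error.

    Hypothetical model of the first-match rule; not Vim's real code.
    """
    for name in candidates:
        if name == "ucs-bom":
            # Simplification: only the UTF-8 byte order mark is checked.
            if data.startswith(b"\xef\xbb\xbf"):
                return "utf-8"
            continue
        if name == "default":
            # Simplification: "default" is an unconditional fallback.
            return name
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return "default"


ascii_only = b"hello world\n"

# Order described in this report for zh_CN.UTF-8.
reported_order = ["ucs-bom", "gb2312", "utf-8", "default"]
# Same entries with utf-8 moved ahead of gb2312.
utf8_first = ["ucs-bom", "utf-8", "gb2312", "default"]

print(detect_fileencoding(ascii_only, reported_order))  # gb2312
print(detect_fileencoding(ascii_only, utf8_first))      # utf-8
```

ASCII-only bytes are valid in both encodings, so whichever entry comes first is selected. That matches the symptom above, where an ASCII-only file ends up with a Chinese legacy encoding instead of utf-8.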

Installed package versions:
app-editors/vim-8.2.0814-r100
app-editors/vim-core-8.2.0814
Comment 1 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2021-03-20 20:12:02 UTC
Any chance you could give a rough patch?
Comment 2 cangming.liu 2021-03-21 14:02:40 UTC
It depends on how we want to proceed. I see a few options:

1. Explicitly check for UTF-8 locales and skip setting file encodings
2. Keep the gb2312 encoding, but have UTF-8 come first in the list
3. Remove all code that touches file encodings and use Vim's defaults, which are quite reasonable for 2021. Users who really care can set file encodings in their own vimrc.
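
As a rough comparison of the three options, the following Python sketch models the list each one might produce for a given locale. It is not the Gentoo vimrc logic: the helper names is_utf8_locale and fileencodings_for are hypothetical, option 0 stands for the behavior described in this report, and the exact list for option 1 is an assumption (the legacy entry is simply left out).

```python
from typing import List, Optional


def is_utf8_locale(locale: str) -> bool:
    """True for locales such as zh_CN.UTF-8 or zh_CN.utf8."""
    # Take the codeset between "." and an optional "@modifier".
    codeset = locale.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


def fileencodings_for(locale: str, legacy: str, option: int) -> Optional[List[str]]:
    """Model the resulting list; None means 'left to Vim's default'."""
    current = ["ucs-bom", legacy, "utf-8", "default"]  # as reported
    if option == 3:
        # Option 3: the vimrc no longer touches file encodings at all.
        return None
    if not is_utf8_locale(locale):
        # Options 1 and 2 only change behavior for UTF-8 locales.
        return current
    if option == 1:
        # Assumed outcome of skipping the legacy override.
        return ["ucs-bom", "utf-8", "default"]
    if option == 2:
        # Keep the legacy encoding, but try utf-8 first.
        return ["ucs-bom", "utf-8", legacy, "default"]
    return current  # option 0: behavior described in the report


for option in (0, 1, 2, 3):
    result = fileencodings_for("zh_CN.UTF-8", "gb2312", option)
    print(option, ",".join(result) if result else "(left to Vim's default)")
```

Under these assumptions, options 1 and 2 leave non-UTF-8 locales such as zh_CN.GB2312 unchanged, while option 3 changes behavior for every locale, which is why it carries the most risk.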

Option 3 is the riskiest, but IMHO it's the cleanest and the least surprising from a new Gentoo user's perspective. I'm a recent migrant to Gentoo, and it certainly took me a while to figure out why and how my encodings were getting changed.

I'll leave it up to the maintainers to decide which option to choose, as I don't presume to know the full intent behind these lines of code. Happy to provide a patch.

In any case, the fix will end up breaking the existing behavior for some users. How does the Gentoo community handle such changes?