| Summary: | limits.h - fail to create file name more than 134 chars in non-English locale (incomplete UTF-8 support?) | | |
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Sergey S. Starikoff <Ikonta> |
| Component: | Current packages | Assignee: | Gentoo Toolchain Maintainers <toolchain> |
| Status: | RESOLVED INVALID | | |
| Severity: | normal | CC: | slyfox |
| Priority: | Normal | | |
| Version: | unspecified | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Package list: | | Runtime testing required: | --- |
**Description** (Sergey S. Starikoff, 2019-07-25 12:25:50 UTC)
**Comment #1 (Alexander Tsoy)**

(In reply to Sergey S. Starikoff from comment #0)
> $ echo "tttttттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттт.text" | wc -m
> 134

char = 1 byte in C, so you should use `wc -c` in the command above. Cyrillic letters occupy 2 bytes in UTF-8 encoding.

**Comment #2 (Sergey S. Starikoff)**

(In reply to Alexander Tsoy from comment #1)
> char = 1 byte in C, so you should use `wc -c` in the command above. Cyrillic letters occupy 2 bytes in UTF-8 encoding.

I remember that a Cyrillic letter in a UTF-8 locale is encoded as two bytes. But the quoted comment in limits.h promises chars, not bytes. That is not only confusing, it also causes real trouble, for example when I need to extract files with long Cyrillic names from an archive.

**Comment #3 (Sergei Trofimovich)**

On Linux, most filesystems except reiserfs have an internal limit of 255 bytes per filename: https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits

```
$ LANG=C strace -f touch tttttттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттттт.text
openat(AT_FDCWD, "ttttt\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202\321\202.text", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1 ENAMETOOLONG (File name too long)
```

btrfs example: https://elixir.bootlin.com/linux/latest/source/fs/btrfs/inode.c#L5788

```c
/*
 * we can actually store much bigger names, but lets not confuse the rest
 * of linux
 */
#define BTRFS_NAME_LEN 255
...
struct inode *btrfs_lookup_dentry(struct inode *dir, struct dentry *dentry)
{
	....
	if (dentry->d_name.len > BTRFS_NAME_LEN)
		return ERR_PTR(-ENAMETOOLONG);
```

It's an arbitrary Linux kernel limitation. It could be anything else.

**Comment #4 (Sergey S. Starikoff)**

(In reply to Sergei Trofimovich from comment #3)
> It's an arbitrary linux kernel limitation. It could be anything else.

The real issue behind this bug was an archive with unextractable files, and the question of how to properly resolve that.

**Comment #5 (Sergei Trofimovich)**

(In reply to Sergey S. Starikoff from comment #4)
> The real issue behind this bug was an archive with unextractable files, and the question of how to properly resolve that.

It depends on your desired end state.

1. Possible solution 1: explicit Linux kernel support. If your end goal is a filesystem with UTF-8 names longer than 255 bytes, you won't get it without patching the Linux kernel and breaking some APIs that assume a maximum file path. I suggest asking linux-fsdevel@vger.kernel.org whether it's feasible to add an extended mode that allows overgrown filename lengths, at the price of some interfaces being broken against such files (stat()?). It would likely have a rippling effect on libcs and beyond. It might not be an easy thing to do alone, but if enough people are on board with the idea, why not. A small precedent in a nearby area is the select() system call, which has no kernel limitation on bit-field size, yet most of userspace does not easily expose that functionality (FD_SET/FD_CLR).

2. Possible solution 2: use a single-byte locale to get past the unpacking. Something like:

   ```
   $ LANG=ru_RU.KOI8-R unzip foo
   $ LANG=ru_RU.KOI8-R luit ls
   $ LANG=ru_RU.KOI8-R luit mv foo bar
   ```

3. Possible solution 3: use a wrapper/tool to extract and rename individual files (or mangle the filenames inside the archive). An example wrapper is app-misc/mc, which lets you browse zip archives and copy out individual files under a user-specified target file name. It might be good enough to deal with an individual archive.