Bug 680244 - sys-apps/coreutils: printf chokes on \u0041
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo's Team for Core System packages
Keywords: UPSTREAM
Reported: 2019-03-13 14:05 UTC by Ulrich Müller
Modified: 2023-06-07 14:21 UTC

Description Ulrich Müller gentoo-dev 2019-03-13 14:05:08 UTC
According to printf(1):

   Interpreted sequences are:
   [...]
   
   \uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)

   \UHHHHHHHH
          Unicode character with hex value HHHHHHHH (8 digits)

It does not work, though:

$ /usr/bin/printf '\u0041\n'
/usr/bin/printf: invalid universal character name \u0041
$ /usr/bin/printf '\U00000041\n'
/usr/bin/printf: invalid universal character name \U00000041

Other tools interpret the sequence correctly:

$ printf '\u0041\n'   # bash
A
$ echo -e '\u0041'    # bash
A
$ zsh -c "echo -e '\u0041'"
A
$ emacs -Q --batch --eval '(princ "\u0041\n")'
A
$ python -c "print ('\u0041')"
A
$ ruby -e 'print("\u0041\n")'
A
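
For reference, the two escape forms quoted from printf(1) above can be sketched in a few lines of Python. This is only an illustration of the documented behavior (4 hex digits after \u, 8 after \U), not how coreutils implements it; the helper name expand_ucn is made up for this example.

```python
import re

# Matches \uHHHH (exactly 4 hex digits) or \UHHHHHHHH (exactly 8 hex digits).
_UCN = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def expand_ucn(text: str) -> str:
    # Hypothetical helper: replace each escape with the character it names.
    def repl(match: re.Match) -> str:
        digits = match.group(1) or match.group(2)
        # chr() raises ValueError for values above 0x10FFFF.
        return chr(int(digits, 16))

    return _UCN.sub(repl, text)


print(expand_ucn(r"\u0041"))      # A
print(expand_ucn(r"\U00000041"))  # A
```

Under this reading, both sequences from the failing commands name U+0041, so the expected output in each case is "A".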
Comment 1 Lars Wendler (Polynomial-C) (RETIRED) gentoo-dev 2019-03-13 14:26:15 UTC
This also happens with printf from coreutils-8.31.
Comment 2 Ulrich Müller gentoo-dev 2019-08-01 11:05:03 UTC
Reported upstream as requested by polynomial-c:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=36887
Comment 3 Ulrich Müller gentoo-dev 2023-06-07 14:21:36 UTC
This has been fixed in coreutils-9.2:

https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=0925e8a0f413ecf9004153d89b312b385b20d0ee

Author: Pádraig Brady <P@draigBrady.com>, 2022-10-27 15:17:07 +0100

printf: with \U, support all valid unicode points

Previously this was restricted to the C99 universal character subset,
which restricted most values <= 0x9F, as that simplifies the C lexer.
However printf(1) doesn't need this restriction.
Note also the bash builtin printf already supports all values <= 0x9F.

* src/printf.c (main): Relax the restriction on points <= 0x9F.
* doc/coreutils.texi (printf invocation): Adjust description.
* tests/misc/printf-cov.pl: Adjust accordingly.  Add new cases.
* NEWS: Mention the change in behavior.
Reported at https://bugs.debian.org/1022857
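
The restriction described in the commit message can be sketched roughly as follows. This Python model is an assumption based on the C99 rule for universal character names (values up to 0x9F are rejected except for $, @ and `, and surrogates are always rejected); it is not a transcription of src/printf.c, and both function names are hypothetical.

```python
def rejected_before_9_2(cp: int) -> bool:
    # Assumed model of the old check: the C99 universal character subset.
    # Code points <= 0x9F are refused, except 0x24 ($), 0x40 (@), 0x60 (`).
    low = cp <= 0x9F and cp not in (0x24, 0x40, 0x60)
    # Surrogate code points are never valid on their own.
    surrogate = 0xD800 <= cp <= 0xDFFF
    return low or surrogate


def rejected_since_9_2(cp: int) -> bool:
    # Assumed model of the relaxed check: only surrogates remain excluded.
    return 0xD800 <= cp <= 0xDFFF


print(rejected_before_9_2(0x41))  # True: this is the \u0041 case from the report
print(rejected_since_9_2(0x41))   # False
```

Under this model, \u0041 falls into the excluded low range, which matches the "invalid universal character name" error shown in the description, while \u0040 and anything from U+00A0 upward would have been accepted even before the fix.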