According to printf(1):

  Interpreted sequences are:
  [...]
  \uHHHH      Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)
  \UHHHHHHHH  Unicode character with hex value HHHHHHHH (8 digits)

It does not work, though:

$ /usr/bin/printf '\u0041\n'
/usr/bin/printf: invalid universal character name \u0041
$ /usr/bin/printf '\U00000041\n'
/usr/bin/printf: invalid universal character name \U00000041

Other tools interpret the sequence correctly:

$ printf '\u0041\n'   # bash
A
$ echo -e '\u0041'    # bash
A
$ zsh -c "echo -e '\u0041'"
A
$ emacs -Q --batch --eval '(princ "\u0041\n")'
A
$ python -c "print ('\u0041')"
A
$ ruby -e 'print("\u0041\n")'
A
Also happens with printf from coreutils-8.31...
Reported upstream as requested by polynomial-c: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=36887
This has been fixed in coreutils-9.2:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=0925e8a0f413ecf9004153d89b312b385b20d0ee

author Pádraig Brady <P@draigBrady.com> 2022-10-27 15:17:07 +0100

  printf: with \U, support all valid unicode points

  Previously this was restricted to the C99 universal character subset,
  which restricted most values <= 0x9F, as that simplifies the C lexer.
  However printf(1) doesn't need this restriction.
  Note also the bash builtin printf already supports all values <= 0x9F.

  * src/printf.c (main): Relax the restriction on points <= 0x9F.
  * doc/coreutils.texi (printf invocation): Adjust description.
  * tests/misc/printf-cov.pl: Adjust accordingly. Add new cases.
  * NEWS: Mention the change in behavior.

  Reported at https://bugs.debian.org/1022857
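
For anyone wondering why \u0041 in particular was rejected: the C99 rule for universal character names disallows values below 0xA0 except 0x24 ($), 0x40 (@) and 0x60 (`), as well as the surrogate range D800-DFFF, and U+0041 falls into the first group. The sketch below models the two checks in Python. It is only an illustration, not the code from src/printf.c; the function names are made up here, and the upper bound of 0x10FFFF in the relaxed check is my assumption about what "all valid unicode points" means.

```python
def c99_ucn_allowed(cp: int) -> bool:
    """C99-style rule for universal character names (illustrative model)."""
    # Values below 0xA0 are rejected, except '$', '@' and '`'.
    if cp < 0xA0 and cp not in (0x24, 0x40, 0x60):
        return False
    # Surrogate code points are never allowed.
    if 0xD800 <= cp <= 0xDFFF:
        return False
    return True


def relaxed_allowed(cp: int) -> bool:
    """Assumed post-fix rule: any code point up to 0x10FFFF except surrogates."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


if __name__ == "__main__":
    # Columns: code point, old (C99) check, relaxed check.
    for cp in (0x41, 0x24, 0xE9, 0xD800):
        print(f"U+{cp:04X}", c99_ucn_allowed(cp), relaxed_allowed(cp))
```

Under this model U+0041 fails the old check and passes the relaxed one, which matches the error shown in the original report and the behavior described in the commit message.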