680244 – sys-apps/coreutils: printf chokes on \u0041

Summary: sys-apps/coreutils: printf chokes on \u0041

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Gentoo's Team for Core System packages

URL:
Whiteboard:
Keywords:	UPSTREAM

Depends on:
Blocks:

Reported:	2019-03-13 14:05 UTC by Ulrich Müller
Modified:	2023-06-07 14:21 UTC
CC List:	0 users

See Also:	https://debbugs.gnu.org/36887 https://bugs.debian.org/1022857
Package list:
Runtime testing required:	---

Description Ulrich Müller gentoo-dev

2019-03-13 14:05:08 UTC

According to printf(1):

   Interpreted sequences are:
   [...]
   
   \uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)

   \UHHHHHHHH
          Unicode character with hex value HHHHHHHH (8 digits)

It does not work, though:

$ /usr/bin/printf '\u0041\n'
/usr/bin/printf: invalid universal character name \u0041
$ /usr/bin/printf '\U00000041\n'
/usr/bin/printf: invalid universal character name \U00000041

Other tools interpret the sequence correctly:

$ printf '\u0041\n'   # bash
A
$ echo -e '\u0041'    # bash
A
$ zsh -c "echo -e '\u0041'"
A
$ emacs -Q --batch --eval '(princ "\u0041\n")'
A
$ python -c "print ('\u0041')"
A
$ ruby -e 'print("\u0041\n")'
A

Comment 1 Lars Wendler (Polynomial-C) (RETIRED) gentoo-dev

2019-03-13 14:26:15 UTC

Also happens with printf from coreutils-8.31...

Comment 2 Ulrich Müller gentoo-dev

2019-08-01 11:05:03 UTC

Reported upstream as requested by polynomial-c:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=36887

Comment 3 Ulrich Müller gentoo-dev

2023-06-07 14:21:36 UTC

This has been fixed in coreutils-9.2:

https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=0925e8a0f413ecf9004153d89b312b385b20d0ee

author	Pádraig Brady <P@draigBrady.com>	2022-10-27 15:17:07 +0100

printf: with \U, support all valid unicode points

Previously this was restricted to the C99 universal character subset,
which restricted most values <= 0x9F, as that simplifies the C lexer.
However printf(1) doesn't need this restriction.
Note also the bash builtin printf already supports all values <= 0x9F.

* src/printf.c (main): Relax the restriction on points <= 0x9F.
* doc/coreutils.texi (printf invocation): Adjust description.
* tests/misc/printf-cov.pl: Adjust accordingly.  Add new cases.
* NEWS: Mention the change in behavior.
Reported at https://bugs.debian.org/1022857