Bug 931701

Summary:	dev-db/postgresql: cleanup plperl unicode debris in postgresql.conf
Product:	Gentoo Linux	Reporter:	Sam James <sam>
Component:	Current packages	Assignee:	PgSQL Bugs <pgsql-bugs>
Status:	CONFIRMED ---
Severity:	normal	CC:	kfm
Priority:	Normal
Version:	unspecified
Hardware:	All
OS:	Linux
See Also:	https://bugs.gentoo.org/show_bug.cgi?id=518522 https://bugs.gentoo.org/show_bug.cgi?id=792537
Whiteboard:
Package list:		Runtime testing required:	---

Description Sam James archtester 2024-05-11 03:01:13 UTC

A workaround for Perl was originally added in bug 518522. It broke in bug 792537 and we've since cleaned it up, but that only stops it being added for *new* installs.

Please figure out a way to remove it from the user's postgresql.conf, or at least warn when the bad line is present, given that the ebuild was responsible for adding it.
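As a rough illustration of the "warn" option, here is a minimal Python sketch that scans the text of a postgresql.conf for the leftover setting. The report does not quote the offending line, so the pattern is an assumption: it looks for an active (uncommented) plperl.on_init setting that mentions "use utf8". The function name find_suspect_lines and the sample config are hypothetical. A real fix would live in the ebuild and would need the exact string the ebuild originally wrote.

```python
import re

# Assumption: the leftover setting is an uncommented plperl.on_init line
# that mentions "use utf8". The bug report does not quote the exact line,
# so this pattern must be adjusted to what the ebuild actually added.
SUSPECT = re.compile(r"^\s*plperl\.on_init\b.*\buse\s+utf8\b")


def find_suspect_lines(conf_text):
    """Return (line_number, stripped_line) pairs for lines matching SUSPECT."""
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        # Lines starting with '#' never match because the pattern is
        # anchored at the start of the line.
        if SUSPECT.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical config contents, for demonstration only.
    sample = "\n".join([
        "# plperl.on_init = 'use utf8;'",
        "max_connections = 100",
        "plperl.on_init = 'use utf8; use re;'",
    ])
    for number, line in find_suspect_lines(sample):
        print(f"warning: line {number} looks like the old workaround: {line}")
```

Only warning, rather than rewriting the file, keeps the user in control of a config file they may have edited since.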

Comment 1 kfm 2024-05-11 05:56:46 UTC

Though I'm not familiar with PL/Perl, I can say that "use utf8" is only useful in cases where the source code is to be treated as UTF-8. That is, its effect is analogous to Encode::decode('UTF-8', $source_code).

$ perl -e '$str = "😀"; printf "%v.X (length = %d)\n", $str, length $str'
F0.9F.98.80 (length = 4)

$ perl -Mutf8 -e '$str = "😀"; printf "%v.X (length = %d)\n", $str, length $str'
1F600 (length = 1)

The second case shows the string containing a single codepoint (whatever encoding Perl uses to store it internally is immaterial).
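The same bytes-versus-codepoints distinction can be sketched outside Perl. The Python snippet below is only an analogy, not a statement about how Perl or PL/Perl stores strings: a Python str is always a sequence of code points, so the "without use utf8" case is modelled by explicitly encoding to UTF-8 bytes. The helper dotted_hex is a hypothetical name that mimics the dotted hex output above.

```python
# U+1F600 GRINNING FACE, the same emoji as in the Perl one-liners.
text = "\U0001F600"

# Model of the first case: the literal seen as raw UTF-8 bytes.
utf8_bytes = text.encode("utf-8")


def dotted_hex(values):
    """Join integers as uppercase hex separated by dots."""
    return ".".join(f"{v:X}" for v in values)


bytes_view = dotted_hex(utf8_bytes)      # one entry per byte
chars_view = dotted_hex(map(ord, text))  # one entry per code point

print(f"{bytes_view} (length = {len(utf8_bytes)})")  # F0.9F.98.80 (length = 4)
print(f"{chars_view} (length = {len(text)})")        # 1F600 (length = 1)
```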

---

That aside, the PostgreSQL docs make the following claim.

"Arguments will be converted from the database's encoding to UTF-8 for use inside PL/Perl, and then converted from UTF-8 back to the database encoding upon return."

This is ambiguous. Do they mean that strings are converted to UTF-8 bytes (as in the first example above), or that they are properly stored as 'wide' characters (as in the second example above)? I don't know, but either way it should be wholly unrelated to the use of the utf8 pragma.

Suffice it to say that letting the user deal with all of this is almost certainly a wise decision.

Comment 2 Sam James archtester 2024-09-04 04:20:38 UTC

Ping. A user got hit by this in https://bugs.gentoo.org/518522#c10.