931701 – dev-db/postgresql: cleanup plperl unicode debris in postgresql.conf

Status:	CONFIRMED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal normal
Assignee:	PgSQL Bugs

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2024-05-11 03:01 UTC by Sam James
Modified:	2024-05-11 05:56 UTC
CC List:	1 user

See Also:	518522 792537
Package list:
Runtime testing required:	---

Description Sam James archtester

2024-05-11 03:01:13 UTC

A workaround for Perl was originally added in bug 518522. It broke in bug 792537 and we've since cleaned it up, but that only stops it being added for *new* installs.

Please find a way to remove it from the user's postgresql.conf, or at least warn when the bad line is still present, given that the ebuild was responsible for adding it.
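
As a rough sketch of the "warn" option only: the helper below scans a config file for an active setting that looks like the old workaround. The function name warn_plperl_debris is hypothetical, and the pattern (an uncommented plperl.on_init line that mentions utf8) is an assumption, since the exact line is not quoted in this report.

```shell
# Hypothetical helper: warn if a postgresql.conf still carries an active
# plperl.on_init setting that mentions utf8.
# ASSUMPTION: the leftover line has this shape; adjust the pattern to the
# line the ebuild actually wrote.
warn_plperl_debris() {
    conf=$1
    # Nothing to check if the file is missing or unreadable.
    [ -r "$conf" ] || return 0
    # Match only uncommented settings; lines starting with '#' are skipped.
    if grep -Eq '^[[:space:]]*plperl\.on_init[[:space:]]*=.*utf8' "$conf"; then
        echo "WARNING: $conf still sets plperl.on_init with a utf8 workaround; consider removing it." >&2
        return 1
    fi
    return 0
}
```

In an ebuild, the echo would presumably become ewarn, with the function called on each installed slot's postgresql.conf. Warning rather than editing keeps the user's file untouched.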

Comment 1 RumpletonBongworth 2024-05-11 05:56:46 UTC

Though I'm not familiar with PL/Perl, I can say that "use utf8" is only useful in cases where the source code is to be treated as UTF-8. That is, it is analogous to applying Encode::decode('UTF-8', $source_code) to the source.

$ perl -e '$str = "😀"; printf "%v.X (length = %d)\n", $str, length $str'
F0.9F.98.80 (length = 4)

$ perl -Mutf8 -e '$str = "😀"; printf "%v.X (length = %d)\n", $str, length $str'
1F600 (length = 1)

The second case shows the string containing a single codepoint (whatever encoding Perl uses to store it internally is immaterial).

---

That aside, the PostgreSQL docs make the following claim.

"Arguments will be converted from the database's encoding to UTF-8 for use inside PL/Perl, and then converted from UTF-8 back to the database encoding upon return."

This is ambiguous. Do they mean that strings are converted to UTF-8 bytes (as in the first example above), or that they are properly stored as 'wide' characters (as in the second example above)? I don't know but it should be wholly unrelated to the use of the utf8 pragma.

Suffice to say that letting the user deal with all of this is almost certainly a wise decision.