Bug 931701 - dev-db/postgresql: cleanup plperl unicode debris in postgresql.conf
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: PgSQL Bugs
 
Reported: 2024-05-11 03:01 UTC by Sam James
Modified: 2024-05-11 05:56 UTC

Description Sam James 2024-05-11 03:01:13 UTC
A workaround for Perl was originally added in bug 518522. It broke in bug 792537 and we've since cleaned it up, but that only stops it being added for *new* installs.

Please figure out a way to clean it up from the user's postgresql.conf, or to warn when the bad line is present, given that the ebuild was responsible for adding it.
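
A minimal sketch of the "warn" option, written as plain POSIX shell rather than as actual ebuild code. Everything here is an assumption for illustration: the function names are hypothetical, the path to postgresql.conf is supplied by the caller, and the pattern (an uncommented plperl.on_init setting whose value mentions "use utf8") is a guess at what the stale line looks like, not a copy of what the ebuild wrote.

```shell
# Hypothetical helper: list uncommented plperl.on_init lines that mention
# "use utf8". The pattern is an assumption, not taken from the ebuild.
find_plperl_debris() {
    conf=$1
    [ -r "$conf" ] || return 1
    # -n prefixes each match with its line number for the warning message.
    grep -nE '^[[:space:]]*plperl\.on_init[[:space:]]*=.*use utf8' "$conf"
}

# Hypothetical helper: print a warning to stderr if such a line exists.
# Returns 0 when debris was found, 1 otherwise.
warn_plperl_debris() {
    conf=$1
    if matches=$(find_plperl_debris "$conf"); then
        printf 'WARNING: %s still contains a stale PL/Perl setting:\n%s\n' \
            "$conf" "$matches" >&2
        return 0
    fi
    return 1
}
```

Called as `warn_plperl_debris /path/to/postgresql.conf`, this only reports the line and never edits the file, which leaves the decision to the user. An ebuild would presumably route the message through its own warning helpers instead of printf.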
Comment 1 RumpletonBongworth 2024-05-11 05:56:46 UTC
Though I'm not familiar with PL/Perl, I can say that "use utf8" is only useful in cases where the source code is to be treated as UTF-8. That is, its effect is analogous to applying Encode::decode('UTF-8', $source_code) to the source.

$ perl -e '$str = "😀"; printf "%v.X (length = %d)\n", $str, length $str'
F0.9F.98.80 (length = 4)

$ perl -Mutf8 -e '$str = "😀"; printf "%v.X (length = %d)\n", $str, length $str'
1F600 (length = 1)

The second case shows the string containing a single codepoint (whatever encoding Perl uses to store it internally is immaterial).

---

That aside, the PostgreSQL docs make the following claim.

"Arguments will be converted from the database's encoding to UTF-8 for use inside PL/Perl, and then converted from UTF-8 back to the database encoding upon return."

This is ambiguous. Do they mean that strings are converted to UTF-8 bytes (as in the first example above), or that they are properly stored as 'wide' characters (as in the second example above)? I don't know, but either way it should be wholly unrelated to the use of the utf8 pragma.

Suffice to say that letting the user deal with all of this is almost certainly a wise decision.