Summary: | dev-db/postgresql: cleanup plperl unicode debris in postgresql.conf | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Sam James <sam> |
Component: | Current packages | Assignee: | PgSQL Bugs <pgsql-bugs> |
Status: | CONFIRMED --- | ||
Severity: | normal | CC: | kfm |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=518522 https://bugs.gentoo.org/show_bug.cgi?id=792537 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
Sam James
Though I'm not familiar with PL/Perl, I can say that "use utf8" is only useful in cases where the source code is to be treated as UTF-8. That is, analogous to Encode::decode('UTF-8', $source_code).

$ perl -e '$str = "😀"; printf "%v.X (length = %d)\n", $str, length $str'
F0.9F.98.80 (length = 4)

$ perl -Mutf8 -e '$str = "😀"; printf "%v.X (length = %d)\n", $str, length $str'
1F600 (length = 1)

The second case shows the string containing a single codepoint (whatever encoding Perl uses to store it internally is immaterial).

---

That aside, the PostgreSQL docs make the following claim.

"Arguments will be converted from the database's encoding to UTF-8 for use inside PL/Perl, and then converted from UTF-8 back to the database encoding upon return."

This is ambiguous. Do they mean that strings are converted to UTF-8 bytes (as in the first example above), or that they are properly stored as 'wide' characters (as in the second example above)? I don't know but it should be wholly unrelated to the use of the utf8 pragma. Suffice to say that letting the user deal with all of this is almost certainly a wise decision.

ping. A user got hit by this in https://bugs.gentoo.org/518522#c10.
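The analogy drawn earlier in this thread, that "use utf8" behaves like Encode::decode('UTF-8', $source_code), can be checked from the shell. The following is only a minimal sketch: it assumes a perl with the core Encode module on PATH and a script file saved as UTF-8, and the variable names (byte_len, decoded, pragma) are illustrative rather than taken from the report.

```shell
#!/bin/sh
# Sketch only: variable names are illustrative, not from the bug report.

# Without the utf8 pragma, the literal is seen as four raw bytes.
byte_len=$(perl -e '$str = "😀"; print length $str')

# Decoding those bytes explicitly yields a single code point.
decoded=$(perl -MEncode -e '$str = "😀"; $chr = Encode::decode("UTF-8", $str); printf "%vX %d", $chr, length $chr')

# With the utf8 pragma, perl applies the equivalent decoding to the source text itself.
pragma=$(perl -Mutf8 -e '$str = "😀"; printf "%vX %d", $str, length $str')

echo "bytes:   $byte_len"
echo "decoded: $decoded"
echo "pragma:  $pragma"
```

If the analogy holds, the "decoded" and "pragma" lines report the same code point and length, while the undecoded string reports its byte count. None of this says anything about how PL/Perl itself hands arguments to a function; it only illustrates what the pragma does to literals in the source.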