Bug 382687 - ~dev-libs/libpcre-8.13 breaks dev-lang/R: Error in strsplit(), 'split' string 1 is invalid UTF-8
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages (show other bugs)
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Science Mathematics related packages
URL:
Whiteboard:
Keywords:
Duplicates: 383897, 386091
Depends on: 386403
Blocks:
Reported: 2011-09-12 12:08 UTC by Vlastimil Babka (Caster) (RETIRED)
Modified: 2012-01-02 19:58 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge info on one of the 3 systems I observe the error on (emerge-info.txt,5.55 KB, text/plain)
2011-09-12 12:09 UTC, Vlastimil Babka (Caster) (RETIRED)
Patch to src/main/util.c for pcre >=8.13 (R_pcre_valid_utf8.patch,443 bytes, patch)
2011-09-27 07:28 UTC, Bernd Feige

Description Vlastimil Babka (Caster) (RETIRED) gentoo-dev 2011-09-12 12:08:30 UTC
Recently R started producing these errors for me:

Calling help(), or using the '?function' shortcut for help on a function, results in:
> help()
Error in strsplit(txt, "\n", fixed = TRUE) : 
  'split' string 1 is invalid UTF-8

Similarly, when tab-completing parameters by typing 'name_of_function(<tab>':

> nrow (Error in strsplit(prefix, breakRE, perl = TRUE) : 
  'split' string 1 is invalid UTF-8

The second error even leaves the terminal in a state where it has to be reset before it echoes characters again.

Note that using LANG=C (my default locale is en_US.utf-8) removes the second type of error, but not the first one (help is therefore unusable).

I've tracked the problem down to the dev-libs/libpcre-8.13 update. Downgrading to 8.12 fixes the errors. However, I don't know which of the packages is to blame here. I also didn't find anything on the R bug tracker, and Google finds only a gentoo-user post: http://archives.gentoo.org/gentoo-user/msg_5b46ade40001fc063eae6e1432bd8ce6.xml
Comment 1 Vlastimil Babka (Caster) (RETIRED) gentoo-dev 2011-09-12 12:09:25 UTC
Created attachment 286217
emerge info on one of the 3 systems I observe the error on
Comment 2 SpanKY gentoo-dev 2011-09-17 04:46:54 UTC
you could try libpcre-8.13-r1 and see if it works any better
Comment 3 Vlastimil Babka (Caster) (RETIRED) gentoo-dev 2011-09-18 20:55:56 UTC
(In reply to comment #2)
> you could try libpcre-8.13-r1 and see if it works any better

Thanks. Unfortunately, it makes no difference for either type of error.
Comment 4 Casper Ti. Vector 2011-09-19 02:29:04 UTC
Hello, I'm the author of the report on the gentoo-user mailing list.
My locale is `en_GB.UTF-8', and *both problems* reported above can be worked around by temporarily switching to `LANG=C'.

For details, please see the original post on the mailing list:
<http://archives.gentoo.org/gentoo-user/msg_5b46ade40001fc063eae6e1432bd8ce6.xml>
Comment 5 Vlastimil Babka (Caster) (RETIRED) gentoo-dev 2011-09-19 10:01:20 UTC
(In reply to comment #4)
> My locale is `en_GB.UTF-8', and *both problems* reported above can be
> worked around by temporarily switching to `LANG=C'.

Right, I made a mistake when testing earlier. Setting LANG=C, or just LC_CTYPE=C, works around both problems.
Comment 6 Bernd Feige 2011-09-27 07:27:49 UTC
(In reply to comment #5)
I tracked this error down to utf8Valid() in src/main/util.c. It returns (_pcre_valid_utf8()<0); however, pcre 8.13 introduced a more differentiated system of return values. The result of _pcre_valid_utf8() should now be compared against PCRE_UTF8_ERR0, which pcre.h defines as 0. I attach the small patch that fixes the problem; it should probably only be applied when pcre>=8.13.

Note that the bug is quite obscure, since the call to _pcre_valid_utf8() only happens if one of the strings involved in strsplit() is UTF-8, and only for strsplit(fixed=T), which is how help() formatting uses it (Rd2txt.R put()). There, the bug triggers on the split argument ("\n"), but only if the string to split (x) contains UTF-8 characters...
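
The return-value change described above can be illustrated with a minimal sketch. The two `mock_valid_utf8_*` functions are hypothetical stand-ins written for this example, not PCRE's internal routine, and `MOCK_UTF8_ERR0` merely mirrors the value 0 that pcre.h is said to define for PCRE_UTF8_ERR0. The validity test is deliberately simplified (it only checks lead and continuation bytes), and it is assumed here that the older convention signalled a valid string with a negative result, as the `<0` test in utf8Valid() implies.

```c
#define MOCK_UTF8_ERR0 0  /* assumed to mirror PCRE_UTF8_ERR0 from pcre.h */

/* Simplified UTF-8 scan: returns -1 if the bytes look well formed,
   otherwise the offset of the first offending lead byte. */
int first_bad_offset(const unsigned char *s, int len)
{
    int i = 0, k, extra;
    while (i < len) {
        unsigned char c = s[i];
        if (c < 0x80) extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return i;                 /* stray continuation or invalid lead byte */
        if (i + extra >= len) return i; /* sequence truncated at end of input */
        for (k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80) return i;
        i += extra + 1;
    }
    return -1;
}

/* Hypothetical older convention: negative means valid. */
int mock_valid_utf8_old(const unsigned char *s, int len)
{
    return first_bad_offset(s, len);
}

/* Hypothetical 8.13-style convention: 0 means valid, nonzero is an error code. */
int mock_valid_utf8_new(const unsigned char *s, int len)
{
    return first_bad_offset(s, len) < 0 ? MOCK_UTF8_ERR0 : 1;
}

/* The test utf8Valid() applied before the patch. */
int check_before_patch(int rc) { return rc < 0; }

/* The comparison the patch proposes. */
int check_after_patch(int rc) { return rc == MOCK_UTF8_ERR0; }
```

With these mocks, `check_before_patch(mock_valid_utf8_new((const unsigned char *) "\n", 1))` is 0, so a perfectly valid "\n" is treated as invalid, which matches the strsplit() error; `check_after_patch()` on the same result is 1.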

Regards,
Bernd
Comment 7 Bernd Feige 2011-09-27 07:28:48 UTC
Created attachment 287891
Patch to src/main/util.c for pcre >=8.13
Comment 8 Vlastimil Babka (Caster) (RETIRED) gentoo-dev 2011-09-27 13:24:14 UTC
(In reply to comment #6)
> (In reply to comment #5)
> I attach the small
> patch that fixes the problem - should probably only be applied when pcre>=8.13.

Perhaps an #ifdef PCRE_UTF8_ERR0 could make it possible to apply the patch unconditionally (if the macro is really new in 8.13). The downside, though, is that nothing ensures R is rebuilt after upgrading to 8.13...
BTW R-2.13.2 is updated to pcre-8.13 (see https://svn.r-project.org/R/branches/R-2-13-branch/doc/NEWS.Rd ) and will therefore probably require >=libpcre-8.13 unconditionally. According to the news on the homepage, it should be released on Sep 30. So perhaps we should just restrict the dependency in current and older R versions to <8.13, wait a few days, and then bump to R-2.13.2 with the dependency restricted to >=8.13 ...
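
The #ifdef idea might look roughly like the sketch below. `utf8_rc_means_valid` is a hypothetical helper invented for illustration, not code from R or from the attached patch, and the sketch assumes that PCRE_UTF8_ERR0 is only defined by pcre.h from 8.13 onwards.

```c
/* In the real source this would come after #include <pcre.h>. The include is
   left out so the sketch builds without PCRE headers, in which case the
   fallback branch below is the one that gets compiled. */

/* Hypothetical helper: interprets the result of the UTF-8 validity check. */
int utf8_rc_means_valid(int rc)
{
#ifdef PCRE_UTF8_ERR0
    /* Newer headers: 0 (PCRE_UTF8_ERR0) means valid. */
    return rc == PCRE_UTF8_ERR0;
#else
    /* Older headers: the original test, negative means valid. */
    return rc < 0;
#endif
}
```

The choice is made at compile time from the installed headers, so it does nothing for the rebuild problem: an R built against 8.12 keeps the old test until it is recompiled.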
Comment 9 Vlastimil Babka (Caster) (RETIRED) gentoo-dev 2011-09-27 13:25:57 UTC
Reassigning to R maintainers as it seems more correct now.
Comment 10 Vlastimil Babka (Caster) (RETIRED) gentoo-dev 2011-09-27 13:26:36 UTC
Sorry for the bugspam...
Comment 11 Aaron W. Swenson gentoo-dev 2011-10-08 10:35:51 UTC
*** Bug 383897 has been marked as a duplicate of this bug. ***
Comment 12 SpanKY gentoo-dev 2011-10-09 16:30:40 UTC
*** Bug 386091 has been marked as a duplicate of this bug. ***
Comment 13 Sébastien Fabbro (RETIRED) gentoo-dev 2012-01-02 19:58:38 UTC
Fixed in R-2.14.1 in CVS.