Recently R started producing these errors for me:

Typing the help command, or the help shortcut '?function', results in:

> help()
Error in strsplit(txt, "\n", fixed = TRUE) :
  'split' string 1 is invalid UTF-8

Similarly, when tab-completing parameters by typing 'name_of_function(<tab>':

> nrow(
Error in strsplit(prefix, breakRE, perl = TRUE) :
  'split' string 1 is invalid UTF-8

The second error even leaves the terminal in a state where it has to be reset before it echoes characters again.

Note that using LANG=C (my default locale is en_US.utf-8) removes the second type of error, but not the first one (so help is unusable).

I've tracked the problem down to the dev-libs/libpcre-8.13 update; going back to 8.12 fixes the errors. However, I don't know which of the packages to blame here. I also didn't find anything on the R bug tracker, and Google finds only a gentoo-user post: http://archives.gentoo.org/gentoo-user/msg_5b46ade40001fc063eae6e1432bd8ce6.xml
Created attachment 286217: emerge --info from one of the 3 systems on which I observe the error
You could try libpcre-8.13-r1 and see whether it works any better.
(In reply to comment #2)
> you could try libpcre-8.13-r1 and see if it works any better

Thanks. Unfortunately, it makes no difference for either type of error.
Hello, I'm the author of the report on the gentoo-user mailing list. My locale is `en_GB.UTF-8', and *both* problems reported above can be worked around by temporarily switching to `LANG=C'. For details, please see the original post on the mailing list: <http://archives.gentoo.org/gentoo-user/msg_5b46ade40001fc063eae6e1432bd8ce6.xml>
(In reply to comment #4) > My locale is `en_GB.UTF-8', and *both the two problems* reported above can be > worked around by temporarily switching to `LANG=C'. Right, I made some mistake when testing before. LANG=C, or just LC_CTYPE=C fixes both problems.
(In reply to comment #5)

I tracked this error down to utf8Valid() in src/main/util.c. It returns (_pcre_valid_utf8() < 0); however, pcre 8.13 introduced a more differentiated set of return values, and a valid string now yields 0. The test (0 < 0) is therefore false, so even a plain "\n" is reported as invalid UTF-8. The result of _pcre_valid_utf8() should now be compared against PCRE_UTF8_ERR0, which pcre.h defines as 0.

I attach a small patch that fixes the problem; it should probably only be applied for pcre >= 8.13.

Note that the bug is quite obscure, since the call to _pcre_valid_utf8() only happens if one of the strings involved in strsplit() is UTF-8, and only for strsplit(fixed = TRUE), which is exactly how help() formatting uses it (put() in Rd2txt.R). There, the bug triggers on the split argument ("\n") only if the string to split (x) contains UTF-8 characters...

Regards,
Bernd
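To illustrate the return-value mismatch, here is a minimal, self-contained C sketch. It does not call PCRE: validate_utf8() is a hypothetical stand-in that follows the 8.13-style convention (0 for valid input, a positive code otherwise; real PCRE distinguishes many error codes, while this sketch returns only 1), and SKETCH_UTF8_ERR0 merely plays the role of PCRE_UTF8_ERR0. The two wrappers contrast the old `< 0` test with the patched comparison against 0.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PCRE_UTF8_ERR0 (hypothetical name): 0 means "valid". */
#define SKETCH_UTF8_ERR0 0

/* Hypothetical validator using the pcre >= 8.13 convention:
 * returns SKETCH_UTF8_ERR0 if s[0..len) is valid UTF-8, else 1. */
static int validate_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        size_t need;        /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        if (c < 0x80) { i++; continue; }               /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; }
        else return 1;              /* stray continuation or bad lead byte */
        if (len - i - 1 < need) return 1;              /* truncated sequence */
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80) return 1;         /* not a continuation */
            cp = (cp << 6) | (cc & 0x3F);
        }
        /* reject overlong forms, UTF-16 surrogates and values > U+10FFFF */
        if ((need == 1 && cp < 0x80) || (need == 2 && cp < 0x800) ||
            (need == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 1;
        i += need + 1;
    }
    return SKETCH_UTF8_ERR0;
}

/* Old-style check, written for the pre-8.13 convention where a
 * negative result meant "valid": never true under the new convention. */
static int utf8_valid_old_check(const unsigned char *s, size_t len)
{
    return validate_utf8(s, len) < 0;
}

/* Patched check: compare against the "no error" code instead. */
static int utf8_valid_new_check(const unsigned char *s, size_t len)
{
    return validate_utf8(s, len) == SKETCH_UTF8_ERR0;
}
```

Under this convention utf8_valid_old_check() reports every input, including a lone "\n", as invalid, which matches the "'split' string 1 is invalid UTF-8" error, while utf8_valid_new_check() accepts "\n" and "\xc3\xa9" and rejects "\xff".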
Created attachment 287891: patch to src/main/util.c for pcre >= 8.13
(In reply to comment #6)
> (In reply to comment #5)
> I attach the small
> patch that fixes the problem - should probably only be applied when pcre>8.13.

Perhaps an #ifdef PCRE_UTF8_ERR0 would make it possible to apply the patch unconditionally (if the macro really is new in 8.13). The bad thing, though, is that nothing ensures R is rebuilt after upgrading to 8.13...

BTW, R-2.13.2 is updated to pcre-8.13 (see https://svn.r-project.org/R/branches/R-2-13-branch/doc/NEWS.Rd ) and will therefore probably require >=libpcre-8.13 unconditionally. According to the news on the homepage, it should be released on Sep 30. So perhaps we should just restrict the dependency in current and older R versions to <8.13, wait the few days, and bump to R-2.13.2 with the dependency restricted to >=8.13...
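The #ifdef idea could look roughly like the sketch below. It assumes, as the comment above does, that PCRE_UTF8_ERR0 first appears in pcre.h with 8.13; utf8_result_is_valid() is a hypothetical helper, not R's actual code. The pcre.h include is left out so the fragment compiles without libpcre, which means the pre-8.13 branch is the one selected here.

```c
/* In R's util.c the PCRE header would be included here:
 *   #include <pcre.h>
 * It is omitted so this sketch builds standalone; without it,
 * PCRE_UTF8_ERR0 is undefined and the #else branch is compiled. */

#ifdef PCRE_UTF8_ERR0
/* Headers from pcre >= 8.13 (assumed): PCRE_UTF8_ERR0 (0) means valid. */
static int utf8_result_is_valid(int r)
{
    return r == PCRE_UTF8_ERR0;
}
#else
/* Older headers: the validator returned a negative value for valid input. */
static int utf8_result_is_valid(int r)
{
    return r < 0;
}
#endif
```

Because the branch is chosen when R is compiled, a binary built against 8.12 headers keeps the old `< 0` test even after libpcre is upgraded, which is why the dependency restriction discussed above is still needed.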
Reassigning to R maintainers as it seems more correct now.
Sorry for the bugspam...
*** Bug 383897 has been marked as a duplicate of this bug. ***
*** Bug 386091 has been marked as a duplicate of this bug. ***
Fixed in R-2.14.1 in CVS.