After a recent regular upgrade, LibreOffice refuses to open some docx files with the following error:

SAXException: [word/document.xml line 2]: Input is not proper UTF-8, indicate encoding ! at /tmp/portage/app-office/libreoffice-7.5.3.2/work/libreoffice-7.5.3.2/sax/source/fastparser/fastparser.cxx:605

or

SAXParseException: '[word/document.xml line 2]: Input is not proper UTF-8, indicate encoding ! at /tmp/portage/app-office/libreoffice-7.5.3.2/work/libreoffice-7.5.3.2/sax/source/fastparser/fastparser.cxx:605', Stream 'word/document.xml', Line 2, Column 28976 at /tmp/portage/app-office/libreoffice-7.5.3.2/work/libreoffice-7.5.3.2/writerfilter/source/filter/WriterFilter.cxx:21

Reproducible: Always

Steps to Reproduce:
1. Upgrade libxml2 to version
2. Open
3. Behold the popup with aforementioned error.

Actual Results:
LibreOffice shows an error when opening certain (normal) files.

A simple downgrade of the libxml2 library to version 2.10.4 fixes the issue. Upgrading to 2.11.2+ breaks things again.

libxml2 USE flags and info:
dev-libs/libxml2-2.11.2-r1-1:2::gentoo USE="ftp icu lzma python readline -debug -examples -static-libs -test" ABI_X86="32 (64) (-x32)" PYTHON_TARGETS="python3_11 -python3_10"

LibreOffice USE flags and info:
app-office/libreoffice-7.5.3.2-1::gentoo USE="branding clang cups dbus gtk java kde ldap vulkan -accessibility -base -bluetooth -coinmp -custom-cflags -debug -eds -firebird -googledrive -gstreamer -mariadb -odk -pdfimport -postgres -test" LIBREOFFICE_EXTENSIONS="-nlpsolver -scripting-beanshell -scripting-javascript -wiki-publisher" PYTHON_SINGLE_TARGET="python3_11 -python3_10"

I will attach emerge --info as a txt file below.
Created attachment 861575: emerge --info
Forgot to edit steps to reproduce and sent incomplete version (facepalm). Correction below.

> Steps to Reproduce:
> 1. Upgrade libxml2 to version
> 2. Open
> 3. Behold the popup with aforementioned error.
And again I tried to change the title, and the comment got sent incomplete. Please laugh at me. Sorry. Correction below.

> Steps to Reproduce:
> 1. Upgrade libxml2 to version 2.11.2
> 2. Open regular docx files (not exactly complex, all contain a few pages: agreements, work stuff).
> 3. Behold the popup with aforementioned error.
Thank you for the report.
If you save a new copy of a problematic file with libreoffice-7.5.3.2 and libxml2-2.10.4, then try to open it after upgrading to >=libxml2-2.11.2, can you reproduce that error? All the info I could find so far involved broken files being the cause of such errors.
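One way to tell a genuinely broken file from a parser regression is to check the XML stream directly. The sketch below is an illustration only and was not part of this report: it reads `word/document.xml` (the stream named in the error) from the docx archive with Python's standard `zipfile` module and returns the byte offset of the first invalid UTF-8 sequence, if any. The function name `first_invalid_utf8_offset` is hypothetical.

```python
import zipfile


def first_invalid_utf8_offset(archive_path, member="word/document.xml"):
    """Return the byte offset of the first invalid UTF-8 sequence
    in `member`, or None if the whole stream decodes cleanly.

    `member` defaults to the stream named in the error message;
    the function name itself is a hypothetical helper.
    """
    # A docx file is a ZIP archive; read the raw bytes of the XML stream.
    with zipfile.ZipFile(archive_path) as archive:
        data = archive.read(member)
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first undecodable byte.
        return exc.start
    return None


# Example call (the path is a placeholder):
# print(first_invalid_utf8_offset("problematic.docx"))
```

If this returns `None` for a file that LibreOffice rejects with the newer libxml2, the stream is probably valid UTF-8 and the file itself is less likely to be at fault.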
(In reply to Andreas Sturmlechner from comment #5)
> If you save a new copy of a problematic file with libreoffice-7.5.3.2 and
> libxml2-2.10.4, then try to open it after upgrade to >=libxml2-2.11.2, can
> you reproduce that error?

Hello. I just tried it, and yes, it is reproducible.

I resaved the file with LO-7.5.3.2 and libxml2-2.10.4, and then tried to open it with libxml2-2.11.3 installed. The error output is almost the same.

First:
SAXException: [word/document.xml line 2]: Input is not proper UTF-8, indicate encoding ! at /tmp/portage/app-office/libreoffice-7.5.3.2/work/libreoffice-7.5.3.2/sax/source/fastparser/fastparser.cxx:605

Second:
SAXParseException: '[word/document.xml line 2]: Input is not proper UTF-8, indicate encoding ! at /tmp/portage/app-office/libreoffice-7.5.3.2/work/libreoffice-7.5.3.2/sax/source/fastparser/fastparser.cxx:605', Stream 'word/document.xml', Line 2, Column 86725 at /tmp/portage/app-office/libreoffice-7.5.3.2/work/libreoffice-7.5.3.2/writerfilter/source/filter/WriterFilter.cxx:213
I've tried a few docx and xlsx files and couldn't reproduce that issue so far.
(In reply to Andreas Sturmlechner from comment #7)
> I've tried a few docx, xlsx files and couldn't reproduce that issue so far.

Maybe it has something to do with the language or something.

I have plenty of work files (all docx, all contain 3 or more pages) that cause this error.
Different fonts, different dates of creation or last save (some a month ago, some about a year ago). I can't find the pattern or conditions that lead to this issue.

Tried to rebuild LibreOffice with the new libxml2. No luck, same error.
(In reply to Ivan from comment #8)
> (In reply to Andreas Sturmlechner from comment #7)
> > I've tried a few docx, xlsx files and couldn't reproduce that issue so far.
>
> Maybe it has something to do with the language or something.
>
> I have plenty of work files (all docx, all contain 3 or more pages) that
> cause this error.
> Different fonts, different dates of creation or last save (some a month ago,
> some about a year ago). I can't find the pattern or conditions that lead to
> this issue.
>
> Tried to rebuild LibreOffice with the new libxml2. No luck, same error.

Does a new file cause the error? If so, can you make a dummy file and share it?
Also include the exact steps to re-create the file, and then we can compare it.
> Does a new file cause the error? If so, can you make a dummy file and share
> it?
> Also include the exact steps to re-create the file, and then we can compare
> it.

New files don't have this error somehow.

If I open the 'problematic' file with LO-7.3.5.2 (with libxml2 v. 2.10.4 installed), add one space somewhere and then save it as a new file, then I can open the new file with LibreOffice even when libxml2-2.11.3 is installed.

Tried to downgrade LibreOffice to version 7.4.6.2 - and got the same error. So far only downgrading libxml2 solves this error for me.

Things are weird.
I have a similar issue: after updating to 'dev-libs/libxml2-2.11.3' I cannot open some odt files containing non-Latin characters (others mention docx not working), but I can still open ods (spreadsheet) files.
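To test the suspicion that non-Latin text is the common factor, the affected and unaffected files can be compared by how many non-ASCII characters their main XML stream contains. This is a rough sketch under assumptions: `non_ascii_char_count` is a hypothetical helper, and the stream names used here are `content.xml` for odt and `word/document.xml` for docx.

```python
import zipfile


def non_ascii_char_count(archive_path, member):
    """Count characters outside the ASCII range in one stream of a
    ZIP-based document (hypothetical helper for comparing files)."""
    with zipfile.ZipFile(archive_path) as archive:
        # errors="replace" keeps the count usable even if some bytes
        # are not valid UTF-8; each bad byte becomes U+FFFD.
        text = archive.read(member).decode("utf-8", errors="replace")
    return sum(1 for ch in text if ord(ch) > 127)


# Example calls (paths are placeholders):
# non_ascii_char_count("report.odt", "content.xml")
# non_ascii_char_count("agreement.docx", "word/document.xml")
```

A count of zero for every file that opens and a non-zero count for every file that fails would support the non-Latin hypothesis, though it would not prove it.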
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=64f596cbb52d0955503281d6998154eacb48d065

commit 64f596cbb52d0955503281d6998154eacb48d065
Author: Sam James <sam@gentoo.org>
AuthorDate: 2023-05-19 00:29:27 +0000
Commit: Sam James <sam@gentoo.org>
CommitDate: 2023-05-19 00:29:27 +0000

    dev-libs/libxml2: add 2.11.4

    This _might_ fix the LibreOffice issue.

    Bug: https://bugs.gentoo.org/905399
    Bug: https://bugs.gentoo.org/906206
    Signed-off-by: Sam James <sam@gentoo.org>

 dev-libs/libxml2/Manifest | 1 +
 dev-libs/libxml2/libxml2-2.11.4.ebuild | 195 +++++++++++++++++++++++++++++++++
 2 files changed, 196 insertions(+)
Version 2.11.4 seems to work just fine.

Sam James
Andreas Sturmlechner
Robin Johnson
Maksim Fomin
Larry the Git Cow (:-D)

Thanks for your time and help.
Thank you!
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=74742dfaadb00f833e7c786c9ea99e0c5e165176

commit 74742dfaadb00f833e7c786c9ea99e0c5e165176
Author: Sam James <sam@gentoo.org>
AuthorDate: 2023-05-20 07:17:48 +0000
Commit: Sam James <sam@gentoo.org>
CommitDate: 2023-05-20 07:17:48 +0000

    profiles: mask intermediate bad libxml2-2.11.* (before <2.11.4)

    >=2.11.4 is fine, just 2.11.1 up to 2.11.3 were buggy. Mask to avoid
    confusing bug reports.

    Bug: https://bugs.gentoo.org/906206
    Bug: https://bugs.gentoo.org/905399
    Signed-off-by: Sam James <sam@gentoo.org>

 profiles/package.mask | 7 +++++++
 1 file changed, 7 insertions(+)
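The affected range stated in the commit message (2.11.1 up to 2.11.3) can be expressed as a small check, for example when triaging similar reports. This is a sketch only: `parse_version` and `is_affected` are hypothetical helper names, and the parsing handles only plain dotted versions with an optional Gentoo revision suffix such as `-r1`.

```python
def parse_version(text):
    """Turn '2.11.2' or '2.11.2-r1' into a comparable tuple of ints.

    Hypothetical helper: the Gentoo revision suffix is simply dropped.
    """
    base = text.split("-", 1)[0]
    return tuple(int(part) for part in base.split("."))


def is_affected(libxml2_version):
    """Return True for the libxml2 versions the commit message calls
    buggy: 2.11.1 up to and including 2.11.3."""
    return (2, 11, 1) <= parse_version(libxml2_version) <= (2, 11, 3)


# Example calls:
# is_affected("2.11.2-r1")  # version from the original report
# is_affected("2.11.4")     # version reported as working
```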