Bug 906206

Summary: app-office/libreoffice: broken with >=dev-libs/libxml2-2.11.2, throws SAXParseException: '[word/document.xml line 2]: Input is not proper UTF-8, indicate encoding !
Product: Gentoo Linux
Reporter: Ivan <regboxemg>
Component: Current packages
Assignee: Gentoo Office Team <office>
Status: RESOLVED FIXED
Severity: normal
CC: base-system, sam
Priority: Normal
Version: unspecified
Hardware: AMD64
OS: Linux
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 905399    
Attachments: emerge --info

Description Ivan 2023-05-12 13:09:41 UTC
After a recent routine upgrade, LibreOffice refuses to open some docx files and shows one of the following errors:

SAXException: [word/document.xml line 2]: Input is not proper UTF-8, indicate encoding !
 at /tmp/portage/app-office/libreoffice-7.5.3.2/work/libreoffice-7.5.3.2/sax/source/fastparser/fastparser.cxx:605

or

 SAXParseException: '[word/document.xml line 2]: Input is not proper UTF-8, indicate encoding !
 at /tmp/portage/app-office/libreoffice-7.5.3.2/work/libreoffice-7.5.3.2/sax/source/fastparser/fastparser.cxx:605', Stream 'word/document.xml', Line 2, Column 28976 at /tmp/portage/app-office/libreoffice-7.5.3.2/work/libreoffice-7.5.3.2/writerfilter/source/filter/WriterFilter.cxx:21



Reproducible: Always

Steps to Reproduce:
1. Upgrade libxml2 to version 
2. Open 
3. Behold the popup with aforementioned error.
Actual Results:  
LibreOffice shows an error when opening certain (normal) files. 


Simply downgrading libxml2 to version 2.10.4 fixes the issue. Upgrading to 2.11.2 or later breaks things again.

libxml2 USE flags and info:

dev-libs/libxml2-2.11.2-r1-1:2::gentoo  
USE="ftp icu lzma python readline -debug -examples -static-libs -test" ABI_X86="32 (64) (-x32)" 
PYTHON_TARGETS="python3_11 -python3_10"

LibreOffice USE flags and info:

app-office/libreoffice-7.5.3.2-1::gentoo  
USE="branding clang cups dbus gtk java kde ldap vulkan -accessibility -base -bluetooth -coinmp -custom-cflags -debug -eds -firebird -googledrive -gstreamer -mariadb -odk -pdfimport -postgres -test" 
LIBREOFFICE_EXTENSIONS="-nlpsolver -scripting-beanshell -scripting-javascript -wiki-publisher" 
PYTHON_SINGLE_TARGET="python3_11 -python3_10"

I will attach the emerge --info output as a text file below.
Comment 1 Ivan 2023-05-12 13:10:20 UTC
Created attachment 861575 [details]
emerge --info
Comment 2 Ivan 2023-05-12 13:13:27 UTC
I forgot to edit the steps to reproduce and sent an incomplete version (facepalm).

Correction below.
> Steps to Reproduce:
> 1. Upgrade libxml2 to version 
> 2. Open 
> 3. Behold the popup with aforementioned error.
Comment 3 Ivan 2023-05-12 13:17:04 UTC
And again: I tried to change the title, and the comment got sent incomplete. Please laugh at me. Sorry.

Correction below.
> Steps to Reproduce:
> 1. Upgrade libxml2 to version 2.11.2
> 2. Open regular docx files (nothing complex, each only a few pages long: agreements, work stuff).
> 3. Behold the popup with aforementioned error.
Comment 4 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-05-12 22:51:42 UTC
Thank you for the report.
Comment 5 Andreas Sturmlechner gentoo-dev 2023-05-13 17:03:36 UTC
If you save a new copy of a problematic file with libreoffice-7.5.3.2 and libxml2-2.10.4, then try to open it after upgrade to >=libxml2-2.11.2, can you reproduce that error?

All the information I could find so far pointed to broken files being the cause of such errors.
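One way to test the broken-file theory independently of libxml2 is to check whether the XML stream inside the archive decodes as strict UTF-8. The following is only a minimal sketch using Python's standard zipfile module; the helper names check_member_utf8 and make_docx are hypothetical, and the two in-memory archives are stand-ins for real files, not data from this report.

```python
import io
import zipfile


def check_member_utf8(docx_bytes, member="word/document.xml"):
    """Return None if the member is valid UTF-8, else the offset of the first bad byte."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        raw = archive.read(member)
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def make_docx(payload):
    """Build a tiny in-memory archive that only contains word/document.xml (test stand-in)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", payload)
    return buffer.getvalue()


# Stand-in archives: one with valid non-Latin text, one with a stray 0xFF byte.
good = make_docx("<w:t>Привет</w:t>".encode("utf-8"))
bad = make_docx(b"<w:t>\xff</w:t>")

print(check_member_utf8(good))  # None
print(check_member_utf8(bad))   # 5
```

For a real file, the bytes would be read from disk first (for example with open(path, "rb").read(), where path is whatever docx triggers the error). If the function returns None for a file that LibreOffice rejects, the file itself is probably not the culprit. For odt files the member to check would be content.xml instead.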
Comment 6 Ivan 2023-05-13 18:59:31 UTC
(In reply to Andreas Sturmlechner from comment #5)
> If you save a new copy of a problematic file with libreoffice-7.5.3.2 and
> libxml2-2.10.4, then try to open it after upgrade to >=libxml2-2.11.2, can
> you reproduce that error?
> 

Hello.
I just tried it, and yes, it is reproducible.
I re-saved the file with LO-7.5.3.2 and libxml2-2.10.4, and then tried to open it with libxml2-2.11.3 installed. The error output is almost the same.

First:

SAXException: [word/document.xml line 2]: Input is not proper UTF-8, indicate encoding !
 at /tmp/portage/app-office/libreoffice-7.5.3.2/work/libreoffice-7.5.3.2/sax/source/fastparser/fastparser.cxx:605

Second:

SAXParseException: '[word/document.xml line 2]: Input is not proper UTF-8, indicate encoding !
 at /tmp/portage/app-office/libreoffice-7.5.3.2/work/libreoffice-7.5.3.2/sax/source/fastparser/fastparser.cxx:605', Stream 'word/document.xml', Line 2, Column 86725 at /tmp/portage/app-office/libreoffice-7.5.3.2/work/libreoffice-7.5.3.2/writerfilter/source/filter/WriterFilter.cxx:213
Comment 7 Andreas Sturmlechner gentoo-dev 2023-05-13 22:58:00 UTC
I've tried a few docx and xlsx files and couldn't reproduce that issue so far.
Comment 8 Ivan 2023-05-14 02:54:38 UTC
(In reply to Andreas Sturmlechner from comment #7)
> I've tried a few docx and xlsx files and couldn't reproduce that issue so far.

Maybe it has something to do with the language or something. 

I have plenty of work files (all docx, all contain 3 or more pages) that cause this error. 
Different fonts, different dates of creation or last save (some a month ago, some about a year ago). I can't find a pattern or the conditions that lead to this issue.

I tried rebuilding LibreOffice against the new libxml2. No luck, same error.
Comment 9 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2023-05-14 04:51:09 UTC
(In reply to Ivan from comment #8)
> (In reply to Andreas Sturmlechner from comment #7)
> > I've tried a few docx and xlsx files and couldn't reproduce that issue so far.
> 
> Maybe it has something to do with the language or something. 
> 
> I have plenty of work files (all docx, all contain 3 or more pages) that
> cause this error. 
> Different fonts, different dates of creation or last save (some a month ago,
> some about a year ago). I can't find a pattern or the conditions that lead to
> this issue.
> 
> Tried to rebuild LibreOffice with new libxml2. No luck, same error.

Does a new file cause the error? If so, can you make a dummy file and share it?
Also include the exact steps to re-create the file, and then we can compare it.
Comment 10 Ivan 2023-05-14 22:08:28 UTC
> Does a new file cause the error? If so, can you make a dummy file and share
> it?
> Also include the exact steps to re-create the file, and then we can compare
> it.

New files don't have this error somehow.

If I open the 'problematic' file with LO-7.3.5.2 (with libxml2 v. 2.10.4 installed), add one space somewhere and then save it as a new file, then I can open the new file with LibreOffice even when libxml2-2.11.3 is installed.

I tried downgrading LibreOffice to version 7.4.6.2 and got the same error. So far only downgrading libxml2 solves this error for me.

Things are weird.
Comment 11 Maxim Fomin 2023-05-16 10:30:28 UTC
I have a similar issue: after updating to 'dev-libs/libxml2-2.11.3' I cannot open some odt files containing non-Latin characters (others mention docx not working), but I can still open ods (spreadsheet) files.
Comment 12 Larry the Git Cow gentoo-dev 2023-05-19 00:30:33 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=64f596cbb52d0955503281d6998154eacb48d065

commit 64f596cbb52d0955503281d6998154eacb48d065
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-05-19 00:29:27 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-05-19 00:29:27 +0000

    dev-libs/libxml2: add 2.11.4
    
    This _might_ fix the LibreOffice issue.
    
    Bug: https://bugs.gentoo.org/905399
    Bug: https://bugs.gentoo.org/906206
    Signed-off-by: Sam James <sam@gentoo.org>

 dev-libs/libxml2/Manifest              |   1 +
 dev-libs/libxml2/libxml2-2.11.4.ebuild | 195 +++++++++++++++++++++++++++++++++
 2 files changed, 196 insertions(+)
Comment 13 Ivan 2023-05-19 17:05:38 UTC
Version 2.11.4 seems to work just fine.

Sam James 
Andreas Sturmlechner
Robin Johnson 
Maksim Fomin 
Larry the Git Cow (:-D)

Thanks for your time and help.
Comment 14 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-05-20 07:13:57 UTC
Thank you!
Comment 15 Larry the Git Cow gentoo-dev 2023-05-20 07:18:41 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=74742dfaadb00f833e7c786c9ea99e0c5e165176

commit 74742dfaadb00f833e7c786c9ea99e0c5e165176
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-05-20 07:17:48 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-05-20 07:17:48 +0000

    profiles: mask intermediate bad libxml2-2.11.* (before <2.11.4)
    
    >=2.11.4 is fine, just 2.11.1 up to 2.11.3 were buggy. Mask to avoid
    confusing bug reports.
    
    Bug: https://bugs.gentoo.org/906206
    Bug: https://bugs.gentoo.org/905399
    Signed-off-by: Sam James <sam@gentoo.org>

 profiles/package.mask | 7 +++++++
 1 file changed, 7 insertions(+)
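To check whether an installed libxml2 falls into the range named in the mask commit above (2.11.1 up to 2.11.3), a small helper can compare version tuples. This is only an illustrative sketch: the function names are hypothetical, the range is taken from the commit message, and a Gentoo revision suffix such as -r1 is simply dropped before comparing.

```python
def parse_version(text):
    """Turn '2.11.2-r1' into (2, 11, 2); the Gentoo revision suffix is ignored."""
    upstream = text.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def is_affected(version):
    """True if the version lies in the range the mask commit calls buggy."""
    return (2, 11, 1) <= parse_version(version) <= (2, 11, 3)


for candidate in ["2.10.4", "2.11.1", "2.11.2-r1", "2.11.3", "2.11.4"]:
    print(candidate, is_affected(candidate))
```

The loop reports 2.10.4 and 2.11.4 as unaffected and the three versions in between as affected, which matches the downgrade and upgrade results described in this bug.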