I've saved a document with Abiword 2.0.1 in its native abw format. When I tried to open it again, Abiword complained that it cannot open my file because "It appears to be an invalid document". I've checked the file with an XML validator, which complained about lines like this one:

<s type="P" name="Textkörper" basedon="Normal" props="line-height:1.000000; font-family:Georgia; font-size:12pt; font-style:italic"/>

Apparently the German umlaut "ö" is invalid. I've edited the file by hand and voilà, Abiword opens it without a problem. "Textkörper" is German for "text body". The file used to be a .doc and was written with the German version of Microsoft Word. Those style names get imported by Abiword without checking for invalid characters.

This bug is reproducible. I'm marking it "critical" because it's basically data loss for someone who doesn't know how to fix these invalid .abw files.

Reproducible: Always

Steps to Reproduce:
1. Take a German .doc file (other locales with non-ASCII letters might work as well)
2. Open the file with Abiword, save as .abw
3. Close Abiword and open the .abw file

Actual Results: Error message pops up, file cannot be opened.

Expected Results: Open the file.
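For anyone who wants to check an affected file without a separate XML validator, here is a minimal Python sketch. It assumes the validator's complaint comes from an encoding mismatch, i.e. the "ö" being written as a single non-UTF-8 byte into a file that is parsed as UTF-8; the report does not confirm that this is the cause. The helper names `find_bad_lines` and `is_well_formed` and the two sample documents are made up for illustration and are not part of Abiword.

```python
import xml.etree.ElementTree as ET


def find_bad_lines(data: bytes):
    """Return (line_number, raw_line) pairs whose bytes are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


def is_well_formed(data: bytes) -> bool:
    """Return True if the bytes parse as well-formed XML."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# Hypothetical miniature documents, not real Abiword output.
text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<abiword><s type="P" name="Textkörper" basedon="Normal"/></abiword>'
)
good = text.encode("utf-8")      # umlaut stored as a valid UTF-8 sequence
broken = text.encode("latin-1")  # assumed failure mode: a lone 0xF6 byte

for label, data in (("good", good), ("broken", broken)):
    print(label, is_well_formed(data), [n for n, _ in find_bad_lines(data)])
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to both helpers. If `find_bad_lines` reports nothing but the file still fails to parse, the problem is something other than the encoding mismatch assumed here.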
Portage 2.0.49-r15 (default-x86-1.4, gcc-3.2.3, glibc-2.3.2-r3, 2.4.20-gentoo-r8)
=================================================================
System uname: 2.4.20-gentoo-r8 i686 AMD Athlon(tm) Processor
Gentoo Base System version 1.4.3.10
ccache version 2.3 [enabled]
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-O2 -march=athlon -pipe"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /var/qmail/control /usr/share/config /usr/kde/2/share/config /usr/kde/3/share/config /usr/X11R6/lib/X11/xkb"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
CXXFLAGS="-O2 -march=athlon -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="sandbox ccache autoaddcvs"
GENTOO_MIRRORS="http://ftp.snt.utwente.nl/pub/os/linux/gentoo http://212.219.247.10/sites/www.ibiblio.org/gentoo/ ftp://ftp.easynet.nl/mirror/gentoo/ http://ftp.easynet.nl/mirror/gentoo/"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 oss apm avi crypt cups encode foomaticdb gif jpeg libg++ libwww mad mikmod motif mpeg ncurses nls pdflib png quicktime spell truetype xml2 xmms xv zlib gdbm berkdb slang readline svga java X sdl gpm tcpd pam ssl perl python imlib oggvorbis gnome opengl mozilla 3dnow mmx sse gphoto2 gstreamer fbcon bonobo gtkhtml tiff esd cdr gtk gtk2 -kde -arts -qt"
Known issue, mg?
Not really. Let me get more info. This occurs whenever (and only when) you have a style name with non-ascii characters?
I've done some testing.

I've created an Abiword file with a style name that contains non-ascii characters. It saves and opens without a problem and the style name is preserved. So far so good. (Document will be attached as style-with-umlauts.abw)

Then I created a file in MS Word using one of the default styles with umlauts called "Überschrift 1". When opened in Abiword this style was automatically translated to "Heading 1" and subsequently saved as "Heading 1" in other file formats. This seems like a nice localisation feature :-) (My Gentoo installation is English, not German)

My 3rd test was with the Word document that caused my problem in the first place. It contains the "Textkörper" style. When opened in Abiword, the style name shows up as "Textk" in the style drop-down. I've edited the document for clarity and saved two versions, which I will attach to this bug:

abitest2.rtf is the RTF file containing the offending style name. However, Abiword will open it correctly.

abitest2.abw is the same RTF file saved in Abiword format. Abiword, however, refuses to open it.
Created attachment 21352 [details] Document with non-ascii style name created in Abiword.
Created attachment 21354 [details] MS Word file, saved by Abiword as RTF
Created attachment 21355 [details] MS Word file, saved by Abiword in its native format Abiword refuses to open this file on my machine.
K, I was thinking along the lines of xml encoding but this helps narrow it down. I'll create a more explicit isolation of the bug and pass it along to the AbiWord developer who can fix it (ASAP, since this is a rather upsetting bug), and I'll let you know what progresses. Thanks for the report.
Any progress to report, mg? @reporter, is this still a problem in the most recent abiword in the tree?
Actually yes. This bug family is getting fixed in stages. 2.0.3's puredoc fix took part of it on, but a very recent fix which just got backported to the 2.0 branch should probably fix this (34463). Our behavior may still be suboptimal but this specific bug is hopefully fixed (if hackishly) in the rtf exporter. 2.0.4 is scheduled for Wednesday. Please check with that version and let me know how it performs. Thanks. (also, be sure to check original abw native in addition to the various rtf and doc incarnations, don't want to miss anything and abi has been having a lot of xml parser issues lately)
I've tried Abiword 2.0.5. The bug seems to be fixed now:
- The style name "Textkörper" as found in the example RTF file is no longer truncated but displayed properly.
- Saving an MS Word document with a non-ascii style name as an abw-document no longer results in unreadable files.

The broken abw file (3rd attachment of this bug report) can be opened by Abiword but the document will be empty.

Kudos to the developers :-)
marking fixed