Bug 34463 - Abiword 2.0.1 saves invalid XML that can't be opened again
Summary: Abiword 2.0.1 saves invalid XML that can't be opened again
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] GNOME
Hardware: x86 Linux
Importance: High minor
Assignee: Gentoo Linux Gnome Desktop Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-11-26 13:10 UTC by stefan.ihringer
Modified: 2004-04-25 14:02 UTC
CC List: 1 user

See Also:
Package list:
Runtime testing required: ---


Attachments
Document with non-ascii style name created in Abiword. (style-with-umlauts.abw, 3.80 KB, text/plain)
2003-11-27 02:07 UTC, stefan.ihringer
MS Word file, saved by Abiword as RTF (abitest2.rtf, 7.51 KB, text/plain)
2003-11-27 02:08 UTC, stefan.ihringer
MS Word file, saved by Abiword in its native format (abitest2.abw, 24.83 KB, text/plain)
2003-11-27 02:10 UTC, stefan.ihringer

Comment 1 stefan.ihringer 2003-11-26 13:10:56 UTC
I've saved a document with Abiword 2.0.1 in its native abw format. When I tried
to open it again, Abiword complained that it cannot open my file because "It
appears to be an invalid document". I've checked the file with an XML validator
that complained about lines like this one:

<s type="P" name="Textkörper" basedon="Normal" props="line-height:1.000000;
font-family:Georgia; font-size:12pt; font-style:italic"/>

Apparently the German umlaut "ö" is invalid. I've edited the file and voilà,
Abiword opens it without a problem.

"Textkörper" is German for "text body". The file used to be a .doc and was
written with the German version of Microsoft Word. Those style names get
imported by Abiword without checking for invalid characters. This bug is
reproducible. I'm marking it "critical" because it's basically data loss for
someone who doesn't know how to fix these invalid .abw files.
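
The validator complaint quoted above can be reproduced with a short well-formedness check. This is only a minimal sketch under an assumption this report does not confirm: that the style name reached the file as a raw Latin-1 byte while the parser expected UTF-8. The sample document and the names `STYLE`, `DOC` and `is_well_formed` are hypothetical and are not taken from the attachments.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample built around the style line quoted in the report.
STYLE = '<s type="P" name="Textkörper" basedon="Normal"/>'
DOC = '<?xml version="1.0" encoding="UTF-8"?>\n<styles>' + STYLE + '</styles>'

GOOD = DOC.encode("utf-8")    # "ö" stored as the two-byte UTF-8 sequence
BAD = DOC.encode("latin-1")   # "ö" stored as the single byte 0xF6 (assumed failure mode)


def is_well_formed(data: bytes) -> bool:
    """Return True if the bytes parse as XML, False if the parser rejects them."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

If the assumption holds, the second document is rejected even though both look identical in an editor that guesses the encoding, which would match an error that only a validator makes visible.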

Reproducible: Always
Steps to Reproduce:
1. Take a German .doc file (other locales with non-ASCII-letters might work as well)
2. Open file with Abiword, Save as .abw
3. Close Abiword and open .abw file

Actual Results:  
Error message pops up, file cannot be opened.

Expected Results:  
Open the file.

Portage 2.0.49-r15 (default-x86-1.4, gcc-3.2.3, glibc-2.3.2-r3, 2.4.20-gentoo-r8)
=================================================================
System uname: 2.4.20-gentoo-r8 i686 AMD Athlon(tm) Processor
Gentoo Base System version 1.4.3.10
ccache version 2.3 [enabled]
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-O2 -march=athlon -pipe"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /var/qmail/control /usr/share/config
/usr/kde/2/share/config /usr/kde/3/share/config /usr/X11R6/lib/X11/xkb"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
CXXFLAGS="-O2 -march=athlon -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="sandbox ccache autoaddcvs"
GENTOO_MIRRORS="http://ftp.snt.utwente.nl/pub/os/linux/gentoo
http://212.219.247.10/sites/www.ibiblio.org/gentoo/
ftp://ftp.easynet.nl/mirror/gentoo/ http://ftp.easynet.nl/mirror/gentoo/"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 oss apm avi crypt cups encode foomaticdb gif jpeg libg++ libwww mad
mikmod motif mpeg ncurses nls pdflib png quicktime spell truetype xml2 xmms xv
zlib gdbm berkdb slang readline svga java X sdl gpm tcpd pam ssl perl python
imlib oggvorbis gnome opengl mozilla 3dnow mmx sse gphoto2 gstreamer fbcon
bonobo gtkhtml tiff esd cdr gtk gtk2 -kde -arts -qt"
Comment 2 foser (RETIRED) gentoo-dev 2003-11-26 14:49:55 UTC
known issue mg ?
Comment 3 Mark Gilbert 2003-11-26 16:08:21 UTC
Not really.  Let me get more info.  This occurs whenever (and only when) you have a style name with non-ascii characters?
Comment 5 stefan.ihringer 2003-11-27 02:05:09 UTC
I've done some testing. I've created an Abiword file with a style name that contains non-ascii characters. It saves and opens without a problem and the style name is preserved. So far so good. (Document will be attached as style-with-umlauts.abw)

Then I created a file in MS Word using one of the default styles with umlauts called "Überschrift 1". When opened in Abiword this style was automatically translated to "Heading 1" and subsequently saved as "Heading 1" in other file formats. This seems like a nice localisation feature :-) (My gentoo installation is English, not German)

My 3rd test was with the Word document that caused my problem in the first place. It contains the "Textkörper" style. When opened in Abiword, the style name shows up as "Textk" in the style drop-down. I've edited the document for clarity and saved two versions which I will attach to this bug:

abitest2.rtf is the RTF file containing the offending style name. However, Abiword will open it correctly.

abitest2.abw is the same RTF file saved in Abiword format. Abiword, however, refuses to open it.
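
To find where a file such as abitest2.abw goes wrong, one option is to scan it for lines that do not decode as UTF-8. This is a rough sketch, assuming the damage is a stray non-UTF-8 byte in the style name (this report does not confirm that). The helper `find_undecodable_lines` and the sample bytes are hypothetical and do not reproduce the attachment.

```python
from typing import List


def find_undecodable_lines(data: bytes, encoding: str = "utf-8") -> List[int]:
    """Return the 1-based numbers of lines that fail to decode in `encoding`."""
    bad_lines = []
    for number, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError:
            bad_lines.append(number)
    return bad_lines


# Hypothetical sample: line 2 carries a lone 0xF6 byte ("ö" in Latin-1).
SAMPLE = b'<abiword>\n<s type="P" name="Textk\xf6rper"/>\n</abiword>\n'

print(find_undecodable_lines(SAMPLE))  # → [2]
```

The reported line numbers point at the places to fix by hand in a text editor, which is the kind of manual repair described in comment 1.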

Comment 6 stefan.ihringer 2003-11-27 02:07:00 UTC
Created attachment 21352
Document with non-ascii style name created in Abiword.
Comment 7 stefan.ihringer 2003-11-27 02:08:18 UTC
Created attachment 21354
MS Word file, saved by Abiword as RTF
Comment 8 stefan.ihringer 2003-11-27 02:10:17 UTC
Created attachment 21355
MS Word file, saved by Abiword in its native format

Abiword refuses to open this file on my machine.
Comment 9 Mark Gilbert 2003-11-27 08:04:44 UTC
K, I was thinking along the lines of xml encoding but this helps narrow it down.  I'll create a more explicit isolation of the bug and pass it along to the AbiWord developer who can fix it (ASAP, since this is a rather upsetting bug), and I'll let you know what progresses.
Thanks for the report.
Comment 10 foser (RETIRED) gentoo-dev 2004-02-29 08:28:37 UTC
any progress to report mg ?

@ reporter, this still a problem in most recent abiword in the tree ?
Comment 11 Mark Gilbert 2004-03-01 17:25:27 UTC
Actually yes.  This bug family is getting fixed in stages.  2.0.3's puredoc fix took part of it on, but a very recent fix which just got backported to the 2.0 branch should probably fix this (34463).  Our behavior may still be suboptimal but this specific bug is hopefully fixed (if hackishly) in the rtf exporter.

2.0.4 is scheduled for Wednesday.  Please check with that version and let me know how it performs.

Thanks.

(also, be sure to check original abw native in addition to the various rtf and doc incarnations, don't want to miss anything and abi has been having a lot of xml parser issues lately)

Comment 13 stefan.ihringer 2004-03-14 09:35:46 UTC
I've tried Abiword 2.0.5. The bug seems to be fixed now:
- The style name "Textkörper" as found in the example RTF file is no longer truncated but displayed properly.
- Saving an MS Word document with a non-ascii style name as an abw-document no longer results in unreadable files.

The broken abw file (3rd attachment of this bug report) can be opened by Abiword but the document will be empty.

Kudos to the developers :-)
Comment 14 foser (RETIRED) gentoo-dev 2004-04-25 14:02:49 UTC
marking fixed