Bug 216506 - app-office/openoffice-bin-2.4.0 displays wrong characters in some fonts
Summary: app-office/openoffice-bin-2.4.0 displays wrong characters in some fonts
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Unspecified
Hardware: All Linux
Importance: High minor
Assignee: Gentoo Fonts Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-04-06 11:31 UTC by Martin von Gagern
Modified: 2008-12-17 13:57 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Attachments
file with bad characters, openoffice format (bad4.ods,60.46 KB, application/vnd.oasis.opendocument.spreadsheet)
2008-12-16 11:03 UTC, Ulf Dambacher
file with bad characters, pdf to look at (bad4.pdf,213.05 KB, application/pdf)
2008-12-16 11:04 UTC, Ulf Dambacher
good file, openoffice format (good5.ods,42.51 KB, application/vnd.oasis.opendocument.spreadsheet)
2008-12-16 11:05 UTC, Ulf Dambacher
load this file first and everything gets good... (make_good.odt,7.99 KB, application/vnd.oasis.opendocument.text)
2008-12-16 11:11 UTC, Ulf Dambacher

Description Martin von Gagern 2008-04-06 11:31:25 UTC
I've got a strange bug here: openoffice-bin suddenly displays small u umlauts (ü, U+00FC) in Bitstream Vera Serif as spacing cedillas (¸, U+00B8). The text was displayed all right yesterday, but fails today. Neither ooffice nor ttf-bitstream-vera were updated in between, though many packages were due to a gnome update. There was also a system reboot in between.

Other apps like kword display this character all right.
ooffice displays the character all right in related fonts like DejaVu Serif or Bitstream Vera Sans.
All parts of ooffice seem affected. Tried text document and spreadsheet.
Reproducible with newly created documents, not only with existing ones.
The issue affects PDF export as well, not only display.

Any suggestions how to debug this?
Comment 1 Martin von Gagern 2008-04-06 20:13:42 UTC
Started a new X session on an alternate VT, closed ooffice and started it in the new session. There ü (U+00FC) displays correctly. Closed ooffice in the new session and started it in the original session. ü (U+00FC) displays correctly there as well. Opened the document from yesterday. ü (U+00FC) is still correct, but now the a umlaut (ä, U+00E4) gets displayed as a per mille sign (‰, U+2030).

Tried the same thing again. Closed it in the original session, started it in the new session, typed some German characters (äöüÄÖÜß), which all display correctly. Opened the document from yesterday, which is all right as well. Closed it in the second session, opened it in the original session: still works.

I believe the whole thing is quite difficult to reproduce. I can see no pattern in which character replaces which. I'm not sure whether all this session switching has any real effect, or whether it was just luck that, without the switching, I got the same replacement several times in a row.
Comment 2 Martin von Gagern 2008-04-29 10:18:04 UTC
I just got similar errors in an email. The message was UTF-8 encoded, and originated on Macintosh Thunderbird. It contained pasted text, which might already have introduced some encoding errors. These errors were not only visible when displaying it using a specific font, but actually stored in the message. I could determine the following mappings in this message:
 ä (U+00E4) -> ‰ (U+2030)
 ö (U+00F6) -> ˆ (U+02C6)
 ü (U+00FC) -> ¸ (U+00B8)
 ß (U+00DF) -> fl (U+FB02)

The fact that the message originated on Mac led me to the following investigation:

$ echo "AÄ OÖ UÜ aä oö uü sß" | recode ..l1 | recode MacRoman..
Aƒ O÷ U‹ a‰ oˆ u¸ sfl
$ echo -n "ÄÖÜäöüß" | recode ..l1 | recode MacRoman..utf-16le | hexdump
0000000 0192 00f7 2039 2030 02c6 00b8 fb02

It looks like this kind of error occurs when characters encoded in Latin-1 get interpreted as MacRoman. For my email, I would assume the error lies on the sending side.

But maybe this hint helps locate the problem in openoffice. Maybe it sometimes uses the MacRoman encoding when it should be using ISO-8859-1. As I got different characters replaced at different times, it seems to mix encodings in some strange way, though.
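The same experiment can be sketched in Python, assuming its built-in `mac_roman` codec agrees with recode's MacRoman charset for these characters. `latin1_as_macroman` is a hypothetical helper name: it encodes text as ISO-8859-1 and decodes the resulting bytes as Mac OS Roman, which is exactly the misinterpretation suspected here.

```python
def latin1_as_macroman(text: str) -> str:
    """Show how `text` looks if its Latin-1 bytes are misread as Mac OS Roman."""
    # Equivalent in spirit to: recode ..l1 | recode MacRoman..
    return text.encode("latin-1").decode("mac_roman")


if __name__ == "__main__":
    garbled = latin1_as_macroman("ÄÖÜäöüß")
    print(garbled)
    # Code points, in the same form as the hexdump output above
    print(" ".join(f"{ord(c):04x}" for c in garbled))
```

Run as a script, it should print the same code points as the hexdump above (0192 00f7 2039 2030 02c6 00b8 fb02).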
Comment 3 Andreas Proschofsky (RETIRED) gentoo-dev 2008-05-29 22:36:33 UTC
Not reproducible here, and as you get it elsewhere too, I highly doubt that this is an OOo issue. It looks more like a local setup problem; re-assigning to font-herd to make sure we don't miss anything.
Comment 4 Martin von Gagern 2008-05-29 22:43:24 UTC
(In reply to comment #3)
> as you get it somewhere else

I get the same character mapping in other apps, but not the same problem.

As described, in my mail client the issue was due to malformed input, i.e. bad MIME header in e-mail. That mail issue is not font related, whereas the original issue in OOo disappears after switching font, and is not input related, as it happens for newly created documents as well.

(In reply to comment #3)
> Not reproducable here

Makes things difficult to address, I know. If I ever find out more details about this issue, I'll post them here.
Comment 5 Peter Volkov (RETIRED) gentoo-dev 2008-05-30 07:09:31 UTC
(In reply to comment #4)
> That mail issue is not font related, whereas the original issue in OOo
> disappears after switching font, and is not input related, as it happens
> for newly created documents as well.

Does this mean that if you open the old document and change the font of a broken character, it becomes visible?

Andreas, is it possible to read the source of an openoffice document to find out which letter is actually stored in the file on disk?
Comment 6 Andreas Proschofsky (RETIRED) gentoo-dev 2008-05-30 07:51:35 UTC
(In reply to comment #5)

> Andreas is it possible to read source of openoffice document to find out what
> letter is in the document saved on the disc? 
> 

Sure, an odt-file is just a packed collection of different XML-files, which can be unzipped.
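A minimal Python sketch of that check, assuming the standard ODF layout in which the body text lives in `content.xml` inside the zip archive (this holds for both .odt and .ods). The function name `non_ascii_chars` and the script name `odfchars.py` are hypothetical. It lists each non-ASCII character with its code point and UTF-8 bytes, so one can confirm, for example, that ä is stored as U+00E4 (C3 A4) rather than as a substituted character.

```python
import io
import sys
import unicodedata
import zipfile


def non_ascii_chars(odf_bytes: bytes) -> dict[str, int]:
    """Count the non-ASCII characters stored in content.xml of an ODF file."""
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as odf:
        # content.xml holds the document body as UTF-8 encoded XML
        xml = odf.read("content.xml").decode("utf-8")
    counts: dict[str, int] = {}
    for ch in xml:
        if ord(ch) > 0x7F:
            counts[ch] = counts.get(ch, 0) + 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        found = non_ascii_chars(f.read())
    for ch, n in sorted(found.items()):
        name = unicodedata.name(ch, "<unnamed>")
        utf8 = ch.encode("utf-8").hex(" ").upper()  # e.g. "C3 A4" for U+00E4
        print(f"U+{ord(ch):04X}  {utf8:<12} {n:6d}  {name}")
```

Usage would be along the lines of `python3 odfchars.py bad4.ods`; a file that displays wrongly but lists only the expected code points points to a rendering problem rather than corrupted content.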
Comment 7 Peter Volkov (RETIRED) gentoo-dev 2008-05-30 08:03:17 UTC
Martin, could you then check the odt file itself to see which character it contains?
Comment 8 Martin von Gagern 2008-05-30 10:36:50 UTC
(In reply to comment #5)
> Does this mean that if you open old document and change font of broken
> character it'll became visible?

Yes, they became visible. Also, after restarting X and OOo there was a good chance that other characters would be mapped wrongly, or, when I was lucky, none at all. So the characters had to be stored correctly in the document file.

(In reply to comment #7)
> Martin then could you check odt file itself, what character it contains?

I have some problem reproducing the issue right now, so at the moment all characters display correctly. Looking at the original document where I first observed this problem, it now looks OK both in OOo and content.xml.

I don't expect a fix from you, based on that little information I can provide, but hints about where to look if I experience this again would be appreciated, so I can come back with more information next time around.
Comment 9 Peter Volkov (RETIRED) gentoo-dev 2008-06-20 19:41:57 UTC
Closing as NEEDINFO, then. Feel free to reopen if you find anything relevant, though.
Comment 10 David Clermont 2008-06-21 11:40:04 UTC
I've got the same problem with openoffice-2.4.1. After an update from 2.4.0 the small u umlaut (ä, U+00E4) was displayed as permille (‰, U+2030) in TimesNewRoman. The same character was displayed as some strange letter I couldn't identify in ArialUnicodeMS (font imported from Windows). All other fonts were displayed correctly. Even the character table from the Insert->Special Character menu was affected by the problem.
The saved files (odt and ods) contained the correct character U+00E4 in UTF-8 encoding (C3 A4). After closing all components of ooffice and restarting them, everything was displayed as expected.

Maybe my configuration might be helpful:
$ emerge --info
Portage 2.1.4.4 (default-linux/x86/2007.0/desktop, gcc-4.1.2, glibc-2.6.1-r0, 2.6.24-gentoo-r8-20080528 i686)
=================================================================
System uname: 2.6.24-gentoo-r8-20080528 i686 Intel(R) Pentium(R) M processor 1.50GHz
Timestamp of tree: Sat, 21 Jun 2008 07:15:01 +0000
app-shells/bash:     3.2_p33
dev-java/java-config: 1.3.7, 2.1.6
dev-lang/python:     2.4.4-r13
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.4_p6, 1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=pentium-m -mfpmath=sse -mmmx -msse -msse2 -fomit-frame-pointer -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-O2 -march=pentium-m -mfpmath=sse -mmmx -msse -msse2 -fomit-frame-pointer -pipe"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--with-bdeps y"
FEATURES="distlocks metadata-transfer parallel-fetch sandbox sfperms strict unmerge-orphans userfetch userpriv usersandbox"
GENTOO_MIRRORS="ftp://sunsite.informatik.rwth-aachen.de/pub/Linux/gentoo"
LANG="de_DE.utf8"
LC_ALL="de_DE.utf8"
LINGUAS="de en"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/etc/portage_overlay"
SYNC="rsync://rsync.de.gentoo.org/gentoo-portage"
USE="X a52 aac acl acpi alsa audiofile berkdb bzip2 cairo cdparanoia cdr cli cracklib crypt cups dbus divx dri dts dvd dvdr dvdread eds emboss encode esd evo exif fam fbcon ffmpeg firefox fortran fuse gdbm gif gnome gpm gstreamer gtk hal iconv ipv6 isdnlog java jpeg kerberos ldap mad midi mikmod mmx mp3 mpeg mudflap ncurses nls nptl nptlonly nsplugin ogg opengl openmp oss pam pcmcia pcre pdf perl png ppds pppd python qt3 qt3support qt4 quicktime readline reflection reiserfs sdl session slang spell spl sse sse2 ssl svg tcpd tiff truetype unicode usb v4l vidix vorbis win32codecs x86 xine xinerama xml xorg xv xvid zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse synaptics" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="de en" USERLAND="GNU" VIDEO_CARDS="i810 vesa"
Unset:  CPPFLAGS, CTARGET, INSTALL_MASK, LDFLAGS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

$ emerge -pv openoffice

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R   ] app-office/openoffice-2.4.1  USE="binfilter cups dbus eds firefox gnome gstreamer gtk java ldap opengl pam -debug -kde -mono -odk -seamonkey -xulrunner" LINGUAS="de en -af -ar -as_IN -be_BY -bg -bn -br -bs -ca -cs -cy -da -dz -el -en_GB -en_US -en_ZA -eo -es -et -fa -fi -fr -ga -gl -gu_IN -he -hi_IN -hr -hu -it -ja -km -ko -ku -lt -lv -mk -ml_IN -mr_IN -nb -ne -nl -nn -nr -ns -or_IN -pa_IN -pl -pt -pt_BR -ru -rw -sh -sk -sl -sr -ss -st -sv -sw_TZ -ta_IN -te_IN -tg -th -ti_ER -tn -tr -ts -uk -ur_IN -ve -vi -xh -zh_CN -zh_TW -zu" 8,548 kB 

Comment 11 Martin von Gagern 2008-06-21 12:35:25 UTC
(In reply to comment #10)
> I've got the same problem with openoffice-2.4.1.

As it occurs for the source package for you and for the bin package for me, the problem shouldn't lie with one of those ebuilds, although it may be in some code common to both. I'd rather suspect upstream OOo, though.

> After an update from 2.4.0 the

Have you tried downgrading to 2.4.0 to see whether it works reliably there? As I had the issue with 2.4.0, and it occurred only randomly, I doubt it's related to that upgrade. If it really is, it might have a different cause.

> small u umlaut (ä, U+00E4) was displayed as permille (‰, U+2030)

That should have been called an a umlaut, I guess.

> in TimesNewRoman.

So it doesn't only happen for Bitstream Vera, but other fonts as well. Looks like this is more of an OOo than a font issue. Reassign?

> The same character was displayed as some strange letter I
> couldn't identify in ArialUnicodeMS (Font was imported from Windows).

Maybe take a screenshot next time you see this.

> Maybe my configuration might be helpful:
> $ emerge --info

I see no special similarity to my system, except for the LANG variable.
Comment 12 Martin von Gagern 2008-06-21 15:03:56 UTC
After repeated failures to reproduce this with Bitstream Vera, I just got this issue with Arial, displaying ƒ (U+0192) instead of Ä (U+00C4), in accordance with comment #2. So there is yet one more font affected.
The character table was affected here as well. Comparing its display to that of gucharmap yielded some more errors, all consistent with comment #2:
à (U+00E0) -> ‡ (U+2021)
á (U+00E1) -> · (U+00B7)
ì (U+00EC) -> Ï (U+00CF)
í (U+00ED) -> Ì (U+00CC)
î (U+00EE) -> Ó (U+00D3)
ï (U+00EF) -> Ô (U+00D4)
I stopped comparing after the Latin Extended-A block. The characters given in the right column of above substitution table get displayed at their correct code points as well, so they appear twice in the character table of OOo.
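To check the substitutions listed above against the Latin-1/MacRoman theory of comment #2 more systematically, here is a hypothetical Python helper (`expected_glyph` and `unexplained` are illustrative names). It predicts each substitution, reports any observed pair the theory does not explain, and prints a predicted table for U+00A0 to U+00FF to compare against gucharmap row by row. It assumes Python's `mac_roman` codec matches whatever mapping OOo mixed in, and takes ƒ to be U+0192, the code point shown in the hexdump of comment #2.

```python
def expected_glyph(ch: str) -> str:
    """Glyph shown if the Latin-1 byte of `ch` were decoded as Mac OS Roman."""
    return ch.encode("latin-1").decode("mac_roman")


def unexplained(observed: dict[str, str]) -> dict[str, str]:
    """Observed substitutions that the Latin-1/MacRoman theory does NOT predict."""
    return {want: got for want, got in observed.items() if expected_glyph(want) != got}


# Substitutions read off the OOo character table (listed in comment #12)
OBSERVED = {
    "Ä": "\u0192",  # ƒ
    "à": "\u2021",  # ‡
    "á": "\u00b7",  # ·
    "ì": "\u00cf",  # Ï
    "í": "\u00cc",  # Ì
    "î": "\u00d3",  # Ó
    "ï": "\u00d4",  # Ô
}

if __name__ == "__main__":
    print("unexplained:", unexplained(OBSERVED) or "none")
    # Predicted table for U+00A0..U+00FF, to check against gucharmap row by row
    for cp in range(0xA0, 0x100):
        shown = expected_glyph(chr(cp))
        print(f"{chr(cp)} (U+{cp:04X}) -> {shown} (U+{ord(shown):04X})")
```

An empty "unexplained" result means every observed pair fits the theory; any entry left over would suggest a second, different mapping is involved.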

The fact that this whole issue is so difficult to reproduce, together with the fact that consecutive code points seem more likely to be affected, might indicate some kind of race condition.

If anybody could work out where the font encoding is handled in OOo, I might try to get some debugging information about this. Maybe the problem can be seen in the debugger even when it does not materialize in the display. I could imagine duplicate writes to the same map entry.
Comment 13 Peter Volkov (RETIRED) gentoo-dev 2008-06-21 18:50:05 UTC
Do I understand correctly that this issue is reproducible when you upgrade openoffice while it is still open? It's quite possible that the cause is that some libraries of the old office stay in memory while others are loaded from the new one.
Comment 14 Martin von Gagern 2008-06-21 18:54:18 UTC
(In reply to comment #13)
> Do I understand correctly that this issue is reproducible while you have
> openoffice opened and upgrading it at the same time?

I didn't understand David that way, and it's not the case for me.
Comment 15 Ulf Dambacher 2008-12-16 08:24:35 UTC
I have similar problems and I tried different versions of openoffice, either bin or source.
I have a document here which shows wrong characters when opened directly, and correct characters if I open it after typing some special characters "äöüß" in the fonts used, in an empty document. I can even load such a document first, and then the other document works.
Comment 16 Peter Volkov (RETIRED) gentoo-dev 2008-12-16 08:44:29 UTC
Ulf, could you attach the document here, with steps to reproduce the behaviour? How should I open the document to get broken characters, and what should I do to see them correctly? Thank you.
Comment 17 Ulf Dambacher 2008-12-16 11:03:07 UTC
Created attachment 175434
file with bad characters, openoffice format
Comment 18 Ulf Dambacher 2008-12-16 11:04:04 UTC
Created attachment 175437
file with bad characters, pdf to look at
Comment 19 Ulf Dambacher 2008-12-16 11:05:49 UTC
Created attachment 175440
good file, openoffice format

Same as bad4, but I removed the first column and, bang, the characters look OK.
Comment 20 Ulf Dambacher 2008-12-16 11:09:11 UTC
The software I use:

           app-office/openoffice-2.4.1 
           app-admin/eselect-fontconfig-1.0 (0)
[I--] [  ] media-fonts/corefonts-1-r4 (0)
[I--] [  ] media-fonts/font-adobe-100dpi-1.0.0 (0)
[I--] [  ] media-fonts/font-adobe-75dpi-1.0.0 (0)
[I--] [  ] media-fonts/font-adobe-utopia-type1-1.0.1 (0)
[I--] [  ] media-fonts/font-alias-1.0.1 (0)
[I--] [  ] media-fonts/font-bh-type1-1.0.0 (0)
[I--] [  ] media-fonts/font-cursor-misc-1.0.0 (0)
[I--] [  ] media-fonts/font-misc-misc-1.0.0 (0)
[I--] [  ] media-fonts/font-schumacher-misc-1.0.0 (0)
[I--] [  ] media-fonts/font-util-1.0.1 (0)
[I--] [  ] media-fonts/gnu-gs-fonts-std-8.11 (0)
[I--] [  ] media-libs/fontconfig-2.6.0-r2 (1.0)
[I--] [  ] x11-apps/mkfontdir-1.0.3 (0)
[I--] [  ] x11-apps/mkfontscale-1.0.3 (0)
[I--] [  ] x11-libs/libXfont-1.3.1-r1 (0)
[I--] [  ] x11-libs/libfontenc-1.0.4 (0)
[I--] [  ] x11-proto/fontcacheproto-0.1.2 (0)
[I--] [  ] x11-proto/fontsproto-2.0.2 (0)
[I--] [  ] x11-proto/xf86bigfontproto-1.1.2 (0)

Everything is native amd64 but I get the same on a plain x86 machine
Comment 21 Ulf Dambacher 2008-12-16 11:11:30 UTC
Created attachment 175445
load this file first and everything gets good...
Comment 22 Martin von Gagern 2008-12-16 15:21:22 UTC
(In reply to comment #17)
> file with bad characters, openoffice format

Failed to reproduce this with openoffice-bin-3.0.0 but could reproduce it with openoffice-bin-2.4.1 so updating might be one way to work around the issue. This really looks like the same issue to me. The fact that a rather large file makes this better reproducible might be another indication for a race condition. I'm currently compiling openoffice-2.4.1 from source here and hope to find something valuable this way.
Comment 23 Ulf Dambacher 2008-12-16 16:49:54 UTC
yes, Ooo3.0 works regarding char encoding.
But I can't upgrade as they have bÖrked the file locking mechanism and I share home directory and files via nfs and samba.
Comment 24 Peter Volkov (RETIRED) gentoo-dev 2008-12-16 17:24:57 UTC
Reopening to close correctly.
Comment 25 Peter Volkov (RETIRED) gentoo-dev 2008-12-16 18:25:43 UTC
(In reply to comment #23)
> yes, Ooo3.0 works regarding char encoding.

Well, this means that this issue is fixed. I doubt it would be easy, or worth the effort, to backport the fix...

> But I can't upgrade as they have bÖrked the file locking mechanism and I share
> home directory and files via nfs and samba.

I've searched the upstream bugzilla and found that upstream actually reimplemented locking using lock files, but it was designed for, and works only with, native OO formats. There is a known problem with symlinks, so if by bÖrked you mean something different, could you report this issue upstream to get it fixed? It's probably worth reporting the problem with non-native formats too...

BTW, thank you for the great test case.
Comment 26 Martin von Gagern 2008-12-17 13:57:52 UTC
I tried to investigate this a bit closer, and even though I didn't get very far, I wanted to share my failed approaches before upgrading to 3.0 again and leaving this issue alone.

First I looked for binary files containing character translation maps corresponding to the Apple Roman encoding. I found such tables in these files:

/usr/lib/libicudata.so.38.1 (4 bytes per table entry)
/usr/lib/openoffice/program/libuno_sal.so.3 (2 bytes per table entry)
/usr/qt/3/lib/libqt-mt.so.3 (2 bytes per table entry)

Changing any of these tables (in a different library loaded via LD_LIBRARY_PATH) didn't result in a different mapping in OOo, so these maps don't seem to be involved in the bug at runtime.
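Such a table search could look roughly like the following Python sketch. It is an assumption-laden reconstruction, not the tool actually used: it supposes each table is a contiguous array holding one fixed-width little-endian code point per MacRoman byte value (as on x86), and it matches only bytes 0x80 to 0xAF because later entries (for example 0xDB, currency sign vs. euro sign) differ between versions of the mapping. `macroman_table` and `find_tables` are hypothetical names.

```python
import sys


def macroman_table(width: int, first: int = 0x80, last: int = 0xB0) -> bytes:
    """Unicode values of MacRoman bytes first..last-1 as little-endian integers."""
    # Assumption: one fixed-width code point per byte value, stored in order.
    # The default range stops at 0xAF because later entries (e.g. 0xDB,
    # currency sign vs. euro sign) differ between versions of the mapping.
    return b"".join(
        ord(bytes([b]).decode("mac_roman")).to_bytes(width, "little")
        for b in range(first, last)
    )


def find_tables(data: bytes, width: int) -> list[int]:
    """Return every offset at which the MacRoman table fragment occurs."""
    needle = macroman_table(width)
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        for width in (2, 4):  # 2-byte (UTF-16) and 4-byte (UTF-32) entries
            for off in find_tables(data, width):
                print(f"{path}: {width}-byte entries at offset {off:#x}")
```

The reported offsets are where a patched copy of the library would need to be modified for the kind of experiment described above.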

Next I looked at where the Arial font was loaded in the first place. To do so, I created a wrapper around the open syscall, enabled via LD_PRELOAD. I found two open calls for arial.ttf:

#2  0x485d0cc0 in ?? () from /usr/lib/openoffice/program/libvcl680li.so
#3  0x485d0d4f in ?? () from /usr/lib/openoffice/program/libvcl680li.so
#4  0x485d0e39 in ?? () from /usr/lib/openoffice/program/libvcl680li.so
#5  0x485d132d in ?? () from /usr/lib/openoffice/program/libvcl680li.so
#6  0x485ca57d in GlyphCache::CacheFont () from /usr/lib/openoffice/program/libvcl680li.so
#7  0xb758a870 in X11SalGraphics::setFont () from /usr/lib/openoffice/program/libvclplug_gen680li.so
#8  0xb758a8df in X11SalGraphics::SetFont () from /usr/lib/openoffice/program/libvclplug_gen680li.so
#9  0x4847178f in ?? () from /usr/lib/openoffice/program/libvcl680li.so
#10 0x4847a241 in ?? () from /usr/lib/openoffice/program/libvcl680li.so
#11 0x4847deee in OutputDevice::GetTextHeight () from /usr/lib/openoffice/program/libvcl680li.so
#12 0xb31d48be in ?? () from /usr/lib/openoffice/program/libsc680li.so
#13 0xbfe26cdc in ?? ()
#14 0x00000009 in ?? ()
#15 0x00000000 in ?? ()

#2  0x488776dc in ?? () from /usr/lib/openoffice/program/libpsp680li.so
#3  0x08c1b090 in ?? ()
#4  0x00000000 in ?? ()

As you can see, large portions of the relevant stack traces are missing debug symbols, despite the use of the splitdebug FEATURE. Without debug symbols I see little chance of investigating this more closely, and I don't feel like tweaking the OOo build system for this. I'll abandon my investigation here, but if anyone wishes to take this further, this information might be useful.