Bug 125306 - myspell dictionaries not in the tree
Summary: myspell dictionaries not in the tree
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High normal
Assignee: Kevin F. Quinn (RETIRED)
Reported: 2006-03-06 15:28 UTC by Kevin F. Quinn (RETIRED)
Modified: 2006-05-04 07:02 UTC

Description Kevin F. Quinn (RETIRED) gentoo-dev 2006-03-06 15:28:27 UTC
With the introduction of hunspell (to support openoffice amongst other things), dictionaries suitable for it should be added to the tree.  This initially means adding the 'myspell' dictionaries, with which hunspell is compatible.  This bug is to make sure I don't forget (since bugzilla will whine about it to me 8-) ).

Some issues to resolve:

1) where the dictionaries should go (/usr/share/myspell?), and how to marry that up with openoffice (soft link from /usr/lib/openoffice/share/dict/ooo/?).

2) where to get the dictionaries from; copy debian packaging, or get them from http://lingucomponent.openoffice.org/download_dictionary.html

w.r.t. (2) it seems for some languages there are better dictionaries available than those on the openoffice download site.
Comment 1 Andreas Proschofsky (RETIRED) gentoo-dev 2006-03-16 13:21:17 UTC
Sorry for the late comment, but I just found at least a partial solution ;)

It looks like openoffice (at least the built-from-source version) already provides a script for adding dictionaries from /usr/share/myspell; the script is located at /usr/lib/openoffice/install-dict

You just have to call it without parameters and it does everything for us: it symlinks the dictionaries, adds them to dictionary.lst, and removes them again if they are no longer present.
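
A minimal sketch of the behaviour described above, for illustration only. It is not the actual install-dict script; the function name sync_dicts, the "DICT <lang> <country> <name>" entry layout and the choice to replace existing files with symlinks are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real /usr/lib/openoffice/install-dict.
# sync_dicts SRC DST
#   SRC: directory holding myspell dictionaries (e.g. /usr/share/myspell)
#   DST: OOo dictionary directory containing dictionary.lst
sync_dicts() {
    src=$1
    dst=$2
    lst="$dst/dictionary.lst"
    touch "$lst"

    # Add: link every complete .aff/.dic pair and register it exactly once.
    for aff in "$src"/*_*.aff; do
        [ -e "$aff" ] || continue            # glob matched nothing
        name=$(basename "$aff" .aff)         # e.g. en_GB
        [ -e "$src/$name.dic" ] || continue  # incomplete pair, skip
        ln -sf "$src/$name.aff" "$dst/$name.aff"
        ln -sf "$src/$name.dic" "$dst/$name.dic"
        entry="DICT ${name%%_*} ${name#*_} $name"
        grep -qxF "$entry" "$lst" || echo "$entry" >> "$lst"
    done

    # Remove: drop dangling symlinks and the entries that referred to them.
    for dic in "$dst"/*_*.dic; do
        [ -L "$dic" ] || continue            # only manage symlinks
        [ -e "$dic" ] && continue            # target still present
        name=$(basename "$dic" .dic)
        rm -f "$dic" "$dst/$name.aff"
        entry="DICT ${name%%_*} ${name#*_} $name"
        grep -vxF "$entry" "$lst" > "$lst.tmp" || true
        mv "$lst.tmp" "$lst"
    done
}
```

Calling it as sync_dicts /usr/share/myspell /usr/lib/openoffice/share/dict/ooo would mirror the "no parameters" behaviour of the real script, with the paths made explicit.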

So I think it would be a good idea to add it to openoffice-bin, and then we should have a nice solution. We should also call it from both the openoffice ebuilds and the myspell ebuilds, so we are always up-to-date.

About which dictionaries to use: I will have to look around; I just noticed that SUSE is using other sources too, so maybe we should follow what others are doing. But this shouldn't hold up the ebuilds, as we can easily switch to other sources later on.
Comment 2 Kevin F. Quinn (RETIRED) gentoo-dev 2006-03-16 15:42:10 UTC
That script is handy - saves effort.

I agree it's better to get stuff in then fiddle with it later.  I'm finalising a set of ebuilds for the language files available from the OpenOffice download area.

Currently my ebuilds stand at 92 - one for each language/country variant.  Do you think it would be better to have one ebuild for each language, and include all variants within the ebuild for the main language?  That reduces the ebuild count to 42.
Comment 3 Kevin F. Quinn (RETIRED) gentoo-dev 2006-03-18 05:29:49 UTC
Ok; I think it's sensible to have one ebuild per language, not one per variant.  To that end I have 42 ebuilds; just need to check the licenses (which has to be done manually) and we should be good to go.

The /usr/lib/openoffice/install-dict is somewhat unsatisfactory, although the ebuilds use it for now.  The script only knows about the files it finds in /usr/share/myspell, which means that it doesn't reference alternative hyphenation and thesaurus dictionaries for language variants that don't have their own.  The ebuilds have enough information to do this properly, but I'll need to modify/rewrite the script.

I've uploaded some stuff to my dev space, should any of you wish to take a look
(http://dev.gentoo.org/~kevquinn/myspell).  I'd appreciate a quick scan of at least the eclass.
Comment 4 Kevin F. Quinn (RETIRED) gentoo-dev 2006-03-22 23:59:08 UTC
Update - ebuilds (well, the eclass) now sort out the symlinking directly, with full entries in dictionary.lst so that language variants using a hyphenation and/or thesaurus dictionary from another variant work ok (something the openoffice script cannot do).  The information to do this outside of portage is stored in /usr/share/myspell as a section of the dictionary.lst file.
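
To illustrate what "full entries" could look like, here is a small sketch that prints dictionary.lst lines for a variant borrowing its hyphenation and thesaurus files from another variant. The helper name emit_entries and the de_AT/de_DE file names are assumptions chosen for the example; this is not code from the eclass.

```shell
#!/bin/sh
# Hypothetical sketch, not taken from myspell.eclass.
# emit_entries LANG COUNTRY DICT [HYPH] [THES]
#   Prints one dictionary.lst line per dictionary type. HYPH and THES may
#   name files that belong to a different variant of the same language.
emit_entries() {
    printf 'DICT %s %s %s\n' "$1" "$2" "$3"
    [ -n "${4:-}" ] && printf 'HYPH %s %s %s\n' "$1" "$2" "$4"
    [ -n "${5:-}" ] && printf 'THES %s %s %s\n' "$1" "$2" "$5"
    return 0
}

# The main variant uses its own files ...
emit_entries de DE de_DE hyph_de_DE th_de_DE
# ... while a variant without its own hyphenation/thesaurus reuses them.
emit_entries de AT de_AT hyph_de_DE th_de_DE
```

A script that only scans /usr/share/myspell for file names could not derive the second group of lines, since no hyph_de_AT file exists to be found.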

Also removed default KEYWORDS and HOMEPAGE from the eclass, sorted out LICENSE information.

I think this is all pretty much ready to go...
Comment 5 Andreas Proschofsky (RETIRED) gentoo-dev 2006-03-31 03:07:01 UTC
I had a quick look in the morning, so sorry if I ask some stupid question, time is a little bit on the short side atm ;)

So: How are we going to handle an OOo-upgrade? Is there a function I could call from postinstall in the OOo-ebuilds that would re-add the symlinks?

Also: Do I understand correctly that you plan to put all the dictionaries on the distfiles mirrors (because atm there is no "direct" SRC_URI, at least in the demo-ebuild)?

Another thing I noticed is that the script is obviously only called when someone uses the "openoffice" USE-Flag. Wouldn't it be easier to do that automatically if (for instance) dictionary.lst is there? This way we wouldn't need a use-flag.

And: Do we really need a meta-ebuild? Or is this just for testing purposes?

So much for now...
Comment 6 Kevin F. Quinn (RETIRED) gentoo-dev 2006-03-31 08:47:01 UTC
(In reply to comment #5)
> So: How are we going to handle an OOo-upgrade? Is there a function I could call
> from postinstall in the OOo-ebuilds that would re-add the symlinks?

At the moment, I've designed the link management so that:
1) it's post-install, so the links are not managed by the portage database
2) it doesn't overwrite dictionaries that are already there

(1) means the symlinks aren't owned by the myspell ebuilds, so the OOo ebuild will overwrite the symlinks if it wants to install a dictionary (no collision in the portage database), if I've understood portage properly.

(2) means that the myspell ebuilds don't interfere with anything the OOo ebuild has done, keeping a clear distinction that the /usr/lib/openoffice area is "owned" by the openoffice ebuild.
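
The two rules above can be sketched as a small helper. This is an illustration under assumptions, not the eclass code: the name link_if_absent is made up, and refreshing a pre-existing symlink (as opposed to a regular file) is a guess at the intended behaviour.

```shell
#!/bin/sh
# Hypothetical sketch of the post-install link rule, not the eclass code.
# link_if_absent TARGET LINK
#   Creates LINK -> TARGET, but never replaces a regular file already at
#   LINK (for example a dictionary installed by the openoffice ebuild).
link_if_absent() {
    target=$1
    link=$2
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "keeping existing file: $link" >&2
        return 0
    fi
    # Absent, or a symlink from an earlier run: (re)create the link.
    ln -sf "$target" "$link"
}
```

Because such links would be created at postinst time, portage records no ownership for them, which is what lets a later openoffice install overwrite them without a collision.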

The idea is that the OOo dictionary area is "owned" by the openoffice ebuild not the myspell ebuilds.  I was thinking it might be a good idea to adjust openoffice so it installs no dictionaries, but RDEPENDs on the relevant myspell dictionary according to LINGUAS - unfortunately LINGUAS isn't USE_EXPANDed so it's not so simple.  Alternatively just skip all dictionaries OOo wants to install and inform the user to emerge whatever myspell dictionaries they would like.  Another alternative, perhaps openoffice could check if the myspell dictionary symlink is present for a dictionary, and skip the installation of its own dictionary if so.

In the long term, I expect the myspell ebuilds to keep track with at least whatever is on the openoffice.org repository, which should mean the myspell dictionaries are never less than what openoffice would install.  Each time openoffice changes version it's simple enough to check whether any of the dictionaries have newer files on the openoffice repository and update the relevant myspell ebuilds appropriately.

> Also: Do I understand that right, you plan to put all the dictionaries on the
> distfiles mirrors (cause atm there is no "direct" SRC_URI, at least in the
> demo-ebuild).

Yes.  Unfortunately the upstream files at openoffice.org do not have a version number, so mirroring is the only way to go as far as I know.  In total the dictionaries are about 38MB.  Later on, if an ebuild switches away from the openoffice.org repository to one "further upstream" that does add a version to its files, setting SRC_URI in the ebuild will override the eclass.

> Another thing, I noticed, is that the script obviously is only called when
> someone uses the "openoffice" USE-Flag, wouldn't it be easier to automatically
> do that, if (for instance) dictionary.lst is there? This way we wouldn't need a
> use-flag.

I wasn't sure whether to control the changes to the openoffice-owned directory via use flag or not.

> And: Do we really need a meta-ebuild? Or is this just for testing purposes?

No, we don't need one - that was indeed just for testing that they all install ok without conflict.  It's not intended for the portage tree.  Actually since I realised that:

cd /usr/portage; emerge app-dicts/myspell*

would do the same thing I should delete it anyway...

> So much for now...

Thanks :)
Comment 7 Kevin F. Quinn (RETIRED) gentoo-dev 2006-03-31 09:47:34 UTC
(In reply to comment #6)
> I wasn't sure whether to control the changes to the openoffice-owned directory
> via use flag or not.

Didn't finish this reply!  I'm happy to drop the use flag, I'm no longer convinced it was a good idea :)
Comment 8 Andreas Proschofsky (RETIRED) gentoo-dev 2006-04-07 06:26:54 UTC
(In reply to comment #6)
> 
> The idea is that the OOo dictionary area is "owned" by the openoffice ebuild
> not the myspell ebuilds.  I was thinking it might be a good idea to adjust
> openoffice so it installs no dictionaries, but RDEPENDs on the relevant myspell
> dictionary according to LINGUAS - unfortunately LINGUAS isn't USE_EXPANDed so
> it's not so simple.  Alternatively just skip all dictionaries OOo wants to
> install and inform the user to emerge whatever myspell dictionaries they would
> like.  Another alternative, perhaps openoffice could check if the myspell
> dictionary symlink is present for a dictionary, and skip the installation of
> its own dictionary if so.

Actually alternative 2 is what I had in mind. Just install OOo without any dictionaries and tell the users to install the myspell-ebuild they want. That's the easiest one (the LINGUAS stuff sounds like a good idea in principle but is too complicated in reality, I think). Also, that's not only how all the other distros do it, but also in line with the other spell-checking stuff in Gentoo (IIRC), aspell for instance.

Comment 9 Andreas Proschofsky (RETIRED) gentoo-dev 2006-04-19 04:48:53 UTC
@Kevin: Without putting pressure on you ;) What's the current status of this?
Comment 10 Kevin F. Quinn (RETIRED) gentoo-dev 2006-04-26 23:51:25 UTC
Andreas - I uploaded a full overlay of the current state to my devspace (~kevquinn/myspell), and I'm hoping to spend time this evening on a final check before committing.  I'll commit them as just ~x86 since I don't have ppc or sparc to test on; once they're committed and we're happy with them I'll bug for ~ppc/~sparc marking.

re. LINGUAS - it would be easy I think (it does USE_EXPAND), but it would make for a huge dep list in OOo which I agree isn't worth the effort to write or maintain; a simple postinst message would be enough.

I'll leave the myspell stuff acting as it does; i.e. it won't alter existing files in the OOo dictionary area.
Comment 11 Kevin F. Quinn (RETIRED) gentoo-dev 2006-05-01 09:27:47 UTC
ok; finally they're all in the tree, assuming I didn't forget anything.

Just ~x86 for now.  I'll bug for ~ppc and ~sparc once a couple of people have tried some out.
Comment 12 Andreas Proschofsky (RETIRED) gentoo-dev 2006-05-01 10:35:55 UTC
Great work, thanks Kevin!

Still one thing to discuss, though ;) I really think the myspell ebuilds SHOULD mess with dictionary.lst if openoffice is installed, otherwise we have a bad situation. Looking at it from a user's point of view:

*) Install openoffice, get a post-install message, saying: "for spell checking in your language install the appropriate myspell-dictionary"

*) User installs the myspell-dictionary

*) User wonders why the spell-checking in his language still does not work.

Or am I missing something?
Comment 13 Andreas Proschofsky (RETIRED) gentoo-dev 2006-05-01 11:08:05 UTC
Another question: Which function in myspell.eclass should I call from openoffice to re-create a correct dictionary.lst after an install? Shouldn't that be myspell_pkg_postinst? Or how is this supposed to work?
Comment 14 Kevin F. Quinn (RETIRED) gentoo-dev 2006-05-01 11:36:50 UTC
> Still one thing to discuss, though ;) I really think the myspell ebuilds SHOULD
> mess with dictionary.lst if openoffice is installed,

It does :)  My openoffice now has 41 main spelling languages, with many variants, to choose from :)

The only thing it avoids is overwriting files in the OOo dictionary area with symlinks to the myspell ones.  So if the openoffice build installs a dictionary (e.g. the default English dictionary) then installing myspell-en won't replace those files with symlinks to the myspell dictionaries.

The reasoning is that the files in the OOo dictionary area installed with the openoffice ebuild are owned by openoffice in the package database, so if you upgrade it'll either overwrite what the myspell ebuilds did, or declare a collision (not sure which).

The other way around, i.e. if openoffice tries to overwrite existing symlinks created by an earlier install of a myspell dictionary, is ok since the symlink is a postinst action so is not recorded in the package database for the myspell dictionary.

So ideally, the openoffice ebuild would install no dictionaries, if the user wishes to use myspell ones.

In summary, doing:

1) install openoffice, no dictionaries - get message to install dictionary
2) install myspell-<lang>

openoffice now sees the myspell language correctly for the dictionary, along with the relevant thesaurus and hyphenation files.

However if openoffice installs a dictionary:

1) install openoffice, including built-in English dictionary
2) install myspell-en

openoffice continues to use the dictionary supplied with openoffice, not the myspell one.


At some point we could provide a user interface for fiddling with the dictionaries (perhaps an eselect module) but if openoffice is set to install no dictionaries it won't really be necessary.  I have made sure all the information necessary is available in /usr/share/myspell/dictionary.lst.<lang>.
Comment 15 Kevin F. Quinn (RETIRED) gentoo-dev 2006-05-01 11:52:07 UTC
(In reply to comment #13)
> Another question: Which function in myspell.eclass should I call from
> openoffice to re-create a correct dictionary.lst after an install? Shouldn't
> that be myspell_pkg_postinst? Or how is this supposed to work?

I wouldn't expect you to call the myspell.eclass stuff from a different package - it's a real eclass rather than just a utility library; I don't know what happens if you try that.

I'll have to think a bit about what happens when openoffice is installed later (e.g. when upgraded).  Ideally, openoffice installation shouldn't overwrite the dictionary.lst file as it's a system config file.  Adding /usr/lib/openoffice/share/dict/ooo to CONFIG_PROTECT is probably the most proper approach; alternatively you could remove ${D}/usr/lib/openoffice/share/dict/ooo/dictionary.lst if ${ROOT}/usr/lib/openoffice/share/dict/ooo/dictionary.lst exists, in src_install.
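
The second alternative could look roughly like the sketch below, meant to be called at the end of src_install in the openoffice ebuild. The helper name drop_packaged_lst is hypothetical; D and ROOT are the usual portage variables, and the path is the one quoted above.

```shell
#!/bin/sh
# Hypothetical helper for the openoffice ebuild's src_install; a sketch of
# the idea only, not code from any existing ebuild.
# Relies on the portage variables D (image directory) and ROOT (target root).
drop_packaged_lst() {
    lst=usr/lib/openoffice/share/dict/ooo/dictionary.lst
    # If the live system already has a dictionary.lst, do not ship ours,
    # so the installed (possibly myspell-extended) file is left untouched.
    if [ -e "${ROOT%/}/$lst" ]; then
        rm -f "${D%/}/$lst"
    fi
}
```

One consequence of this approach is that entries for dictionaries newly bundled with a later openoffice version would never reach an existing dictionary.lst.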


The easiest thing to do if dictionary.lst does get wiped out is:

cat /usr/share/myspell/dictionary.lst.* >> \
    /usr/lib/openoffice/share/dict/ooo/dictionary.lst

but I will think properly about an eselect module (ultimately I would change myspell.eclass to use that).
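
Since the cat command above appends unconditionally, running it twice would duplicate entries. A possible variant that only adds missing lines is sketched here; the function name restore_lst is made up for the example, and it assumes each dictionary.lst.<lang> fragment holds one entry per line.

```shell
#!/bin/sh
# Hypothetical sketch: re-add myspell entries without creating duplicates.
# restore_lst SRC_DIR LST
#   SRC_DIR: directory with dictionary.lst.<lang> fragments
#   LST:     the dictionary.lst file to repair
restore_lst() {
    src_dir=$1
    lst=$2
    touch "$lst"
    for part in "$src_dir"/dictionary.lst.*; do
        [ -e "$part" ] || continue   # glob matched nothing
        # The "|| [ -n ... ]" keeps a final line that lacks a newline.
        while IFS= read -r line || [ -n "$line" ]; do
            [ -n "$line" ] || continue
            # Append only entries that are not already present verbatim.
            grep -qxF -- "$line" "$lst" || printf '%s\n' "$line" >> "$lst"
        done < "$part"
    done
}
```

It would be invoked as restore_lst /usr/share/myspell /usr/lib/openoffice/share/dict/ooo/dictionary.lst and is safe to repeat.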

Comment 16 Andreas Proschofsky (RETIRED) gentoo-dev 2006-05-04 07:02:35 UTC
(In reply to comment #15)
> I wouldn't expect you to call the myspell.eclass stuff from a different package
> - it's a real eclass rather than just a utility library; I don't know what
> happens if you try that.
> 
> I'll have to think a bit about what happens when openoffice is installed later
> (e.g. when upgraded).  Ideally, openoffice installation shouldn't overwrite the
> dictionary.lst file as it's a system config file.  Adding
> /usr/lib/openoffice/share/dict/ooo to CONFIG_PROTECT is probably the most
> proper approach; alternatively you could remove
> ${D}/usr/lib/openoffice/share/dict/ooo/dictionary.lst if
> ${ROOT}/usr/lib/openoffice/share/dict/ooo/dictionary.lst exists, in
> src_install.

I don't think this is a good solution; for instance, between 2.0.1 and 2.0.2 some dictionaries have been added to the default install, and this way we would lose these changes.

> The easiest thing to do if dictionary.lst does get wiped out is:
> 
> cat /usr/share/myspell/dictionary.lst.* >> \
>     /usr/lib/openoffice/share/dict/ooo/dictionary.lst

sounds reasonable

> but I will think properly about an eselect module (ultimately I would change
> myspell.eclass to use that).

Which problem do you want to solve by an eselect module?