Meeting with Polderland 2.10.2007


  • Peter Beinema
  • Sjur Moshagen


  • Since last time
  • Possible issues
  • Next meeting
  • Todo items for the next meeting

Since last time


  • dropped: new version of MS Windows installer, incl. Lule Sami hyphenation
  • pending, working on new installer (bugs 455 and 421): Mac installer for Office, including Lule Sami Hyphenator
  • dropped: North Sami hyphenator for Mac InDesign
  • pending: Lule Sami hyphenator for Mac InDesign: problem with language identification when combined with North Sami hyphenator
  • fixing bugs: some bugs pending (461, 521, 522)
  • compounding behaviour:
    • 3/4/5-part compounds appear to work in general, but not in some specific cases
      • investigating costs of modification of PLX flags. See some ideas below.
  • Mac speller: installation issue (admin rights)?
  • zip utility for Windows: not functional yet
    • first part works, calling REAL language-dependent installer appears to crash
    • have to do some debugging there
  • Speller behaviour: only suggests if errors are limited to 1 part of compound.
    • undesired (or should be user setting),
    • will be changed in future drop

Ideas for PLX flag modifications:

L + R possibilities are limited
  L: anything but rightmost
  R: anything but leftmost
  B: leftmost only
  E: rightmost only
  ? (not M, in any case): anywhere but leftmost or rightmost

Development costs are probably prohibitive, though: the first estimate is over 40 hours of work. It would not be a Sami-specific change, but it was not planned. ==> will investigate further, get detailed estimate
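
The position rules listed above can be sketched as a small check. This is only an illustration of the positions in the list, not of how PLX implements or would implement them; the function names are hypothetical, and "?" stands for the flag that has no name yet.

```python
def position_allowed(flag: str, index: int, n_parts: int) -> bool:
    """Return True if a part carrying `flag` may stand at `index` (0-based)."""
    leftmost = index == 0
    rightmost = index == n_parts - 1
    if flag == "L":   # anything but rightmost
        return not rightmost
    if flag == "R":   # anything but leftmost
        return not leftmost
    if flag == "B":   # leftmost only
        return leftmost
    if flag == "E":   # rightmost only
        return rightmost
    if flag == "?":   # placeholder name: anywhere but leftmost or rightmost
        return not leftmost and not rightmost
    raise ValueError(f"unknown flag: {flag!r}")


def compound_allowed(flags: list[str]) -> bool:
    """Check a compound given one flag per part, left to right."""
    if len(flags) < 2:
        raise ValueError("a compound needs at least two parts")
    return all(position_allowed(f, i, len(flags)) for i, f in enumerate(flags))


print(compound_allowed(["B", "?", "E"]))  # every part is in a permitted position
print(compound_allowed(["R", "L"]))       # R is leftmost and L is rightmost
```

With only L and R, a middle part of a 3-part compound cannot be told apart from a first or last part, which is the limitation the extra flags would address.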


  • testing, bug hunting and fixing during the whole week in Tromsø, major linguistic bug fixes. The quality is approaching release quality for North Sámi, except for N-part compounds. Lule Sámi ok, but not as good as North, mainly due to lack of corpus texts
  • major remaining bugs relate to limits in the PLX formalism:
    • adj+noun compounding does not follow the same rules as noun+noun or adj+adj (which means we overgenerate in some cases, and undergenerate in others)
      • will be further investigated by the Divvun team
    • N-part compounds either overgenerate in a bad way, or severely undergenerate

Bug status


Div Pld Description
402 439 word + hyphen + comma is not recognised ("xxx-,") => solved
419 440 3-part compounds not recognised => solved
448 444 Upper-case words get suggestions with Initial Case => solved
449 445 suopmasápmelaš-type compounds accepted by the speller => limits of tagging /compounding mechanism reached

To be solved in next drop

Div Pld Description
455 446 Mac uninstaller doesn't work if run by a non-admin => solved
473 452 Windows installer does not autostart after download => discuss solutions
516 4?? Vista installation from zip fails => probably solved by self-extracting zip; not functioning yet

Under investigation by Polderland

Div Pld Description
461 448 Spelling errors with editing distance 1 from lexicalised words do not get correct suggestions => explanation not good enough, investigate more samples *** still investigating
480 455 Speller suggestions not identical in context menu and dialog => MS Word behaviour: sorting of same-penalty suggestions; => no plans to repair this one
521 xxx Mac installer only works for admin accounts =>
522 xxx Strange compounding phenomena

To be investigated by Divvun/Users

Div Pld Description
447 443 Windows speller doesn't install on terminal server => SD IT guys will have a look once more

Issue discussion

473 - Windows installer

Alternatives to solve it:

  • make an executable installer file - requires InstallShield, meaning that either Polderland will have to make each installation package, or that the Divvun project will have to get an InstallShield license
  • make self-extracting zip files for download, with autostart options of the extracted objects; some alternative zippers for this:

In both cases, different types of protection software (firewalls, anti-virus, etc.) might block either the download or the execution of the installer. Default InstallShield installers behave in the same (or a similar) way, though, so neither alternative should be worse in that respect.


With more than 2-part compounds now available, suggestion speed is sometimes very slow, in the range of 10 seconds.

Casing of suggestions

Example of acro words:

Inflected: INTERREG:s (i.e. lower-case case endings in normal usage)

user types: INTERREG:ii

Suggestion: INTERREG:I
Expected: INTERREG:i (= lexicon entry)

lexicon has: INTERREG:i
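
One way to get the expected casing is sketched below. It is not a description of how the Polderland speller recases suggestions: it assumes that the casing of the typed word is applied to the suggestion, and it restricts that to the part before the colon so that the ending keeps the casing of the lexicon entry. The function name is hypothetical.

```python
def recase_suggestion(typed: str, lexicon_form: str) -> str:
    """Apply the casing of the typed word to the stem of a suggestion only."""
    # Split "INTERREG:i" into stem "INTERREG", separator ":" and ending "i".
    stem, sep, ending = lexicon_form.partition(":")
    typed_stem = typed.partition(":")[0]
    if typed_stem.isupper():
        stem = stem.upper()                    # all-caps input: upper-case the stem
    elif typed_stem[:1].isupper():
        stem = stem[:1].upper() + stem[1:]     # initial-cap input: capitalise the stem
    # The ending after the colon is left exactly as the lexicon has it.
    return stem + sep + ending


print(recase_suggestion("INTERREG:ii", "INTERREG:i"))
```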

Rough schedule

Polderland didn't meet their self-imposed goal of delivering at the beginning of September, and will try around mid-September instead.

  • Mid September: planned final drop from Polderland
  • November 1: Divvun code freeze
  • November 15: InDesign speller
  • December 11: public release

Next meeting

Next week (9.10.) at the usual time.


  • PLD drop first version of InDesign hyphenator - done
  • PLD pass information on self-extracting ziptool with autostart option + instructions on how to use it (not in working order yet) - pending
  • PLD Proposal for command line hyphenator: 12/9 ready to be sent by commercial dept - sent
  • PLD do speed tests: Mac/PPC, Mac/c2duo, Win; Office 11, Office 12 (compounding especially)
    • interpreter issue (Rosetta): much faster on much slower systems when not interpreted, factor of 20+
    • Office 2008 has native Intel code, will be much faster
    • PLD code is universal binary, but depends for speed on calling method
  • PLD provide information on InDesign language grouping
  • Divvun investigate further the A+N compounds
  • Divvun add acronym case-ending casing to Bugzilla (INTERREG:i)