meeting_2007-01-09

Meeting with Polderland 9.1.2007

Participants:

  • Peter Beinema
  • Sjur Moshagen

Agenda

  • since last time
  • questions and answers

Since last time

Polderland:

  • Windows spellers + hyphenator sent
  • no response from MS on language codes for Mac yet, will poll them again
  • mklex: split data into pieces internally, then combine results

Divvun:

Our programmer is on sick leave, so there has been no progress on the PLX conversion work lately. Other work has concentrated on general linguistic improvements, and we have started the translation work for the installer.

Windows version

There is presently a mismatch between the speller and the hyphenator, causing the speller to flag hyphenated words as misspelled. Lule Sámi does not presently have a hyphenator, and because of the way MS Office handles language groups, the North Sámi hyphenator will be used instead. This will likely lead to wrong hyphenation patterns, especially in some vowel sequences.

Possible issues

A large share of the big lexicon file (25+ Gb) consists of words starting with:

  • - (hyphen; 0x2d; 3 Gb)
  • e (0x65; 8 Gb)

Could the "e" group be caused by earon- or something similar?
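
One way to check this would be to tally how many bytes of the lexicon fall under each leading string. The following is only a rough sketch under stated assumptions: it assumes the lexicon is a text file with one word per line and that it is UTF-8 encoded, neither of which is confirmed by these notes. The function names tally_by_prefix and report are made up for this illustration and are not part of mklex or any Polderland tool.

```python
from collections import Counter
from typing import Iterable


def tally_by_prefix(lines: Iterable[bytes], prefix_len: int = 1) -> Counter:
    """Sum the bytes occupied by lines, grouped by their leading characters."""
    sizes: Counter = Counter()
    for raw in lines:
        # Assumption: UTF-8 input. Decoding per line keeps multi-byte
        # letters such as "á" or "š" intact when slicing the prefix.
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        if not text:
            continue  # skip empty lines
        sizes[text[:prefix_len]] += len(raw)
    return sizes


def report(path: str, prefix_len: int = 1, top: int = 10) -> None:
    """Print the largest prefix groups of a one-word-per-line file."""
    # The file is streamed line by line, so it never has to fit in memory.
    with open(path, "rb") as handle:
        sizes = tally_by_prefix(handle, prefix_len)
    for prefix, size in sizes.most_common(top):
        print(f"{prefix!r}\t{size / 1e9:.2f} GB")
```

Calling report on the lexicon file with prefix_len=1 should reproduce the "-" and "e" groups if the assumptions hold; raising prefix_len to 5 would then show whether a single string such as "earon" accounts for most of the "e" group.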

Other things

There has been some press coverage of the Sámi project at Polderland in the Netherlands. Peter will send an e-mail about it later :-)

Next meeting

Next Tuesday (16.1.) at the usual time.

TODO:

  • check if North Sámi hyphenation can be disabled when processing Lule Sámi (PLD)
  • make complete PLX data set (Tomi)
  • get language codes to work with Mac Office 2004 (and check Mac Office 2007) (Polderland)
  • deliver mklex + hyphen script
  • try to find the proper compiler version for Adobe InDesign (old version will probably do)
  • try to get an answer from other sources to the question about language codes in MS Office for Mac (Sjur)
  • investigate the initial "e" group of words (8 Gb)