meeting_2006-12-12

Meeting with Polderland 12.12.2006

Participants:

  • Peter Beinema
  • Sjur Moshagen

Agenda

  • since last time
  • questions and answers

Since last time

Polderland:

  • Spelling alphas sent (North S + Lule S., tested)
  • Hyphenation alpha sent (North S.)
  • work in progress on new version of mklex
  • PLX-version of north Sami Nouns processed. Compiles OK into lexicon of 500K, not tested liguistically yet
  • collecting texts for installshield installer to be translated into

Divvun:

  • installed and tested the alpha version
  • all open POSes now converted to PLX

Alpha version

It was received and installed. Installation went without problems.

There are still some problems with ž (U+017E, latin small letter z with caron). Here are some cases:

  • breaks the word: Maŋimužžan - red line until ž
  • does not recognise word: iežaset -> iežaset, that is, a correct word is underlined and is corrected to itself. When the suggestion is inserted, a red underline appears immediately
  • works ok with spelling error: erenoamažit -> erenoamážit (correctly recognised, correct suggestion)
  • works ok, correctly spelled word: ožžon

      Possible issues

      Compounding: three-letter words can be problematic in error detection, as spelling errors can be masked as a compound of short words.

      correct in Dutch: Moskou with capital M (Moscow) incorrect: moskou, but is recognised anyway because of "mos" (= moss) + "kou" (= cold)

      Example:

      word1xxxword3 = wronglyspelledlongword
      

      where xxx is a three-letter word. The problem is most likely not that big, due to the default syllable structure of regular words in Sámi:

      VCV
      CVCV
      VCVC
      

      Thus, only words with the structure VCV will be problematic. There are about 20 such words in our present lexicon.

      Hypotetical example from Norwegian:

      addosjon > addisjon (err > corr)
      

      (the spelling error is analysed as a spurious compound ad-do-sjon; the word "ad" does not exist, but serves to illustrate the problem (the other two words do exist); from earlier experience with Norwegian, we know this problem is real.)

      If -do- could be specified as impossible, the spelling error would be detected. That is, only do- or -do, not -do-, should be allowed in compounds.

      Next meeting

      Next Tuesday (19.12.) at the usual time.

      TODO:

      • re-send speller and hyphenator versions with ž added
      • continue to improve hyphenation (Sjur and Thomas)
      • make complete PLX data set (Tomi)
      • get language codes to work with Mac Office 2004 (and check MacOffice 2007) ( Polderland)
      • deliver Alpha versions (Polderland)
        • done
      • deliver mklex + hyphen script
      • try to find proper compiler version for Adobe Indesign (old version will probably do)
      • prepare (localisation of) installation packages for the Beta version ( Polderland: send "default" texts to be translated)
      • try to get an answer to the language codes in MS Office for Mac question from other sources (Sjur)