meeting_2006-08-29

Meeting with Polderland 29.8.2006

Participants:

  • Peter Beinema
  • Thomas Omma
  • Tomi Pieski
  • Sjur Moshagen

Agenda

  • questions and answers

Divvun tasks

Data examples from Divvun:

  • take a couple of roots (V, N, A)
  • for each root, inflect it, derive it as much as possible, and inflect all derivations

Polderland tasks

PL will split up the words (L, R, possibly middle parts), add PLX tags and regenerate the word forms. The results will be sent to Divvun, which should evaluate whether the output consists of correct word forms and whether the approach seems feasible.

MS Office issues

Language codes for Sámi? MS has already defined them; Sjur will send them to Polderland.

OOo, Aspell and compounds

PL: Aspell and the default OOo speller engine have very limited compounding facilities.

Divvun: we are aware of it, but will nevertheless make at least the Aspell speller - some users will prefer a speller without compounding. We haven't yet decided which speller engine we will use for the OOo application.

Exclusion lists

Problem examples from Dutch:

  • Moskou (Moscow) != mos+kou => forbid "moskou"
  • zeepaard = zee+paard (sea+horse) and NOT zeep+paard (soap+horse)

Exclusion lists are possible with the PL formalism.

PL has Perl scripts for generating exclusion list candidates, and is willing to give the scripts to Divvun. Divvun: yes, please :-)
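How the PL scripts actually work was not discussed. As a rough illustration only, one plausible way to find exclusion list candidates is to generate two-part compounds and flag those that differ from a known word only in case or by a single edit. The sketch below is in Python rather than Perl; the function names and the toy word lists are assumptions made for this example, not PL's data or method.

```python
def within_one_edit(a, b):
    """True if a and b differ by exactly one insertion, deletion or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equally long) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]          # one extra character in b at position i


def exclusion_candidates(words, parts):
    """Return sorted (compound, known_word) pairs worth reviewing by hand.

    A compound left+right is a candidate if it is not itself a known word
    but equals a known word up to case, or is one edit away from one.
    """
    found = set()
    for left in parts:
        for right in parts:
            compound = left + right
            if compound in words:
                continue  # a real word, nothing to exclude
            for word in words:
                lowered = word.lower()
                if lowered == compound or within_one_edit(lowered, compound):
                    found.add((compound, word))
    return sorted(found)


# Toy data taken from the Dutch examples in these minutes (hypothetical lexicon).
WORDS = {"Moskou", "zeepaard"}
PARTS = {"mos", "kou", "zee", "zeep", "paard"}

candidates = exclusion_candidates(WORDS, PARTS)
for compound, word in candidates:
    print(f"{compound}  (too close to {word})")
```

With this toy data the candidates are "moskou" and "zeeppaard". The output is only a list of candidates; a linguist would still have to decide which of them really belong on the exclusion list.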

Compounds and suggestions:

  • exclude 'zeeppaard' => X+zeeppaard and zeeppaard+X will/can be excluded
  • suggestions will be checked against the exclusion list
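The two points above can be sketched as follows, assuming a naive speller that accepts any concatenation of lexicon entries. This is not the PL formalism, only a minimal Python illustration; the lexicon, the function names and the splitting strategy are all assumptions made for this example.

```python
# Toy lexicon and exclusion list (hypothetical; based on the Dutch examples above).
LEXICON = {"zee", "zeep", "paard", "zeepaard", "stal"}
EXCLUDED = {"zeeppaard"}


def splits(word, lexicon):
    """Yield every way to write `word` as a sequence of lexicon entries."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            for rest in splits(word[i:], lexicon):
                yield [head] + rest


def is_blocked(parts):
    """True if any run of adjacent parts spells an excluded form.

    This also covers X+zeeppaard and zeeppaard+X: the excluded form
    blocks every larger compound that contains it.
    """
    for start in range(len(parts)):
        for end in range(start + 1, len(parts) + 1):
            if "".join(parts[start:end]) in EXCLUDED:
                return True
    return False


def accepts(word):
    """Accept a word if at least one analysis avoids the exclusion list."""
    return any(not is_blocked(parts) for parts in splits(word, LEXICON))


def filter_suggestions(suggestions):
    """Drop suggestions that the exclusion list forbids."""
    return [s for s in suggestions if accepts(s)]


print(accepts("zeepaard"))       # zee+paard is a valid analysis
print(accepts("zeeppaard"))      # only analysis is the excluded zeep+paard
print(accepts("zeeppaardstal"))  # contains the excluded form
print(filter_suggestions(["zeeppaard", "zeepaard"]))
```

Running the same check on both input words and suggestions keeps the speller from suggesting a form it would itself reject.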

TODO

  • send extended data lists to PL (Thomas)
  • send MS language codes for Sámi to PL (Sjur)
  • send Perl scripts for generating exclusion list candidates to Divvun (Peter)