Meeting with Polderland 29.8.2006
Participants:
- Peter Beinema
- Thomas Omma
- Tomi Pieski
- Sjur Moshagen
Agenda
- questions and answers
Divvun tasks
- take a couple of roots (V, N, A)
- for each root, inflect it, derive it as much as possible, and inflect the derived forms
Polderland tasks
PL will split up the words into parts (L, R, possibly middle parts), add PLX tags and regenerate the word forms. The results will be sent to Divvun, which will evaluate whether the output consists of correct word forms and whether the approach seems feasible.
MS Office issues
Language codes for Sámi? They have already been defined by MS; Sjur will send them to PL.
OOo, Aspell and compounds
PL: Aspell and the default OOo speller engine have very limited compounding support.
Divvun: we are aware of it, but will nevertheless make at least the Aspell
Exclusion lists
Problem examples from Dutch:
- Moskou (Moscow) != mos+kou (moss+cold) => forbid "moskou"
- zeepaard = zee+paard (sea+horse) and NOT zeep+paard (soap+horse)
Exclusion list possible with the PL formalism.
PL has Perl scripts for generating exclusion list candidates and is willing to give them to Divvun. Yes, please :-)
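A minimal Python sketch of how an exclusion list can sit in front of a compound analyser. It assumes a toy lexicon and purely concatenative compounding (no Dutch linking elements such as -s- or -en-). LEXICON, EXCLUSIONS, segmentations() and accept() are hypothetical names, and this is not the PL formalism. Note that zeep+paard concatenates to "zeeppaard", which is exactly the misspelling of "zeepaard" that the exclusion list has to block.

```python
# Hypothetical toy lexicon; "Moskou" is listed as a proper noun.
LEXICON = {"mos", "kou", "zee", "zeep", "paard", "Moskou"}

# Hypothetical exclusion list: formally valid compounds that must be rejected.
EXCLUSIONS = {"moskou", "zeeppaard"}


def segmentations(word: str) -> list[list[str]]:
    """Return every way of splitting `word` into two or more lexicon stems."""
    results = []
    for i in range(1, len(word)):
        left, rest = word[:i], word[i:]
        if left not in LEXICON:
            continue
        # Two-part compound: left + rest.
        if rest in LEXICON:
            results.append([left, rest])
        # Longer compounds: left + (any split of rest).
        for tail in segmentations(rest):
            results.append([left] + tail)
    return results


def accept(word: str) -> bool:
    """Accept a listed word, or a compound that is not on the exclusion list."""
    if word in LEXICON:
        return True
    if word in EXCLUSIONS:
        return False
    return bool(segmentations(word))
```

With this setup, accept("zeepaard") and accept("Moskou") return True, while accept("moskou") and accept("zeeppaard") return False, even though segmentations() still finds mos+kou and zeep+paard.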
Compounds and suggestions:
- exclude 'zeeppaard' => X+zeeppaard and zeeppaard+X will/can be excluded
- suggestions will be checked against the exclusion list
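The two points above could be sketched as follows, again in hypothetical Python. is_excluded() approximates "X+zeeppaard and zeeppaard+X" with plain prefix/suffix string matching, which is an assumption; a real speller would check component boundaries from its compound analysis. filter_suggestions() drops any candidate that the exclusion list rules out before it is offered to the user.

```python
# Hypothetical exclusion list; entries are lowercase compound strings.
EXCLUSIONS = {"zeeppaard", "moskou"}


def is_excluded(word: str) -> bool:
    """True if `word` is an excluded compound or starts/ends with one.

    Prefix/suffix matching stands in for "zeeppaard+X" and "X+zeeppaard";
    it ignores real component boundaries, so it is only an approximation.
    """
    w = word.lower()
    return any(w.startswith(e) or w.endswith(e) for e in EXCLUSIONS)


def filter_suggestions(suggestions: list[str]) -> list[str]:
    """Drop speller suggestions ruled out by the exclusion list, keeping order."""
    return [s for s in suggestions if not is_excluded(s)]


print(filter_suggestions(["zeepaard", "zeeppaard", "zeepaardstal", "zeeppaardstal"]))
# → ['zeepaard', 'zeepaardstal']
```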
TODO
- send extended data lists to PL (Thomas)
- send MS language codes for Sámi to PL (Sjur)
- send Perl script for generating exclusion list candidates to Divvun (PL)