meeting_2006-09-05
Meeting with Polderland 5.9.2006
Participants:
- Peter Beinema
- Thomas Omma
- Tomi Pieski
- Sjur Moshagen
Agenda
- questions and answers
- normativity and data samples
Divvun tasks
Question: Does the delivered data set need to be normative at this point? Or
Divvun will soon be able to deliver normative-only datasets, but not yet. The
beakkán+A+Pl+Ill+PxSg1
  beakkániiddásam - not norm
  beakkániiddásan - norm
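As an illustration of what a normative-only delivery could mean in practice, here is a minimal Python sketch that splits the analysis string into lemma and tags and keeps only the forms marked "norm". The "form - status" layout is copied from the example in these notes. The function names are hypothetical, and the actual format of the delivered datasets is not specified here.

```python
def parse_analysis(analysis):
    """Split 'lemma+Tag1+Tag2' into (lemma, [tags])."""
    lemma, *tags = analysis.split("+")
    return lemma, tags


def normative_forms(form_lines):
    """Keep only forms whose status marker is exactly 'norm'.

    Assumes each line looks like '<word form> - <status>', as in the notes.
    """
    kept = []
    for line in form_lines:
        form, status = line.rsplit(" - ", 1)
        if status.strip() == "norm":
            kept.append(form.strip())
    return kept


lemma, tags = parse_analysis("beakkán+A+Pl+Ill+PxSg1")
kept = normative_forms([
    "beakkániiddásam - not norm",
    "beakkániiddásan - norm",
])
print(lemma, tags, kept)
```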
Received so far (including derivations):
- even-syllable stems: verbs, nouns, adjectives
- odd-syllable stems: verbs, nouns, adjectives
- contracted stems: verbs, nouns, adjectives
PL will receive more even-syllable stem types, to cover all variants of this
Polderland tasks
PL will split up the words (L, R, possibly middle parts), add PLX tags and regenerate the word forms. The results will be sent to Divvun, which should evaluate whether the output consists of correct word forms and whether the approach seems feasible.
Exclusion lists
PL has Perl scripts for generating exclusion list candidates and is willing to give the scripts to Divvun. Divvun's answer: yes, please :-)
The script looks for both/all parts of a compound, and compares them with existing words in the lexicon that are *almost* like the compound stems:
Dutch example 1:
Geminates at compound borders:
  kerk = church
  klok = bell
  lok = piece of hair
  kerkklok = legal compound, church bell
  kerklok = legal compound but very probably not intended
  zeemeeuw = seagull
  zeem+meeuw = probably not what was intended
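The geminate case can be sketched in Python as follows. This is not PL's Perl script, which these notes do not describe in detail. It is only a rough illustration under simple assumptions: the lexicon is a plain set of stems, compounds have exactly two parts, and a pair is reported whenever the form with a doubled border letter and the form with a single border letter can both be analysed. The function names and the toy lexicon are hypothetical, and the output is a list of candidates for manual review, not a finished exclusion list.

```python
from itertools import product


def two_part_splits(word, lexicon):
    """Return every (left, right) split of word where both parts are in the lexicon."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in lexicon and word[i:] in lexicon
    ]


def geminate_candidates(lexicon):
    """Find (single, doubled) pairs that differ only by a geminate at the border.

    For each left+right whose border letters are identical (kerk+klok),
    check whether dropping one of them still gives a legal two-part
    compound (kerklok = kerk+lok). Both forms are then easily confused.
    """
    pairs = set()
    for left, right in product(lexicon, repeat=2):
        if not left or not right or left[-1] != right[0]:
            continue
        doubled = left + right        # e.g. kerkklok, zeemmeeuw
        single = left + right[1:]     # e.g. kerklok, zeemeeuw
        if two_part_splits(single, lexicon):
            pairs.add((single, doubled))
    return sorted(pairs)


# Toy lexicon built from the Dutch example in the notes.
LEXICON = {"kerk", "klok", "lok", "zee", "zeem", "meeuw"}
pairs = geminate_candidates(LEXICON)
for single, doubled in pairs:
    print(single, "<->", doubled)
```

The sketch deliberately does not decide which member of a pair is the unintended one. In the notes, kerklok is the doubtful form in the first pair, while in the second it is zeem+meeuw, so that choice is left to a human reviewer.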
Hyphenation
PL is looking for a recent CodeWarrior to compile Universal binaries for
TODO
- send more data sets to PL (Thomas)
- send Perl script for generating exclusion list candidates to Divvun (PL)