meeting_2006-09-12

Meeting with Polderland 12.9.2006

Participants:

  • Peter Beinema
  • Thomas Omma
  • Sjur Moshagen

Agenda

  • questions and answers

New team members Polderland

  • Marijke Koster
  • Jeroen Daanen

Since last time

Thomas has sent more data to PL, and Peter has written a script to process and "refactor" the "stems" into a form that the PL technology can digest more easily.

Polderland tasks

Will try to reach a conclusion on whether the present approach is acceptable, or whether a finite-state (FS) machine is a better solution overall.

PL is now splitting the word forms into subgroups that behave in more or less the same way. The number of subgroups can be fairly large.

Split words into stem + derivation cluster, so that the two lists are stored separately instead of as every combination: 3000*10000 -> 3000+10000
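The arithmetic behind the split can be illustrated with a small sketch. The stems and derivation clusters below are hypothetical placeholders, not real data, and the notes do not say which of the two figures belongs to which list. The point is only that listing every combination grows with the product of the list sizes, while storing the lists separately grows with their sum.

```python
from itertools import product

# Hypothetical toy data. The figures in the notes are on the order of
# 3000 and 10000 entries; which figure is stems and which is clusters
# is an assumption left open here.
stems = ["stem1", "stem2", "stem3"]
clusters = ["+A", "+B", "+C", "+D"]

# Full-form listing: one stored entry per (stem, cluster) combination.
full_forms = {s + c for s, c in product(stems, clusters)}

# Factored listing: the two lists are stored separately.
factored_size = len(stems) + len(clusters)

cluster_set = set(clusters)


def accepts(word):
    """Return True if word is some known stem followed by some known cluster."""
    return any(
        word.startswith(s) and word[len(s):] in cluster_set
        for s in stems
    )


print(len(full_forms), factored_size)   # sizes for the toy data
print(3000 * 10000, 3000 + 10000)       # sizes for the figures in the notes
```

Note that `accepts` allows every stem to combine with every cluster, so the factored form covers exactly the same set as the full listing, including any combinations that do not exist in the language.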

Current derivation system overgenerates. Unclear whether this is a problem in actual use: PLD compounding also overgenerates, and this is not perceived to be a problem.

Additional problem: not all derivations are possible for all stem forms. Idiosyncratic? Can this be tackled by grouping words in a smart way, so that each group has its own set of derivations?
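A minimal sketch of the grouping idea, assuming each stem is assigned to exactly one group and each group carries its own set of allowed derivations. The group names, stems, and derivation labels are hypothetical, and this is not a description of how the PL technology actually stores them.

```python
# Hypothetical groups: each group lists the derivations its stems allow.
group_derivations = {
    "G1": {"+A", "+B"},
    "G2": {"+B", "+C"},
}

# Hypothetical assignment of stems to groups.
stem_group = {
    "stem1": "G1",
    "stem2": "G2",
}


def is_licensed(stem, derivation):
    """Return True if the stem's group allows this derivation."""
    group = stem_group.get(stem)
    return group is not None and derivation in group_derivations[group]


def generate(stem):
    """List all derived forms allowed for a known stem, in sorted order."""
    return sorted(stem + d for d in group_derivations[stem_group[stem]])


print(is_licensed("stem1", "+A"))
print(is_licensed("stem1", "+C"))
print(generate("stem2"))
```

With this layout, overgeneration is limited to whatever a group permits. Truly idiosyncratic stems would each need a group of their own, which is one way the number of groups could become large.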

Possibility: limit derivations and add the additional forms as "full lemmas".

Hyphenation

PL is looking for a recent CodeWarrior to compile Universal binaries for InDesign.

The next version of InDesign will most likely build with Xcode. Thus, CodeWarrior is only needed for the present InDesign CS2.

TODO

  • send processed data sets back to Thomas (Peter)
  • review the processed data sets (Thomas)