meeting_2006-11-21
Meeting with Polderland 21.11.2006
Participants:
- Peter Beinema
- Sjur Moshagen
- Thomas Omma
Agenda
- since last time
- questions and answers
Since last time
Polderland:
- speller: include large lexicon
- -> adapt "mklex" lexicon compiler for large files (found bug in gcc 3.2)
- mklex: port mklex to Mac OS X, to be delivered to Divvun
- make Sami-specific; leave out some general functionality
- create scripts to run mklex on your side
- hyphenator:
- insert lexicon lookup-step before pattern matching
- adapt pattern matching (ASCII-based, not UTF-8/UCS-2)
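The planned hyphenator order (lexicon lookup first, pattern matching only as a fallback) could be sketched roughly as below. This is an illustration only, not Polderland's implementation: the function names, the dictionary-based lexicon and the sample entry are all assumptions made for the example.

```python
from typing import Callable, Dict


def hyphenate(word: str,
              lexicon: Dict[str, str],
              pattern_hyphenate: Callable[[str], str]) -> str:
    """Return the word with hyphenation marks inserted.

    The lexicon is consulted first; the pattern matcher is used only
    when the word is not listed.
    """
    known = lexicon.get(word)
    if known is not None:
        return known
    return pattern_hyphenate(word)


# Hypothetical data, for illustration only.
lexicon = {"example": "ex^am^ple"}


def no_patterns(word: str) -> str:
    # Stand-in for a real pattern matcher: inserts no hyphenation points.
    return word


print(hyphenate("example", lexicon, no_patterns))  # taken from the lexicon
print(hyphenate("unknown", lexicon, no_patterns))  # falls back to patterns
```

Passing the pattern matcher in as a function keeps the lookup step independent of how the patterns are encoded.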
Divvun:
- first test run of PLX data done, including hyphenation points
Alpha version
Spellers
Both sme and smj. The sme version will be using the latest, 20Gb
Hyphenators
Will use only the limited data delivered to Polderland, and use the fallback
Divvun hyphenation marks:
- "#": word boundary
- "^": soft hyphen
- "-": hard hyphen
Polderland hyphenation marks:
- "--": hard hyphen
- "-": soft hyphen
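A minimal sketch of converting Divvun marks to Polderland marks, based only on the two lists above. The notes do not say how the word boundary mark "#" is represented on the Polderland side, so the sketch passes it through unchanged; that choice, and the function and table names, are assumptions.

```python
# Divvun mark -> Polderland mark, as listed in the notes.
# "#" (word boundary) has no stated Polderland equivalent, so it is
# left unchanged here (assumption).
DIVVUN_TO_POLDERLAND = {
    "^": "-",   # soft hyphen
    "-": "--",  # hard hyphen
}


def divvun_to_polderland(marked: str) -> str:
    """Convert one marked word character by character.

    Working per character means a soft hyphen that has already been
    converted to "-" is never re-read as a Divvun hard hyphen.
    """
    return "".join(DIVVUN_TO_POLDERLAND.get(ch, ch) for ch in marked)


print(divvun_to_polderland("ab^cd-ef"))
```

A chain of plain string replacements would need a fixed order (hard hyphens before soft ones) to avoid converting the same position twice; the per-character mapping avoids that dependency.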
Possible issues
Clitics
Clitics should be applied to all inflected forms. These are normally marked as
How can we specify the clitics so that they can be combined with inflected forms? Or do we have to pregenerate all word forms with clitics? If so, the generated data will grow more than tenfold (> 250 Gb!)
sample word: xyzzy NR
go Vt
schaaps NL (sheep-)
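To illustrate why pregenerating clitic forms inflates the data, here is a rough sketch. It assumes that every clitic can attach to every inflected form and that a form takes at most one clitic; the word forms and clitics are placeholders, not real Sami data.

```python
from itertools import product
from typing import Iterable, Iterator, List


def pregenerate(forms: List[str], clitics: Iterable[str]) -> Iterator[str]:
    """Yield every inflected form, bare and with each clitic attached."""
    clitic_list = list(clitics)
    for form in forms:
        yield form
    for form, clitic in product(forms, clitic_list):
        yield form + clitic


def growth_factor(n_clitics: int) -> int:
    """Entries per inflected form after pregeneration (bare form included)."""
    return 1 + n_clitics


# Placeholder data, for illustration only.
forms = ["form1", "form2", "form3"]
clitics = ["+c1", "+c2"]

entries = list(pregenerate(forms, clitics))
print(len(entries), "entries from", len(forms), "forms")
print("growth factor:", growth_factor(len(clitics)))
```

Under these assumptions the entry count grows by a factor of one plus the number of clitics, so any clitic inventory of ten or more already gives the more-than-tenfold increase mentioned above.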
Next meeting
Next Tuesday (28.11.) at the usual time.
TODO:
- continue to improve hyphenation (Sjur and Thomas)
- provide new batch of hyphenated data (Sjur)
- send clitic problem via e-mail to Polderland (Sjur)
- send first round of PLX data to Polderland (Tomi, Børre)
- make complete PLX data set (Tomi)
- get language codes to work with Mac Office 2004 (and check Mac Office 2007)
- deliver Alpha versions (Polderland) including mklex + hyphen script
- try to find proper compiler version for Adobe InDesign (old version will