meeting_2006-10-03
Meeting with Polderland 3.10.2006
Participants:
- Peter Beinema
- Thomas Omma
- Sjur Moshagen
- Tomi Pieski
Agenda
- since last time
- phonetic rules (correction)
- hyphenation data
- questions and answers
Since last time
Polderland:
- all North Saami words used for regression test
- spelling checker now runs on:
- Windows PC, for Office 10 (XP) / 11 (2003) / 12 (2007)
- Mac OS, Office 12 (2007)
- to be done for both Mac and Win:
- localization (using proper locale)
- language codes (using proper language codes)
- include phonetic rules in lexicon
- include compound information (waiting for compounding info)
- add hyphenation information (waiting for hyphenated data)
- Mac OS, Office 10 / 11 (spelling checking works, suggestions not yet)
- to be done/Mac Office 10/11:
- same as above
- get suggestion mechanism running (MS Speller API issue)
- phonetic rules received
- not included in lexicon yet
- feedback to Thomas on penalty mechanism
Waiting for:
- compounding information in North Saami lexicon
- hyphenation information in North Saami lexicon
- Lule Saami lexicon (+ charset used + phon rules + hyph + compounding info)
Divvun:
- has improved hyphenation routines, but still more work to do
- updates to our lexicons
- work on the proper speller data generation in PLX format
Phonetic rules
Can we write a rule to transform ss into šš? Yes, we can.
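As an illustration only, the effect of such a rule can be sketched in Python. This is not Polderland's rule format, which these notes do not show; the rule table, the function name and the test word are all assumptions made for the example. The sketch applies one rule at one position at a time and returns the resulting candidate spellings, which a speller could then look up in its lexicon.

```python
# Hypothetical rule table: (typed sequence, intended sequence).
# Only the ss -> šš rule comes from the meeting; the format is an assumption.
PHONETIC_RULES = [("ss", "šš")]


def phonetic_variants(word, rules=PHONETIC_RULES):
    """Return candidate spellings made by applying one rule at one position."""
    variants = set()
    for typed, intended in rules:
        start = word.find(typed)
        while start != -1:
            # Replace only this occurrence, keep the rest of the word as typed.
            variants.add(word[:start] + intended + word[start + len(typed):])
            start = word.find(typed, start + 1)
    return sorted(variants)


# Arbitrary test input, not a claim about the lexicon.
print(phonetic_variants("oassi"))
```

A real suggestion mechanism would also rank the candidates, for instance with the penalty mechanism mentioned above; that part is left out here.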
Lule Sámi
Core alphabet:
a á b c d e f g h i j k l m n ŋ o p q r s t u v w x y z å ä ø ö å æ -
The following characters are used in names:
é ó ú í è ò ù ë ü ï â ê ô û î ã ý ç č đ ð ñ š ŧ
Not presently used, although defined in the Divvun source files:
à ì þ ß ª
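For checking a word list against these three groups, a small Python sketch along the following lines might help. The character sets are copied from the lists above; the function names, the class labels and the decision to lowercase and NFC-normalize the input are assumptions of this sketch, not something agreed at the meeting.

```python
import unicodedata

# Character groups copied from the lists above (a set drops repeated entries).
CORE = set("a á b c d e f g h i j k l m n ŋ o p q r s t u v w x y z å ä ø ö å æ -".split())
NAMES = set("é ó ú í è ò ù ë ü ï â ê ô û î ã ý ç č đ ð ñ š ŧ".split())
UNUSED = set("à ì þ ß ª".split())

# Hypothetical labels, ordered from most to least expected.
ORDER = ["core", "names", "unused", "unknown"]
GROUPS = {"core": CORE, "names": NAMES, "unused": UNUSED}


def char_class(ch):
    """Return the label of the first group containing ch, else 'unknown'."""
    for label in ORDER[:-1]:
        if ch in GROUPS[label]:
            return label
    return "unknown"


def classify(word):
    """Return the least expected character class that occurs in word."""
    # Assumption: input is compared lowercased and in precomposed (NFC) form.
    normalized = unicodedata.normalize("NFC", word).lower()
    worst = 0
    for ch in normalized:
        worst = max(worst, ORDER.index(char_class(ch)))
    return ORDER[worst]


print(classify("Jåhkåmåhkke"))
```

Words classified as "unknown" would be the ones to look at when checking which characters the Lule Sámi character set really needs.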
Hyphenation
Divvun has a problem with the present hyphenation system: it breaks down in the middle of 'a'. Divvun will continue to work on the issue, but Polderland has Divvun's Xerox-style rewrite rules, which they can study if they find them useful.
Divvun can presently provide a subset of the data as hyphenated.
TODO:
- continue to improve hyphenation (Sjur and Thomas)
- continue with speller data generation/conversion (Tomi)
- continue with Office integration (Polderland)
- deliver updated data sets, including Lule Sámi (Sjur and Børre)
- check which Lule Sámi character set is needed (cf. the characters defined in the twol file)