meeting_2007-04-18

Meeting with Polderland 18.4.2007

Participants:

  • Peter Beinema
  • Sjur Moshagen

Agenda

  • Since last time
  • Possible issues
  • Next meeting
  • Todo items for the next meeting

Since last time

Polderland:

  • Lule Sámi lexicon problems:
    • found ^A in first file - format problem, mklex won't work
    • segmentation violation on G5: can't repro
    • sorting problem: working on it. amazing problem (appears after 500M lines); memory related? have re-sorted, are now comparing the results
    • using cygwin sort (=GNU)
  • numbers in compounds: there is no "shortcut" way to denote them. each must be added to the lexicon just like normal alphabetic words.
  • Windows installer: check if executable WinZip version will run setup.exe automatically
    • can be done
      • waiting for time estimate
  • investigation of test tool behaviour with/without custom dictionary: ongoing
  • Adobe: pending
  • MS Mac 2008 proofing tools: March beta does NOT contain Sámi language codes
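
A stray ^A (U+0001) like the one found in the first Lule Sámi file can be located before mklex is run. The following is only a minimal sketch, assuming the lexicon source is a line-based text file; the function name find_control_chars is made up for illustration and is not part of any Polderland or Divvun tool.

```python
def find_control_chars(lines):
    """Yield (line_number, column, codepoint) for every C0 control
    character except tab, line feed and carriage return."""
    allowed = {"\t", "\n", "\r"}
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            # ^A is U+0001; everything below U+0020 is a C0 control character
            if ord(ch) < 0x20 and ch not in allowed:
                yield line_no, col, ord(ch)


if __name__ == "__main__":
    # Two made-up lines; the second one contains a ^A in column 3.
    sample = ["guolli\n", "be\x01arji\n"]
    for line_no, col, code in find_control_chars(sample):
        print(f"line {line_no}, column {col}: U+{code:04X}")
```

Passing an open file object instead of the sample list scans the file line by line, so memory use stays small even for very large lexicon sources.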

Divvun:

  • working on integrating the whole speller build, PLX conversion etc. into the makefile
    • done, except the G5 issue (see below).
  • working on updating the Lule Sámi conversion to be as good as the North Sámi one
    • major technical problems when compiling it - mklex seg-faulting on our G5 server, and GNU sort breaking down and destroying our data
  • haven't had the time to test the new Windows installer yet
  • added a number generation tool; it generates all requested numbers in PLX format (presently 1-1000000)
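
The sort breakage reported above can be cross-checked without relying on sort itself, by streaming through the output and looking for the first line that is out of order. The following is a minimal sketch under one loud assumption: the data is meant to be in plain byte order (what GNU sort gives with LC_ALL=C). These notes do not say which locale or sort keys were actually used, and the function name is hypothetical.

```python
def first_unsorted_line(lines):
    """Return the 1-based number of the first line that sorts before its
    predecessor in byte order, or None if the input is in order.

    `lines` is an iterable of bytes objects, e.g. a file opened in "rb" mode.
    """
    previous = None
    for line_no, raw in enumerate(lines, start=1):
        # Compare without the line terminator, as sort does.
        line = raw.rstrip(b"\r\n")
        if previous is not None and line < previous:
            return line_no
        previous = line
    return None


if __name__ == "__main__":
    print(first_unsorted_line([b"a\n", b"b\n", b"b\n"]))  # in order
    print(first_unsorted_line([b"a\n", b"c\n", b"b\n"]))  # third line is out of order
```

Only two lines are held in memory at a time, so the check should not run into the same memory limits on a file with hundreds of millions of lines.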

Possible issues

Documenting the PLX format

Can we collect the pieces of information we have received in a single document and publish it? We need this for our own sake: it is hard to ensure consistency as it is now, and it is time-consuming for both parties to debug errors in the PLX source as long as we don't have common documentation to refer to and check against.

PLD: send your proposed text, it is probably OK. I personally think we should make PLX documentation public anyway.

Next meeting

Tuesday next week (24.4) at the usual time.

TODO:

  • check segmentation violation in mklex on G5. Platform? Lexicon size (36G, 2.5G zipped)? (PLD) Crash occurs early on, but not immediately - after e.g. the 1st compression
  • check: can non-7-bit-ASCII mklex output (STDERR/STDOUT) be in UTF-8? (PLD)
    • the output is UCS-2, in hex format
  • check: is there a limit to word size in PLX? (PLD) => MS Word: approx 100 characters, PLX: not really (>>100)
  • check inconsistent speller behaviour depending on whether a userdict exists or not (PLD)
  • check: installer on Mac should be able to (PLD to check):
    • move existing proofing tools out of the way (into a zip archive?)
    • restore "original" proofing tools on uninstallation
      • yes, but some residues remain after uninstallation.
      • is the current solution OK, or should we try something more drastic (e.g. add Perl scripts)?
      • solution probably requires admin rights - will document
  • test new multilingual Windows installer (Divvun)
  • bug the MS contact about the fact that the MS Office 2008 March beta didn't contain the Sámi languages (Sjur)
  • try to find proper compiler version for Adobe InDesign (old version will probably do) (Polderland)
    • CS3 is now released
  • send speller lexicon (or CVS tag) for the beta release to Polderland (Sjur)
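
Regarding the mklex output item above (UCS-2 in hex format): until UTF-8 output is available, such output could be converted after the fact. The following is a minimal sketch and rests on assumptions not confirmed in this meeting: that each character is written as four hex digits, big-endian, optionally separated by whitespace. The exact layout of the mklex output was not specified, and decode_ucs2_hex is a hypothetical helper name.

```python
def decode_ucs2_hex(hex_text):
    """Decode hex digits (assumed: 4 per UCS-2 code unit, big-endian)
    into a Python string."""
    digits = "".join(hex_text.split())  # drop any whitespace between units
    if len(digits) % 4 != 0:
        raise ValueError("expected a multiple of four hex digits")
    # UCS-2 covers the Basic Multilingual Plane, where it coincides
    # with UTF-16, so the utf-16-be codec can be used for decoding.
    return bytes.fromhex(digits).decode("utf-16-be")


if __name__ == "__main__":
    text = decode_ucs2_hex("00E1 0161")  # U+00E1 and U+0161
    print(text)
    print(text.encode("utf-8"))  # the same characters as UTF-8 bytes
```

If the real output uses a different byte order or a prefix such as "0x", only the digit clean-up and the codec name would need to change.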