All our projects have been run in UTF-8 since the spring of 2005.

There is one exception, and it is not in-house: the Oslo web interface Glossa handles only 8-bit code tables. Our texts in Glossa are encoded in ISO-IR 197.

North Saami

The North Saami parser is localised with UTF-8, as it should be. Our web interface has a filter that lets users type c1, d1, s1, etc. instead of the correct Saami letters. Output to the user is in UTF-8.
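
The input filter described above can be sketched as follows. This is a minimal illustration, not the filter actually used in the web interface: only the digraphs c1, d1 and s1 are named in the text, the target letters č, đ and š are assumed from the North Saami alphabet, and the names `DIGRAPHS`, `expand_digraphs` and `to_utf8` are hypothetical.

```python
# Hypothetical digraph table. Only c1, d1 and s1 are mentioned in the text;
# the target letters are an assumption based on the North Saami alphabet.
DIGRAPHS = {
    "c1": "č", "d1": "đ", "s1": "š",
    "C1": "Č", "D1": "Đ", "S1": "Š",
}


def expand_digraphs(text: str) -> str:
    """Replace each ASCII digraph with the Saami letter it stands for."""
    for ascii_form, letter in DIGRAPHS.items():
        text = text.replace(ascii_form, letter)
    return text


def to_utf8(text: str) -> bytes:
    """Expand the digraphs, then encode the result as UTF-8 bytes."""
    return expand_digraphs(text).encode("utf-8")
```

A call such as `expand_digraphs("s1addat")` would return `"šaddat"`. Input that already contains the correct letters passes through unchanged, so the filter can be applied to every request.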

Lule Saami

Lule Saami can be represented adequately in an 8-bit format, Latin 1 (Lule Saami files can use ñ for n-acute). To have a single localisation interface for all the languages we work on, Lule Saami has been moved to UTF-8 as well. A script, spell-relax.regex, makes it possible to use any of ñ, ń and ŋ.
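
The effect of this relaxation can be illustrated in Python. This is only a sketch of the idea, not the contents of spell-relax.regex: it treats ñ, ń and ŋ as interchangeable by folding them to one representative before comparing. The choice of ŋ as representative is arbitrary, and the names `fold_nasals` and `same_word` are hypothetical.

```python
import unicodedata

# The three spellings the text says are accepted for Lule Saami.
VARIANTS = "ñńŋ"
# Arbitrary representative, chosen only for this sketch.
REPRESENTATIVE = "ŋ"

_FOLD_TABLE = str.maketrans({ch: REPRESENTATIVE for ch in VARIANTS})


def fold_nasals(text: str) -> str:
    """Map ñ, ń and ŋ to a single representative letter.

    The text is normalised to NFC first, so that a base letter followed by
    a combining tilde or acute is treated like the precomposed letter.
    """
    return unicodedata.normalize("NFC", text).translate(_FOLD_TABLE)


def same_word(a: str, b: str) -> bool:
    """True if the two strings differ at most in which variant they use."""
    return fold_nasals(a) == fold_nasals(b)
```

Comparing folded forms keeps the stored text untouched. Only the lookup is relaxed.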

South Saami

South Saami can also be represented in an 8-bit format, Latin 1, but just as for Lule Saami, it is stored in UTF-8. The script spell-relax.regex is used for South Saami as well, but with a slightly different purpose: it accepts the widespread sloppy use of i for ï.
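
The one-directional nature of this relaxation (i is accepted where the correct form has ï, but not the reverse) can be sketched in Python. This is an illustration under stated assumptions, not the contents of spell-relax.regex. The function name `accepts` and the table `RELAXED` are hypothetical, and the uppercase pair is added by assumption.

```python
import unicodedata

# Correct letter -> sloppy spelling that is also accepted in its place.
# The uppercase pair is an assumption; the text mentions only i for ï.
RELAXED = {"ï": "i", "Ï": "I"}


def accepts(lexical: str, typed: str) -> bool:
    """True if `typed` is an acceptable spelling of the form `lexical`.

    Each character must match exactly, except that a plain i is accepted
    where the lexical form has ï. A typed ï never matches a lexical i.
    """
    lexical = unicodedata.normalize("NFC", lexical)
    typed = unicodedata.normalize("NFC", typed)
    if len(lexical) != len(typed):
        return False
    return all(t == l or RELAXED.get(l) == t for l, t in zip(lexical, typed))
```

With this sketch, `accepts("gïele", "giele")` is true, while `accepts("giele", "gïele")` is false.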

Inari, Skolt and Kildin Saami

Inari, Skolt and Kildin Saami are represented in Unicode.

Our other languages

All our other languages are in UTF-8 as well.

Note that for Iñupiaq we do not use the widespread 8-bit Interactive IñupiaQ Dictionary encoding, since it places the Iñupiaq characters in the ASCII area. There is a converter on the Iñupiaq page.