Internationalization (I18N) & Localization (L10N)

Web Architecture
Fall 2009 — INFO 290 (CCN 42593)

Erik Wilde, UC Berkeley School of Information
2009-12-01

This work is licensed under a CC Attribution 3.0 Unported License [http://creativecommons.org/licenses/by/3.0/]

E. Wilde: Internationalization (I18N) & Localization (L10N)

(2) Abstract

Many publishing environments need to support multiple languages. Often this requirement surfaces only in the later stages of product development or of building a publishing solution, where it can force major design changes and drive up costs. Internationalization (I18N) is the practice of designing systems so that they can adapt to different locales. Localization (L10N) is the activity of identifying, defining, and encoding locales on top of internationalized software. For text in languages that use different scripts, Unicode is the most widely used character set today; it provides several encoding schemes, each of which is a Unicode Transformation Format (UTF).



Characters

Outline (Characters)

  1. Characters
  2. Character Sets
  3. Unicode Basics
  4. Internationalization (I18N)
  5. Localization (L10N)
  6. Language Identification in Resources
  7. URIs for Multilingual Resources
  8. Conclusions

(4) Characters and Computers



(5) Characters

Character. (1) The smallest component of written language that has semantic value; refers to the abstract meaning and/or shape […]

The Unicode Standard, Version 4.0, Addison-Wesley, 2003 [http://dret.net/biblio/reference/unicode4]



(6) Glyphs

[A Glyph is] a recognizable abstract graphic symbol which is independent of a specific design.

ISO/IEC 9541:1991, Information Technology – Font Information Interchange [http://dret.net/biblio/reference/iso9541]



Character Sets

(8) History of Character Sets



(9) ASCII 1963

ASCII 1963

(10) ASCII 1965

ASCII 1965

(11) ASCII 1967

ASCII 1967

(12) Beyond ASCII



(13) ISO 8859



(14) ISO 8859-1 (Latin-1) & ISO 8859-2 (Latin-2)

ISO 8859-1 (Latin-1)

Latin-1 (Western European)

ISO 8859-2 (Latin-2)

Latin-2 (Central European)



(15) ISO 8859-7 (Greek) & ISO 8859-15 (Latin-9)

ISO 8859-7 (Greek)

Greek

ISO 8859-15 (Latin-9)

Latin-9



Unicode Basics

(17) ISO 8859 Problems



(18) Unicode



(19) Unicode Character Count



(20) Unicode Encodings

Character    A              א              好             𣎴
Code point   U+0041         U+05D0         U+597D         U+233B4
UTF-8        41             D7 90          E5 A5 BD       F0 A3 8E B4
UTF-16       00 41          05 D0          59 7D          D8 4C DF B4
UTF-32       00 00 00 41    00 00 05 D0    00 00 59 7D    00 02 33 B4
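
The byte sequences above can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `hex_bytes` is made up for illustration, and the big-endian codec variants (`utf-16-be`, `utf-32-be`) are chosen on the assumption that the slide shows big-endian bytes without a byte order mark.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as space-separated uppercase hex."""
    return " ".join(f"{byte:02X}" for byte in text.encode(encoding))


# The four example characters, written as escapes to avoid font problems.
characters = ["\u0041", "\u05D0", "\u597D", "\U000233B4"]

for ch in characters:
    print(
        f"U+{ord(ch):04X}",
        "| UTF-8:", hex_bytes(ch, "utf-8"),
        "| UTF-16:", hex_bytes(ch, "utf-16-be"),
        "| UTF-32:", hex_bytes(ch, "utf-32-be"),
    )
```

The last character lies outside the Basic Multilingual Plane, which is why UTF-16 needs a surrogate pair (four bytes) for it while the other three characters fit into two bytes.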


(21) UTF-8



(22) Other UTFs



(23) Character Set Identification



(24) Unicode is Complex

<français>français ≠ français</français>
<français>français ≠ français</français>
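
One common way for two strings to look identical and still compare unequal is that "ç" exists both as a single precomposed code point (U+00E7) and as "c" followed by a combining cedilla (U+0327). Whether this is exactly the case the slide illustrates cannot be recovered from the text, so the following is a sketch under that assumption, using Python's standard `unicodedata` module.

```python
import unicodedata

# Two spellings of the same visible word (assumed example, see lead-in).
composed = "fran\u00E7ais"      # precomposed LATIN SMALL LETTER C WITH CEDILLA
decomposed = "franc\u0327ais"   # "c" followed by COMBINING CEDILLA

# A plain comparison works on code points, so the strings differ.
print(composed == decomposed)

# Normalizing both sides to the same form (here NFC) makes them comparable.
nfc_composed = unicodedata.normalize("NFC", composed)
nfc_decomposed = unicodedata.normalize("NFC", decomposed)
print(nfc_composed == nfc_decomposed)
```

Comparing, sorting, or using such strings as identifiers (for example as XML element names) is only predictable when both sides are normalized to the same form first.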


(25) Transcoding



Internationalization (I18N)

(27) What is Language?



(28) Beyond Language

"Mongol" in Uighur Script

(29) Definition

Internationalization is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language.


(30) I18N Tasks

  1. UI elements (windows, menus) must be modified to accept translated text
  2. Static text must be made configurable
  3. Icons and graphics must be changed to be more culturally appropriate
  4. Sound files that contain spoken language must be re-recorded
  5. Online help must be translated
  6. Dynamic text (dates, times) must be formatted using the locale
  7. Text handling code must calculate word breaks using the locale
  8. Tabular data must be sortable using the locale
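
Task 2 above (making static text configurable) can be sketched as a lookup in a message catalog instead of hard-coded strings. Everything here is hypothetical and only for illustration: the catalog `MESSAGES`, its keys, the sample translations, and the function `translate` are assumptions, not part of any particular framework.

```python
# Hypothetical message catalog: locale -> message key -> text.
MESSAGES = {
    "en": {"greeting": "Welcome", "save": "Save"},
    "de": {"greeting": "Willkommen", "save": "Speichern"},
    "fr": {"greeting": "Bienvenue"},  # "save" is deliberately untranslated
}
DEFAULT_LOCALE = "en"


def translate(key: str, locale: str) -> str:
    """Look up key for locale, falling back to its language, then the default."""
    language = locale.split("-")[0]
    for candidate in (locale, language, DEFAULT_LOCALE):
        text = MESSAGES.get(candidate, {}).get(key)
        if text is not None:
            return text
    # Last resort: show the key itself so the gap is visible.
    return key


print(translate("greeting", "de-CH"))
print(translate("save", "fr"))
```

The internationalized code only ever refers to message keys; adding a language later means adding catalog entries, not changing code.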


Localization (L10N)

(32) Definition

Localization refers to the adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific target market (a locale).


(33) L10N Tasks

  1. Create translations for all interface elements
  2. Translate all static texts
  3. If necessary, create localized icons and graphics
  4. Any spoken text must be recorded in the target language
  5. Make sure that the localized product uses the localized online help
  6. Data types must be formatted in a locale-specific way
  7. If necessary, dictionaries and other language tools must be integrated
  8. Sorting functions in the code must respect the locale
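
Task 6 above (locale-specific formatting of data types) can be sketched with dates. The table `DATE_PATTERNS` and the function `format_date` are hypothetical names; the three patterns reflect common conventions for those locales, and a production system would more likely take them from a locale database than from a hand-written table.

```python
from datetime import date

# Hypothetical per-locale data supplied during localization.
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",   # month/day/year
    "de-DE": "%d.%m.%Y",   # day.month.year
    "sv-SE": "%Y-%m-%d",   # year-month-day
}
FALLBACK_PATTERN = "%Y-%m-%d"


def format_date(value: date, locale: str) -> str:
    """Format a date with the pattern registered for the locale."""
    pattern = DATE_PATTERNS.get(locale, FALLBACK_PATTERN)
    return value.strftime(pattern)


lecture_day = date(2009, 12, 1)
for locale in DATE_PATTERNS:
    print(locale, format_date(lecture_day, locale))
```

The same date renders as 12/01/2009 for en-US and 01.12.2009 for de-DE, which shows why an unlabeled numeric date is ambiguous across locales.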


Language Identification in Resources

(35) Language Codes



(36) ISO 639-2 Code List

dum|||Dutch, Middle (ca.1050-1350)|néerlandais moyen (ca. 1050-1350)
dut|nld|nl|Dutch; Flemish|néerlandais; flamand
dyu|||Dyula|dioula
dzo||dz|Dzongkha|dzongkha
efi|||Efik|efik
egy|||Egyptian (Ancient)|égyptien
eka|||Ekajuk|ekajuk
elx|||Elamite|élamite
eng||en|English|anglais
enm|||English, Middle (1100-1500)|anglais moyen (1100-1500)
epo||eo|Esperanto|espéranto
est||et|Estonian|estonien
ewe||ee|Ewe|éwé
ewo|||Ewondo|éwondo
fan|||Fang|fang
fao||fo|Faroese|féroïen
fat|||Fanti|fanti
fij||fj|Fijian|fidjien
fil|||Filipino; Pilipino|filipino; pilipino
fin||fi|Finnish|finnois
fiu|||Finno-Ugrian (Other)|finno-ougriennes, autres langues
fon|||Fon|fon
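
Each line of the list above has five pipe-separated fields: the three-letter bibliographic code, the three-letter terminologic code (only present where it differs), the two-letter code (if one exists), the English name, and the French name. A minimal parsing sketch follows; the names `LanguageCode` and `parse_line` are made up for illustration.

```python
from typing import NamedTuple, Optional


class LanguageCode(NamedTuple):
    alpha3_bibliographic: str
    alpha3_terminologic: Optional[str]
    alpha2: Optional[str]
    english_name: str
    french_name: str


def parse_line(line: str) -> LanguageCode:
    """Split one pipe-separated line; empty code fields become None."""
    bib, term, alpha2, english, french = line.rstrip("\n").split("|")
    return LanguageCode(bib, term or None, alpha2 or None, english, french)


sample = [
    "dut|nld|nl|Dutch; Flemish|néerlandais; flamand",
    "eng||en|English|anglais",
    "fil|||Filipino; Pilipino|filipino; pilipino",
]
for entry in map(parse_line, sample):
    print(entry.alpha3_bibliographic, entry.alpha2, entry.english_name)
```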


(37) IANA Language Subtag Registry

%%
Type: region
Subtag: UA
Description: Ukraine
Added: 2005-10-16
%%
Type: region
Subtag: UG
Description: Uganda
Added: 2005-10-16
%%
Type: region
Subtag: UM
Description: United States Minor Outlying Islands
Added: 2005-10-16
%%
Type: region
Subtag: US
Description: United States
Added: 2005-10-16
%%
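
The excerpt above uses a simple record format: records are separated by `%%` lines, and each record consists of `Field: value` lines. A minimal parsing sketch, with the hypothetical function name `parse_registry`, could look like this. It assumes every field fits on one line and occurs once per record, which holds for the excerpt but not for the whole registry (where values can be folded across lines and fields such as Description can repeat).

```python
def parse_registry(text: str) -> list:
    """Parse '%%'-separated records of 'Field: value' lines into dicts."""
    records = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "%%":
            # A separator closes the record collected so far, if any.
            if current:
                records.append(current)
                current = {}
        elif line:
            field, _, value = line.partition(":")
            current[field.strip()] = value.strip()
    if current:
        records.append(current)
    return records


excerpt = """%%
Type: region
Subtag: UA
Description: Ukraine
Added: 2005-10-16
%%
Type: region
Subtag: US
Description: United States
Added: 2005-10-16
%%
"""
regions = {r["Subtag"]: r["Description"] for r in parse_registry(excerpt)}
print(regions)
```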


URIs for Multilingual Resources

(39) Naming Language Variants



(40) Variant Naming Variations



(41) DNS Domains

http://en.example.com/some/page


(42) Constructed Paths

http://example.com/en/some/page


(43) Query Strings

http://example.com/some/page?lang=en


(44) DNS TLDs

http://example.us/some/page


(46) Content Negotiation

http://example.com/some/page
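
With content negotiation the URI carries no language at all; the client states its preferences in the HTTP `Accept-Language` request header and the server picks a variant. The following is a simplified sketch of that selection, not a complete implementation of the HTTP rules: the function names are made up, the `*` wildcard is not handled, and matching only falls back from a full tag such as `de-CH` to its primary language `de`.

```python
def parse_accept_language(header: str) -> list:
    """Return (tag, quality) pairs, highest quality first."""
    preferences = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        quality = 1.0  # default when no q parameter is given
        for parameter in parts[1:]:
            if parameter.startswith("q="):
                try:
                    quality = float(parameter[2:])
                except ValueError:
                    quality = 0.0
        preferences.append((tag, quality))
    # sorted() is stable, so equal qualities keep the header order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def choose_language(header: str, available: set, default: str = "en") -> str:
    """Pick the best available language variant for the given header."""
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in available:
            return tag
        primary = tag.split("-")[0]
        if primary in available:
            return primary
    return default


print(choose_language("de-CH, fr;q=0.8, en;q=0.5", {"en", "fr"}))
```

In this example the preferred `de-CH` is not available in any form, so the server would serve the French variant of the same URI.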


(47) Path Segment Name

http://example.com/some/page.en


(48) URI Sub-Delimiter Comma

http://example.com/some/page,en


(49) URI Sub-Delimiter Semicolon

http://example.com/some/page;lang=en
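
To compare the URI-based naming schemes side by side, the following sketch builds each variant from the same host, path, and language tag. The function name `language_variant_uris` and the dictionary keys are made up for illustration. The DNS TLD scheme is left out because a TLD such as `.us` identifies a country rather than a language, and content negotiation is left out because it does not change the URI.

```python
def language_variant_uris(host: str, path: str, lang: str) -> dict:
    """Build one URI per naming scheme for the same resource and language."""
    return {
        "dns_domain": f"http://{lang}.{host}{path}",
        "constructed_path": f"http://{host}/{lang}{path}",
        "query_string": f"http://{host}{path}?lang={lang}",
        "path_segment_name": f"http://{host}{path}.{lang}",
        "sub_delimiter_comma": f"http://{host}{path},{lang}",
        "sub_delimiter_semicolon": f"http://{host}{path};lang={lang}",
    }


for scheme, uri in language_variant_uris("example.com", "/some/page", "en").items():
    print(f"{scheme:24} {uri}")
```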


(50) Now What?



Conclusions

(52) Babelification


