Character Sets, Internationalization (I18N), and Localization (L10N)

Web Architecture and Information Management
Spring 2009 — INFO 190-02 (CCN 42509)

Erik Wilde, UC Berkeley School of Information
2009-03-18

This work is licensed under a Creative Commons Attribution 3.0 Unported License [http://creativecommons.org/licenses/by/3.0/]


(2) Abstract

Every character-based document is based on some model of which characters are available and how they are encoded. Unicode is the most popular character set today; it provides several encoding schemes, each of which is a Unicode Transformation Format (UTF). Many publishing environments need to support multiple languages. Internationalization (I18N) is the practice of designing systems that can adapt to different locales. Localization (L10N) is the activity of identifying, defining, and encoding locales for internationalized software.
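As a small illustration of one character set having several encoding schemes, the following sketch uses only Python's built-in codecs to encode the same four Unicode characters in UTF-8, UTF-16, and UTF-32 and to report how many bytes each character needs. The sample string and the helper names `bytes_per_char` and `describe` are illustrative choices, not part of the slides.

```python
# Sample characters (arbitrary choice): an ASCII letter, a Latin-1 letter,
# the euro sign, and a character outside the Basic Multilingual Plane.
SAMPLE = "A\u00e9\u20ac\U0001D11E"  # A, é, €, MUSICAL SYMBOL G CLEF

# The big-endian ("-be") codec names are used so that Python does not
# prepend a byte order mark (BOM), as plain "utf-16"/"utf-32" would.
CODECS = ("utf-8", "utf-16-be", "utf-32-be")


def bytes_per_char(text: str, codec: str) -> list[int]:
    """Return the number of bytes each character occupies in the given UTF."""
    return [len(ch.encode(codec)) for ch in text]


def describe(text: str) -> None:
    """Print per-character byte counts and the full byte sequence per UTF."""
    for codec in CODECS:
        data = text.encode(codec)
        # Every UTF encodes the same code points losslessly.
        assert data.decode(codec) == text
        print(f"{codec:10} {bytes_per_char(text, codec)} -> {data.hex(' ')}")


if __name__ == "__main__":
    describe(SAMPLE)
```

UTF-8 needs one to four bytes per character and is byte-compatible with ASCII; UTF-16 uses two bytes for characters in the Basic Multilingual Plane and a surrogate pair (four bytes) for the rest; UTF-32 always uses four bytes.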



Characters and Character Sets

Outline (Characters and Character Sets)

  1. Characters and Character Sets [9]
  2. Unicode [3]
  3. Internationalization (I18N) [5]
  4. Localization (L10N) [2]

(4) Characters and Computers




(5) Characters

Character. (1) The smallest component of written language that has semantic value; refers to the abstract meaning and/or shape […]

The Unicode Standard, Version 4.0, Addison-Wesley, 2003 [http://dret.net/biblio/reference/unicode4]




(6) Glyphs

[A Glyph is] a recognizable abstract graphic symbol which is independent of a specific design.

ISO/IEC 9541:1991, Information Technology – Font Information Interchange [http://dret.net/biblio/reference/iso9541]
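The distinction between characters and glyphs shows up directly in Unicode data. The sketch below (Python's standard `unicodedata` module; the examples are illustrative choices, not taken from the slides) shows a single glyph that Unicode encodes as one compatibility character (the "fi" ligature), and one visual result that can be produced by two different character sequences (precomposed "é" versus "e" followed by a combining acute accent).

```python
import unicodedata

# The "fi" ligature is a single glyph; Unicode encodes it (U+FB01) only for
# compatibility with older character sets.
LIGATURE = "\ufb01"

# "é" as one precomposed character vs. "e" followed by a combining accent:
# both normally render as the same glyph.
PRECOMPOSED = "\u00e9"
DECOMPOSED = "e\u0301"


def show() -> None:
    print(unicodedata.name(LIGATURE))  # LATIN SMALL LIGATURE FI
    # NFKC normalization replaces the compatibility character with the
    # underlying characters "f" and "i".
    print(list(unicodedata.normalize("NFKC", LIGATURE)))  # ['f', 'i']
    # Different code point sequences, so plain comparison fails ...
    print(PRECOMPOSED == DECOMPOSED)  # False
    # ... but NFC normalization maps both to the same sequence.
    print(unicodedata.normalize("NFC", DECOMPOSED) == PRECOMPOSED)  # True


if __name__ == "__main__":
    show()
```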




(7) Character Set Identification




(8) History of Character Sets




(9) ASCII 1967

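ASCII defines 128 code points (0 to 127), so every ASCII character fits into 7 bits. A minimal Python sketch of what this means in practice: text made only of those characters can be encoded as ASCII, while anything else (here the arbitrary example "café") cannot, which is the gap that character sets beyond ASCII try to fill. The helper name `ascii_bytes` is hypothetical.

```python
from typing import Optional


def ascii_bytes(text: str) -> Optional[bytes]:
    """Encode text as ASCII, or return None if it contains non-ASCII characters."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        # At least one character lies outside the 7-bit range 0..127.
        return None


if __name__ == "__main__":
    print(ascii_bytes("ASCII"))             # b'ASCII'
    print(max(ascii_bytes("ASCII")) < 128)  # True: every byte uses only 7 bits
    print(ascii_bytes("caf\u00e9"))         # None: "é" (U+00E9) is not in ASCII
```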


(10) Beyond ASCII




(11) ISO 8859




(12) ISO 8859-1 (Latin-1) & ISO 8859-15 (Latin-9)

ISO 8859-1 (Latin-1, Western European)

ISO 8859-15 (Latin-9)
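ISO 8859-15 (Latin-9) was derived from ISO 8859-1 (Latin-1) by replacing a few rarely used characters, most visibly to add the euro sign. The sketch below uses Python's built-in `latin-1` and `iso8859_15` codecs to list every byte value that the two character sets interpret differently; it also shows why the same bytes can mean different characters unless the character set is identified. The function name `latin1_vs_latin9` is hypothetical.

```python
import unicodedata


def latin1_vs_latin9() -> list[tuple[int, str, str]]:
    """Return (byte, Latin-1 char, Latin-9 char) for every byte decoded differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("latin-1")
        latin9 = raw.decode("iso8859_15")
        if latin1 != latin9:
            diffs.append((value, latin1, latin9))
    return diffs


if __name__ == "__main__":
    for value, latin1, latin9 in latin1_vs_latin9():
        print(f"0x{value:02X}: {unicodedata.name(latin1)} -> {unicodedata.name(latin9)}")
```

Run as a script, this prints eight byte positions (0xA4, 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD, 0xBE); at 0xA4, for example, Latin-1's CURRENCY SIGN became the EURO SIGN in Latin-9.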



Unicode


(14) ISO 8859 Problems




(15) Unicode




(16) Unicode Character Count



Internationalization (I18N)


(18) What is Language?




(19) Beyond Language

"Mongol" in Uighur Script


(20) Directionality and Screen Layout

Right-to-Left Layout for Outlook Web Access


(21) Definition

Internationalization is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language.



(22) I18N Tasks

  1. UI elements (windows, menus) must be modified to accept translated text
  2. Static text must be made configurable
  3. Icons and graphics must be changed to be more culturally appropriate
  4. Sound files that contain spoken language must be re-recorded
  5. Online help must be translated
  6. Dynamic text (dates, times) must be formatted using the locale
  7. Text handling code must calculate word breaks using the locale
  8. Tabular data must be sortable using the locale
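Tasks 2 and 6 in the list above (configurable static text, locale-dependent formatting of dynamic text) come down to never hard-coding user-visible strings or formats. The following is a minimal sketch under stated assumptions: the catalog contents, the locale identifiers, and the helper names `message` and `format_date` are hypothetical, and a real system would use an established message-catalog and formatting library rather than hand-written patterns.

```python
from datetime import date

# Hypothetical message catalogs keyed by locale identifier. The internationalized
# code refers only to keys; supporting a new locale means adding a catalog,
# not changing the code.
CATALOGS = {
    "en-US": {"greeting": "Welcome", "date_pattern": "{m}/{d}/{y}"},
    "de-DE": {"greeting": "Willkommen", "date_pattern": "{d:02d}.{m:02d}.{y}"},
}
DEFAULT_LOCALE = "en-US"


def message(locale_id: str, key: str) -> str:
    """Look up a UI string, falling back to the default locale if it is missing."""
    catalog = CATALOGS.get(locale_id, {})
    return catalog.get(key, CATALOGS[DEFAULT_LOCALE][key])


def format_date(locale_id: str, value: date) -> str:
    """Format a date with the locale's (hypothetical) date pattern."""
    pattern = message(locale_id, "date_pattern")
    return pattern.format(d=value.day, m=value.month, y=value.year)


if __name__ == "__main__":
    day = date(2009, 3, 18)
    for loc in ("en-US", "de-DE", "fr-FR"):
        print(loc, message(loc, "greeting"), format_date(loc, day))
```

The unknown locale "fr-FR" silently falls back to the en-US catalog; supplying that catalog is exactly the localization work that internationalization prepares for.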


Localization (L10N)


(24) Definition

Localization refers to the adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific target market (a locale).



(25) L10N Tasks

  1. Create translations for all interface elements
  2. Translate all static texts
  3. If necessary, create localized icons and graphics
  4. Any spoken text must be recorded in the target language
  5. Make sure that the localized product uses the localized online help
  6. Data types must be formatted according to the locale
  7. If necessary, dictionaries and other language tools must be integrated
  8. Sorting functions in the code must respect the locale
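Task 8 (locale-respecting sorting) is easy to get wrong, because sorting by code point does not match any language's rules, and languages also disagree with each other. The sketch below contrasts code point order with two deliberately simplified, hypothetical collation keys: a German-style key that strips diacritics so that accented letters sort like their base letters (close to the usual German dictionary convention for umlauts), and a Swedish-style key that places "å", "ä", and "ö" after "z". Both keys assume precomposed (NFC) input; production code should use a full collation implementation, such as one based on the Unicode Collation Algorithm, instead of hand-written keys like these.

```python
import unicodedata

# Simplified Swedish alphabet: ... x y z å ä ö
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"


def german_key(word: str) -> tuple[str, str]:
    """Simplified German-style key: letters with diacritics sort like the base letter."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, word)  # the original word breaks ties deterministically


def swedish_key(word: str) -> list[int]:
    """Simplified Swedish-style key: å, ä, ö are separate letters after z."""
    offset = len(SWEDISH_ALPHABET)
    # Characters outside the alphabet sort after it, by code point.
    return [SWEDISH_ALPHABET.index(ch) if ch in SWEDISH_ALPHABET else offset + ord(ch)
            for ch in word.casefold()]


WORDS = ["zebra", "\u00e4rlig", "\u00e5tta", "apa", "\u00f6l"]  # zebra, ärlig, åtta, apa, öl

if __name__ == "__main__":
    print(sorted(WORDS))                   # code point order
    print(sorted(WORDS, key=german_key))   # German-style order
    print(sorted(WORDS, key=swedish_key))  # Swedish-style order
```

Code point order yields apa, zebra, ärlig, åtta, öl; the German-style key yields apa, ärlig, åtta, öl, zebra; the Swedish-style key yields apa, zebra, åtta, ärlig, öl. The same list has three different "correct" orders depending on which rules are applied.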



(26) Conclusions


