This work is licensed under a CC Attribution 3.0 Unported License [http://creativecommons.org/licenses/by/3.0/]
Many publishing environments need to support multiple languages. Often this requirement surfaces only in the later stages of developing a product or publishing solution, where it can force major design changes and drive up costs. Internationalization (I18N) is the practice of designing systems so that they can adapt to different locales. Localization (L10N) is the activity of identifying, defining, and encoding locales on top of internationalized software. For languages using different alphabets, Unicode is the most popular character set today; it can be serialized using a variety of encoding schemes, each of which is a Unicode Transformation Format (UTF).
atoms
language atoms
Character. (1) The smallest component of written language that has semantic value; refers to the abstract meaning and/or shape […]
The Unicode Standard, Version 4.0, Addison-Wesley, 2003 [http://dret.net/biblio/reference/unicode4]
[A Glyph is] a recognizable abstract graphic symbol which is independent of a specific design.
ISO/IEC 9541:1991, Information Technology – Font Information Interchange [http://dret.net/biblio/reference/iso9541]
I think there is a world market for maybe five computers. (¬ T. J. Watson [http://en.wikipedia.org/wiki/Thomas_J._Watson#Famous_misquote])
It also shows the Euro sign € which is part of ISO 8859-15 (Latin-9), but not included in ISO 8859-1 (Latin-1).
It also shows the Euro sign ¤ which is part of ISO 8859-15 (Latin-9), but not included in ISO 8859-1 (Latin-1).
Latin-1 (Western European) |
Latin-2 (Central European) |
Greek |
Latin-9 |
In Latin-9, Latin-1's currency symbol ¤ has been replaced with the Euro sign €.
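The two renderings of the same sentence above come down to a single byte being read under two different character sets. A minimal sketch of that effect, assuming Python's built-in `iso-8859-1` and `iso-8859-15` codecs (the helper name `can_encode` is made up for this illustration):

```python
# The same byte value, interpreted under two single-byte character sets.
raw = b"\xa4"
as_latin1 = raw.decode("iso-8859-1")    # currency sign in Latin-1
as_latin9 = raw.decode("iso-8859-15")   # Euro sign in Latin-9
print(as_latin1, as_latin9)


def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# The Euro sign can only be written out where the character set contains it.
print(can_encode("€", "iso-8859-15"), can_encode("€", "iso-8859-1"))
```

Nothing in the byte stream itself says which interpretation is intended, which is why the encoding has to be declared somewhere outside the text.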
U+0041
XML is ASCII for the 21st century [../xml-fall09/basics#(6)]
planes of 2¹⁶ = 65'536 characters
0 to 16
Old Italic [http://unicode.org/charts/PDF/U10300.pdf],
Deseret [http://unicode.org/charts/PDF/U10400.pdf],
Byzantine Musical Symbols [http://unicode.org/charts/PDF/U1D000.pdf]
astral planes is empty
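The plane arithmetic can be checked with a few lines of code. This is only a sketch: the names `PLANE_SIZE`, `PLANE_COUNT`, and `plane_of` are invented here, and the sample code points are taken from the chart links above.

```python
PLANE_SIZE = 2 ** 16   # 65'536 code points per plane
PLANE_COUNT = 17       # planes 0 to 16


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that a code point belongs to."""
    if not 0 <= code_point < PLANE_SIZE * PLANE_COUNT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16   # integer division by 2**16


for cp in (0x0041, 0x10300, 0x10400, 0x1D000):
    print(f"U+{cp:04X} is in plane {plane_of(cp)}")

print("total code points:", PLANE_SIZE * PLANE_COUNT)
```

Old Italic, Deseret, and the Byzantine Musical Symbols all fall into plane 1, while U+0041 is in plane 0.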
Character | A | א | 好 | 𣎴 |
---|---|---|---|---|
Code point | U+0041 | U+05D0 | U+597D | U+233B4 |
UTF-8 | 41 | D7 90 | E5 A5 BD | F0 A3 8E B4 |
UTF-16 | 00 41 | 05 D0 | 59 7D | D8 4C DF B4 |
UTF-32 | 00 00 00 41 | 00 00 05 D0 | 00 00 59 7D | 00 02 33 B4 |
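The byte sequences in the table can be reproduced with Python's standard codecs. This sketch assumes big-endian byte order without a byte order mark (`utf-16-be`, `utf-32-be`), which is what the table shows; `hex_bytes` is a helper name made up for the example and needs Python 3.8 or later for `bytes.hex` with a separator.

```python
def hex_bytes(char: str, encoding: str) -> str:
    """Encode one character and return its bytes as spaced uppercase hex."""
    return char.encode(encoding).hex(" ").upper()


# The four characters of the table, given by code point.
chars = ["\u0041", "\u05D0", "\u597D", "\U000233B4"]

for encoding in ("utf-8", "utf-16-be", "utf-32-be"):
    row = " | ".join(hex_bytes(c, encoding) for c in chars)
    print(f"{encoding:10} | {row}")
```

The last column shows why UTF-16 is a variable-length encoding: code points beyond plane 0 need a surrogate pair (two 16-bit units).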
Content-Type header field [../services-fall06/web1#(17)]
Content-Type: text/html; charset=utf-8
<?xml version="1.0" encoding="utf-8"?>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
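A receiver has to pull the `charset` parameter out of such a declaration before it can decode the bytes. One possible sketch, using the standard library's `email.message.Message` to parse the header value; the function name `charset_of` is an assumption of this example, not part of any API.

```python
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()   # lowercased by the library


declared = charset_of("text/html; charset=utf-8")
body = "français".encode("utf-8")      # bytes as they travel on the wire
print(declared, body.decode(declared))

# No charset parameter: the receiver has to fall back on other information.
print(charset_of("text/html"))
```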
regular characters
<français>français ≠ français</français>
<français>français â français</français>
"one language fits all" assumption is becoming increasingly inappropriate
"just switch the labels" strategy also may be too little for true L10N
Internationalization is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language.
Localization refers to the adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific target market (a locale).
Tags for Identifying Languages
Two-letter tags such as en are interpreted according to ISO 639-1 [http://dret.net/biblio/reference/iso639-1]
Three-letter tags such as eng are interpreted according to ISO 639-2 [http://dret.net/biblio/reference/iso639-2]
The prefix x- indicates a value which is not standardized (requires mutual agreement)
Subtags as in en-US specify additional properties (regions, dialects, scripts, …)
Matching of Language Tags
en-US or de are requested but available variants are en and de
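One way to resolve such a request is to truncate the requested tag subtag by subtag until something available matches. The sketch below is a simplified version of that idea, loosely modeled on the lookup scheme of "Matching of Language Tags"; it ignores details such as single-letter subtags and quality values, and the function name `lookup` is an assumption of this example.

```python
from typing import Iterable, Optional


def lookup(requested: Iterable[str], available: Iterable[str]) -> Optional[str]:
    """Return the first available tag matching a requested tag or its prefix."""
    by_lower = {tag.lower(): tag for tag in available}   # tags are case-insensitive
    for wanted in requested:
        subtags = wanted.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()   # drop the last subtag: en-US becomes en
    return None


print(lookup(["en-US"], ["en", "de"]))
print(lookup(["de"], ["en", "de"]))
print(lookup(["fr"], ["en", "de"]))
```

The request for en-US falls back to en, the request for de matches directly, and a request for fr finds nothing, so the server would have to pick a default.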
dum|||Dutch, Middle (ca.1050-1350)|néerlandais moyen (ca. 1050-1350)
dut|nld|nl|Dutch; Flemish|néerlandais; flamand
dyu|||Dyula|dioula
dzo||dz|Dzongkha|dzongkha
efi|||Efik|efik
egy|||Egyptian (Ancient)|égyptien
eka|||Ekajuk|ekajuk
elx|||Elamite|élamite
eng||en|English|anglais
enm|||English, Middle (1100-1500)|anglais moyen (1100-1500)
epo||eo|Esperanto|espéranto
est||et|Estonian|estonien
ewe||ee|Ewe|éwé
ewo|||Ewondo|éwondo
fan|||Fang|fang
fao||fo|Faroese|féroïen
fat|||Fanti|fanti
fij||fj|Fijian|fidjien
fil|||Filipino; Pilipino|filipino; pilipino
fin||fi|Finnish|finnois
fiu|||Finno-Ugrian (Other)|finno-ougriennes, autres langues
fon|||Fon|fon
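Each entry of the ISO 639-2 excerpt above carries five pipe-separated fields. A small parsing sketch, assuming the field order is bibliographic three-letter code, terminologic three-letter code, two-letter code, English name, French name; the type and function names are invented for this illustration.

```python
from typing import NamedTuple


class LanguageCode(NamedTuple):
    alpha3_b: str   # bibliographic three-letter code, e.g. dut
    alpha3_t: str   # terminologic three-letter code, e.g. nld (often empty)
    alpha2: str     # two-letter ISO 639-1 code, e.g. nl (often empty)
    english: str
    french: str


def parse_line(line: str) -> LanguageCode:
    """Split one pipe-separated entry into its five fields."""
    fields = line.strip().split("|")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    return LanguageCode(*fields)


SAMPLE = """dut|nld|nl|Dutch; Flemish|néerlandais; flamand
eng||en|English|anglais
enm|||English, Middle (1100-1500)|anglais moyen (1100-1500)"""

codes = [parse_line(line) for line in SAMPLE.splitlines()]
by_alpha2 = {code.alpha2: code for code in codes if code.alpha2}
print(by_alpha2["en"].alpha3_b, by_alpha2["nl"].alpha3_t)
```

Many languages, such as Middle English (enm), have no two-letter code at all, which is one reason three-letter codes exist.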
%%
Type: region
Subtag: UA
Description: Ukraine
Added: 2005-10-16
%%
Type: region
Subtag: UG
Description: Uganda
Added: 2005-10-16
%%
Type: region
Subtag: UM
Description: United States Minor Outlying Islands
Added: 2005-10-16
%%
Type: region
Subtag: US
Description: United States
Added: 2005-10-16
%%
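Records of the subtag registry excerpt above are separated by `%%` and consist of `Field: value` lines. The following sketch reads such records into dictionaries; it is deliberately simplified (no folded continuation lines, no repeated fields), and `parse_records` is a name made up for the example.

```python
from typing import Dict, List


def parse_records(text: str) -> List[Dict[str, str]]:
    """Split registry text at %% and turn each record into a dict."""
    records = []
    for chunk in text.split("%%"):
        record = {}
        for line in chunk.splitlines():
            key, sep, value = line.partition(":")
            if sep:   # skip blank lines and anything without a colon
                record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records


SAMPLE = """%%
Type: region
Subtag: UM
Description: United States Minor Outlying Islands
Added: 2005-10-16
%%
Type: region
Subtag: US
Description: United States
Added: 2005-10-16
%%"""

regions = {r["Subtag"]: r["Description"]
           for r in parse_records(SAMPLE) if r["Type"] == "region"}
print(regions["US"])
```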
language-independent resource
http://en.example.com/some/page
URI navigation
http://example.com/en/some/page
URI navigation
http://example.com/some/page?lang=en
http://example.com/some/page is usable for the abstract resource
http://example.us/some/page
http://example.com/some/page
http://example.com/some/page
http://example.com/some/page.en
"." which uses the resource's extension
URI navigation
http://example.com/some/page,en
"," for specifying a parameter to a URI path segment
http://example.com/some/page is usable for the abstract resource
http://example.com/some/page;lang=en
";" for specifying a parameter to a URI path segment
http://example.com/some/page is usable for the abstract resource
"variant preference" in a URI
".", ";", and "," are very similar.
";" and "," are not treated in any special way.
and ..
http URIs, "," and/or ";" could interact with content negotiation,
"," [URI Sub-Delimiter Comma (1)] or ";" [URI Sub-Delimiter Semicolon (1)] for language variants, but do not expect magic to happen
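The URI patterns discussed above can be put side by side in code. This is a sketch built on `urllib.parse`; the function names `variant_uris` and `abstract_uri` and the dictionary keys are assumptions of this example, and nothing here is performed automatically by HTTP or by URI libraries.

```python
from urllib.parse import urlsplit, urlunsplit


def variant_uris(abstract: str, lang: str) -> dict:
    """Build the language-variant URI patterns for one abstract resource."""
    parts = urlsplit(abstract)

    def with_path(path: str) -> str:
        return urlunsplit(parts._replace(path=path))

    return {
        "host": urlunsplit(parts._replace(netloc=f"{lang}.{parts.netloc}")),
        "path prefix": with_path(f"/{lang}{parts.path}"),
        "query": urlunsplit(parts._replace(query=f"lang={lang}")),
        "extension": with_path(f"{parts.path}.{lang}"),
        "comma": with_path(f"{parts.path},{lang}"),
        "semicolon": with_path(f"{parts.path};lang={lang}"),
    }


def abstract_uri(variant: str) -> str:
    """Strip a ',' or ';' parameter from the last path segment."""
    parts = urlsplit(variant)
    head, _, last = parts.path.rpartition("/")
    for delimiter in (";", ","):
        last = last.split(delimiter, 1)[0]
    return urlunsplit(parts._replace(path=f"{head}/{last}"))


variants = variant_uris("http://example.com/some/page", "en")
for name, uri in variants.items():
    print(f"{name:12} {uri}")
print(abstract_uri(variants["comma"]))
```

With "," or ";" the abstract resource can be recovered by plain string handling, but only because this code chooses to treat those characters specially. The "." variant is left out of `abstract_uri` on purpose, since a trailing `.en` cannot be told apart from an ordinary file extension without extra knowledge.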