Definition: Unicode [Web and XML Glossary]

Unicode

Unicode defines a coded character set whose code space runs from U+0000 to U+10FFFF, a little over one million code points (21 bits). Its repertoire and code point assignments are kept synchronized with ISO/IEC 10646, the Universal Character Set (UCS); ISO/IEC 10646 originally defined a 31-bit code space but has since been restricted to the same range. The most commonly used characters, including all those found in older encoding standards, have been placed in the first 65,536 code points (U+0000 to U+FFFF). This 16-bit subset is called the Basic Multilingual Plane (BMP) or "Plane 0". The characters placed outside the BMP, in the supplementary planes, include historic scripts, rarer CJK ideographs, mathematical and musical notation, and most emoji. New characters are still added with each version of the standard, but once a character has been assigned, its code point and name never change.

Unicode assigns to each character not only a code number (its code point) but also an official name. A hexadecimal number that represents a Unicode or UCS value is commonly written with the prefix "U+" and at least four hex digits, as in U+0041 for the character "LATIN CAPITAL LETTER A". The Unicode characters U+0000 to U+007F are identical to those in ASCII, and the range U+0000 to U+00FF is identical to ISO 8859-1.
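The points above (U+ notation, official character names, planes, and the ASCII and ISO 8859-1 subsets) can be checked with a minimal Python sketch using only the standard `unicodedata` module. The helper names `describe` and `plane` are hypothetical, chosen for illustration; the names returned depend on the Unicode database version bundled with the Python interpreter.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the U+ notation and official Unicode name of one character."""
    cp = ord(ch)
    # U+ notation: uppercase hex, padded to at least four digits.
    return f"U+{cp:04X} {unicodedata.name(ch, '<unnamed>')}"

def plane(ch: str) -> int:
    """Plane number: 0 is the BMP, 1 to 16 are the supplementary planes."""
    return ord(ch) >> 16

print(describe("A"))            # U+0041 LATIN CAPITAL LETTER A
print(describe("\u00e9"))       # U+00E9 LATIN SMALL LETTER E WITH ACUTE
print(describe("\U0001D11E"))   # U+1D11E MUSICAL SYMBOL G CLEF
print(plane("A"), plane("\U0001D11E"))  # 0 1

# U+0000..U+007F match ASCII; U+0000..U+00FF match ISO 8859-1 (Latin-1).
assert all(bytes([b]).decode("ascii") == chr(b) for b in range(0x80))
assert all(bytes([b]).decode("latin-1") == chr(b) for b in range(0x100))

# The code space ends at U+10FFFF; Python rejects anything beyond it.
try:
    chr(0x110000)
except ValueError:
    print("0x110000 is outside the Unicode code space")
```

Decoding each single byte as Latin-1 yields the character with the same numeric value, which is exactly the identity between ISO 8859-1 and the first 256 Unicode code points.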

Type Associations

Topic(s) from which this Topic is derived:
- CCS (Coded Character Set)

Associations

Unicode contains
- BOM · NFC · NFD · NFKC · NFKD · UCD
Unicode is based on
- UCS
Unicode is used as a base by
- URI · XML · YAML
Unicode is informatively described at

Mentioned in...

CRVX · Character Set · OpenType · UTF-32

Topic Creation: 2000-06-07, Modification Date: 2003-01-23; HTML Creation: 2012-01-22, 07:00:09