XML and Unicode Normalization

Cliff Schmidt


Several XML-based standards mandate the normalization of various aspects of XML and XML-based technologies. Normalization promotes compatibility between unfamiliar systems, which is necessary for scenarios that treat the Web as a single, large application. Component systems within a Web application can benefit from an early normalization process, which lets them perform operations such as collation and string matching without having to consider multiple potential forms of the incoming data. However, universal mandatory normalization can also limit how efficiently systems engaged in a private contract can use XML and XML-based technologies. Web services are probably the most prevalent example of such systems. Early normalization might require a system to perform additional encoding and decoding steps simply to use XML as a transport between system components, especially if those components natively use a normalization form different from the mandated one. As current XML standards evolve and new standards develop, mandatory normalization will continue to require the XML community to balance two important Web goals: enabling unfamiliar systems to make certain assumptions about each other's data, and keeping it practical for familiar systems to leverage the same standards and technologies. This paper addresses these concerns by focusing on the specific issues in the character normalization debate.
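
The string-matching concern described above can be illustrated with a minimal sketch. It is not taken from the paper: it assumes Python's standard unicodedata module, and the sample word and the helper name to_nfc are hypothetical, chosen only for illustration. It shows two canonically equivalent spellings of the same text that fail a naive comparison but match once both are brought to one normalization form (NFC is used here as an example of a mandated form).

```python
import unicodedata

# Two canonically equivalent spellings of the same word (illustrative data).
composed = "caf\u00e9"     # 'é' as one precomposed code point, U+00E9
decomposed = "cafe\u0301"  # 'e' followed by combining acute accent, U+0301


def to_nfc(text: str) -> str:
    """Hypothetical helper: return text in Normalization Form C."""
    return unicodedata.normalize("NFC", text)


# Without normalization, a code-point comparison treats the strings as different.
raw_equal = composed == decomposed

# After early normalization, a receiver can compare strings directly.
normalized_equal = to_nfc(composed) == to_nfc(decomposed)

# A component that natively stores NFD would have to convert in both
# directions, which is the extra work the abstract warns about.
native_nfd = unicodedata.normalize("NFD", composed)
round_trip = to_nfc(native_nfd)

print(raw_equal, normalized_equal, len(composed), len(native_nfd))
```

Under these assumptions, the receiver needs only a plain equality test after normalization, while the NFD-native component pays for a conversion on every exchange.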


