Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach

Xia Yang, Mong-Li Lee, Tok Wang Ling


While the Internet has facilitated access to information sources, the scalable integration of these heterogeneous data sources remains a challenge. The adoption of the Extensible Markup Language (XML) as the standard for data representation and exchange has led to an increasing number of XML data sources, both native and non-native. Recent integration work has mainly focused on developing matching techniques to find equivalent elements and attributes among the different XML sources. In this paper, we introduce a semantic approach to resolving structural conflicts in the integration of XML schemas. We employ a data model called ORA-SS (Object-Relationship-Attribute Model for Semi-Structured Data) to capture the implicit semantics in an XML schema, and we present a comprehensive algorithm to integrate XML schemas. In contrast to existing methods, our algorithm adopts an n-ary integration strategy that takes into account the data semantics, the importance of each source, and how the majority of the sources model their data when resolving structural conflicts such as attribute/object class conflicts and ancestor-descendant conflicts. Further, redundant object classes and transitive relationship sets are removed to obtain a more concise integrated schema.


