From Legacy Documents to XML: A Conversion Framework

Jean-Pierre Chanod, Boris Chidlovskii, Hervé Déjean, Olivier Fambon, Jérôme Fuselier, Thierry Jacquin, Jean-Luc Meunier

Abstract:

We present an integrated framework for converting documents from legacy formats to XML. We describe the LegDoC project, which aims to automate the conversion of layout annotations in layout-oriented formats such as PDF, PS and HTML into semantic-oriented annotations. A toolkit of components covers complementary techniques for logical document analysis and semantic annotation, based on machine learning methods. We use a real-world conversion project as a running example to illustrate the techniques implemented in the project.

Generated by sharef2html on 2011-04-15, 02:00:41.