Structure-Preserving Difference Search for XML Documents

Erich Schubert, Sebastian Schaffert, François Bry


Current XML differencing applications usually try to find a minimal sequence of edit operations that transforms one XML document into another (the so-called "edit script"). We believe that this approach often produces deltas that are unintuitive for human readers and do not reflect the actual changes. In this article, we therefore propose a different approach that maximizes the retained structure instead of minimizing the edit sequence. Structure is not limited to the usual tree structure of XML: any kind of structural relation can be considered, such as parent-child, ancestor-descendant, sibling, or document order. We consider this approach flexible and able to adapt to the user's requirements. It produces more readable results while still keeping the edit sequence reasonably small.


