Using XSLT for getting Back-of-the-book Indexes

Jirka Kosek


Many electronic publishing systems built on top of XML (e.g. DocBook) use XSLT to convert source XML documents into target formats such as HTML or XSL-FO (for print output). During the transformation, a back-of-the-book index can be generated and populated from index entries spread throughout the document. Creating an index basically means sorting and grouping index entries by their first letter. However, this solution is appropriate only for some languages, English included. For other Latin-based languages such as Czech, Hungarian or Spanish, the grouping method is more sophisticated and cannot be expressed in standard XSLT 1.0. The task is even more challenging if we want internationalized indexes in a general stylesheet package such as the DocBook XSL stylesheets. These stylesheets should support as many XSLT implementations as possible, which rules out vendor extensions. This paper shows how support for non-English index generation was implemented in the DocBook XSL stylesheets, what problems were overcome, and what functionality is missing in XSLT 1.0 but can be added using EXSLT extensions. To deal with grouping problems such as different accented letters belonging to the same group or multi-letter sequences denoting a single group, a solution based on XSLT keys over a user-defined function is provided. This function uses external localization files to look up the values that drive index generation and grouping. The method presented up to this point is sufficient for indexes in HTML output. Print output brings new problems. Because the transformation and formatting phases in XSL are separate, there is no direct support for merging duplicate page numbers in XSL-FO. Fortunately, many FO engine vendors provide custom extensions to deal with this issue. The integration of these extensions into the DocBook XSL stylesheets will be presented.
The article also evaluates the XSLT 2.0 features available for index generation and proposes further improvements to the indexing method so that it can handle CJKV languages.
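
The grouping idea described in the abstract (accented letters sharing a group, multi-letter sequences forming a group of their own, all driven by localization data) can be illustrated outside XSLT. The following Python sketch is not the DocBook XSL implementation; the `CZECH_GROUPS` table is a small hypothetical excerpt, and the function names are invented for illustration. Sorting inside a group uses plain code point order, which is a simplification rather than proper locale-aware collation.

```python
from itertools import groupby

# Hypothetical localization data: (group label, prefixes that map to it),
# listed in the order the groups should appear in the index.
# Only a small excerpt of the Czech alphabet, for illustration.
CZECH_GROUPS = [
    ("A", ["a", "á"]),
    ("B", ["b"]),
    ("C", ["c"]),
    ("Č", ["č"]),
    ("D", ["d", "ď"]),
    ("E", ["e", "é", "ě"]),
    ("H", ["h"]),
    ("Ch", ["ch"]),  # multi-letter sequence that forms its own group
    ("I", ["i", "í"]),
]


def build_lookup(groups):
    """Map every prefix to (group position, group label)."""
    lookup = {}
    for position, (label, prefixes) in enumerate(groups):
        for prefix in prefixes:
            lookup[prefix] = (position, label)
    return lookup


def group_key(term, lookup):
    """Return (position, label) for a term, trying the longest prefix first."""
    lowered = term.lower()
    for prefix in sorted(lookup, key=len, reverse=True):
        if lowered.startswith(prefix):
            return lookup[prefix]
    # Anything not covered by the table goes into a trailing catch-all group.
    return (float("inf"), "Symbols")


def build_index(terms, groups):
    """Sort terms by group, then group consecutive terms with the same key."""
    lookup = build_lookup(groups)
    ordered = sorted(terms, key=lambda t: (group_key(t, lookup)[0], t.lower()))
    return [
        (label, list(items))
        for (_, label), items in groupby(ordered, key=lambda t: group_key(t, lookup))
    ]


if __name__ == "__main__":
    sample = ["chyba", "cena", "hodnota", "číslo", "atribut", "ábel"]
    for label, items in build_index(sample, CZECH_GROUPS):
        print(label, items)
```

Matching the longest prefix first is what keeps "chyba" out of the "C" group, and keeping the table in a separate data structure mirrors the role the abstract assigns to external localization files.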


