A TEI-based Publishing Workflow

Pre-conference workshop demonstrating the use of TEI in publishing: Wednesday, 11 November, from 3-6
at the Conference and Members’ Meeting of the TEI Consortium,
held at the University of Michigan in Ann Arbor, USA, November 9-15, 2009.

The co-presenters will be:

This workshop will discuss the introduction of TEI into traditional publishing processes and highlight the benefits of adopting these methods. It will include a demonstration of the steps taken from manuscript through the encoding process and into composition using InDesign, with final print PDF and XML as deliverables. Further steps will be discussed, showing how the XML can be used to derive multiple outputs such as XHTML and ePub.
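As a rough illustration of deriving XHTML from TEI XML, here is a minimal sketch using only Python's standard library. It is not the workshop's actual transformation (which would more likely be written in XSLT). The sample document, the `TAG_MAP` element mapping, and the function names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Tiny sample document, invented for this sketch (no teiHeader, so not valid TEI).
SAMPLE_TEI = f"""<TEI xmlns="{TEI_NS}">
  <text>
    <body>
      <div>
        <head>Chapter One</head>
        <p>First paragraph with <hi rend="italic">emphasis</hi>.</p>
        <p>Second paragraph.</p>
      </div>
    </body>
  </text>
</TEI>"""

# Hypothetical mapping from TEI element names to XHTML element names.
TAG_MAP = {"div": "div", "head": "h1", "p": "p", "hi": "em"}


def convert(tei_elem):
    """Recursively copy a TEI element into an XHTML element."""
    local_name = tei_elem.tag.split("}")[-1]  # drop the namespace part
    html_elem = ET.Element(TAG_MAP.get(local_name, "span"))
    html_elem.text = tei_elem.text
    for child in tei_elem:
        html_child = convert(child)
        html_child.tail = child.tail  # keep text that follows the child
        html_elem.append(html_child)
    return html_elem


def tei_to_xhtml(tei_source):
    """Return an XHTML string built from the TEI body."""
    root = ET.fromstring(tei_source)
    body = root.find(f"{{{TEI_NS}}}text/{{{TEI_NS}}}body")
    html = ET.Element("html", {"xmlns": XHTML_NS})
    html_body = ET.SubElement(html, "body")
    for child in body:
        html_body.append(convert(child))
    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(tei_to_xhtml(SAMPLE_TEI))
```

The same TEI source could feed a second mapping for another output format, which is the main appeal of keeping the XML as a deliverable.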

Generally speaking, XML can be introduced into the process either before composition or directly after it. "XML first" describes the former; "XML last" centers on converting final files, including legacy content. We will cover both workflows in the workshop, along with their benefits and limitations.

We will assume a wide range of user experience, not only with TEI but with XML in general. We will discuss using XSLT to transform documents, Schematron to check documents, and even regular expressions. Some familiarity with these concepts is helpful, but even if you are new to XML you will be able to follow along.

XML First

An XML first workflow involves converting manuscript files, which are usually in Microsoft Word format, into XML prior to composition. We will demonstrate a lightweight approach to taking marked-up manuscript files and turning them into TEI XML, which can then be imported into InDesign. We will discuss two cases involving this workflow: one set in-house and one outsourced. Referring to Adobe's technical reference on InDesign CS3 and XML, three separate methods for importing XML will be demonstrated:
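As a rough illustration of the manuscript-to-TEI conversion step (not of the InDesign import methods themselves), the sketch below turns lines carrying simple bracketed codes into a TEI fragment. The codes `[CT]`, `[TX]`, and `[EXT]`, their mapping to TEI elements, and the function name are all assumptions made for this example. A real manuscript would be exported from Word first, and a complete TEI document would also need a `teiHeader`.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical manuscript codes (assumed for this sketch) mapped to TEI elements.
CODE_TO_TEI = {"CT": "head", "TX": "p", "EXT": "quote"}

# A coded line looks like "[CT] Chapter One".
CODE_LINE = re.compile(r"^\[(?P<code>[A-Z]+)\]\s*(?P<text>.*)$")


def manuscript_to_tei(lines):
    """Build a TEI fragment (as a string) from coded manuscript lines."""
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace
    tei = ET.Element(f"{{{TEI_NS}}}TEI")
    text = ET.SubElement(tei, f"{{{TEI_NS}}}text")
    body = ET.SubElement(text, f"{{{TEI_NS}}}body")
    div = ET.SubElement(body, f"{{{TEI_NS}}}div")
    for line in lines:
        match = CODE_LINE.match(line.strip())
        if not match:
            continue  # skip blank or uncoded lines
        tag = CODE_TO_TEI.get(match.group("code"))
        if tag is None:
            raise ValueError(f"Unknown manuscript code: {match.group('code')}")
        elem = ET.SubElement(div, f"{{{TEI_NS}}}{tag}")
        elem.text = match.group("text")
    return ET.tostring(tei, encoding="unicode")


if __name__ == "__main__":
    sample = ["[CT] Chapter One", "[TX] It was a quiet morning.", ""]
    print(manuscript_to_tei(sample))
```

Raising an error on unknown codes is a deliberate choice here: it surfaces tagging mistakes in the manuscript before they reach composition.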

XML Last and Legacy Content

XML last involves converting the final files into XML. When dealing with compositors, this might involve creating encoding guidelines that are used by the vendors in order to convert the files, whether from layout application files or from PDF files. The composition vendors wait until the book is finalized before converting, and deliver the XML files at a later date.

As integrating backlist with new content is a goal for some presses, we will discuss the process of converting legacy content, whether that content is in digital form or exists only as hard copy. We will compare two routes: conversion from digital files (generally PDF), which often involves double keyboarding (or a mix of keyboarding and OCR), versus do-it-yourself conversion via Adobe Acrobat or a similar tool, followed by a transformation.
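Text exported from a PDF usually needs cleanup before any transformation into XML. The sketch below shows one way regular expressions might help, assuming the export is plain text with hard line wraps and blank lines between paragraphs. The function name and the sample input are invented for this example.

```python
import re


def clean_extracted_text(raw):
    """Return a list of paragraphs from hard-wrapped, extracted text."""
    # Rejoin words split across a line break with a hyphen ("conver-\nsion").
    # Caveat: a genuine hyphenated compound broken at a line end loses its hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Blank lines mark paragraph boundaries; single newlines are soft wraps.
    paragraphs = re.split(r"\n\s*\n", text)
    # Collapse runs of whitespace inside each paragraph and drop empty ones.
    return [re.sub(r"\s+", " ", p).strip() for p in paragraphs if p.strip()]


if __name__ == "__main__":
    raw = "Legacy conver-\nsion is\nhard.\n\nSecond para."
    for paragraph in clean_extracted_text(raw):
        print(paragraph)
```

Each cleaned paragraph could then be wrapped in an element and passed on to the transformation step.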

Quality Checks

We will briefly touch on quality assurance (QA) checks. These go above and beyond checking the accuracy of keyboarded content. Validating against a schema such as TEI will only take you so far: a document can be perfectly well-formed and valid and still contain many structural problems. We will show a QA process that uses Schematron to automate some of these checks.
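To make the idea of rule-based structural checks concrete, here is a small Python stand-in. It is not Schematron and not the workshop's QA process. It only illustrates the kind of rule that schema validation would not catch. The two rules (every `div` has a `head`, no paragraph is empty) and the sample document are assumptions chosen for this example.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample: the second div has no head and contains an empty paragraph.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div><head>One</head><p>Text.</p></div>
<div><p> </p></div>
</body></text></TEI>"""


def structural_issues(tei_source):
    """Return a list of human-readable structural problems."""
    root = ET.fromstring(tei_source)
    issues = []
    # Hypothetical house rule: every div should start with a heading.
    for div in root.iter(f"{TEI}div"):
        if div.find(f"{TEI}head") is None:
            issues.append("div without a head")
    # Hypothetical house rule: paragraphs must contain some text.
    for p in root.iter(f"{TEI}p"):
        if not "".join(p.itertext()).strip():
            issues.append("empty paragraph")
    return issues


if __name__ == "__main__":
    for issue in structural_issues(SAMPLE):
        print(issue)
```

In Schematron, rules like these would be written as assertions with XPath tests, which keeps them declarative and easy to share with vendors.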

Files for Workshop

The files used in the workshop can be found here. They will be updated as soon as possible.

Applications Used

For specific questions or suggestions please contact Kenneth Reed at kenneth_reed [at] unc [dot] edu