Getting Started with TEI

For DH week, I attended the ‘Getting Started with TEI’ workshop hosted by Filipa Calado at the Grad Center. TEI, or the Text Encoding Initiative, is a markup schema for representing the structural, renditional, and conceptual features of texts. For anyone familiar with HTML, the two look similar. However, HTML encodes how a text should appear on a page, while TEI encodes the context of a text. Filipa gave a brief introduction to TEI and the guidelines for using it, and then we practiced encoding with pages from The Picture of Dorian Gray manuscript.

Since TEI has its roots in XML, many XML rules also apply when encoding in TEI, such as proper nesting: you must always close the most recently opened tag before closing any tag you opened earlier. Ex: <sentence> <emphasis> </emphasis> </sentence>. Structurally, a TEI document typically begins with an XML declaration, which identifies the file as XML to the software reading it. (This is not the same thing as a DTD, or Document Type Definition, which defines the elements a document is allowed to use.) TEI documents consist of two parts, the Head and the Body, both wrapped in the root <TEI> element. The Head describes the source text’s metadata and includes the following elements: <teiHeader>, <fileDesc>, <titleStmt>, <title>, <publicationStmt>, and <sourceDesc>. The Body is the main section of the document and is what you will see when the TEI document is transformed. For a manuscript transcription like ours, the Body section begins with <sourceDoc> and features such elements as <add>, <del>, and <line>.
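To make the Head and Body structure described above more concrete, here is a minimal sketch of what such a file might look like. It is only an illustration and not the file we used in the workshop: the title, the publication and source notes, and the transcribed line are placeholders I made up, and the <surface> and <zone> wrappers are my assumption about how a <line> is usually placed inside <sourceDoc>.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The XML declaration above comes first; everything else sits inside <TEI>. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Head: metadata about the source text -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample manuscript page (placeholder title)</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished workshop exercise (placeholder).</p>
      </publicationStmt>
      <sourceDesc>
        <p>Transcribed from a manuscript page (placeholder).</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- Body: the transcription itself -->
  <sourceDoc>
    <surface>
      <zone>
        <!-- Invented text: <del> marks a crossed-out word, <add> an inserted one -->
        <line>It was a <del>dark</del> <add>bright</add> morning.</line>
      </zone>
    </surface>
  </sourceDoc>
</TEI>
```

Note how each element is closed before the one that contains it, which is the nesting rule in practice.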

Although it does sound like a lot of work, the end product is definitely worth it. Filipa shared pages from Mary Shelley’s Frankenstein manuscript from the Shelley-Godwin Archive. Using TEI, the project team was able to transcribe Mary Shelley’s draft while showing Percy Shelley’s revisions in red.

Before starting an encoding project, Filipa advises thinking about the goals of the project and considering not just the document, but also your audience, your research goals, and how you wish to represent textual data. She offered some guiding questions from the Women Writers Project (WWP) for preliminary document and project analysis:

  1. As far as you can tell, how is the document structured?
  2. Are there any kinds of regularization or editorial amendment you will perform as you transcribe the text?
  3. How much information about the appearance of the document do you need to capture?

Keeping these questions in mind, we then evaluated pages from The Picture of Dorian Gray manuscript and tried to encode pages 20 and 21 in a text editor. We uploaded our files to a Google Drive and compared our results. I really enjoyed this process, but found myself getting stuck on the crossed-out sections, trying to decipher the writing underneath. Filipa recommended that I use <gap reason="illegible"/> when encoding text I could not read. Filipa then showed the section she has been working on so we could check our work. There were a number of revisions on the page, and Filipa was able to get most of them. Some writing is still illegible and may need to undergo a process similar to the one used to decipher palimpsests.
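As a rough illustration of the <gap> suggestion, a crossed-out passage that cannot be read might be encoded something like the line below. The sentence is invented and is not taken from the Dorian Gray manuscript, and the rend, quantity, and unit values are my own guesses at what one might record rather than anything we settled on in the workshop.

```xml
<!-- Invented example: a struck-through passage whose words cannot be read -->
<line>He looked at the <del rend="strikethrough"><gap reason="illegible" quantity="2" unit="word"/></del> portrait again.</line>
```

Because <gap/> is an empty element, it closes itself with the trailing slash instead of needing a separate closing tag.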

For anyone interested in TEI, Filipa shared a number of links, including the DARC wiki on Text Editing and The Letters of Vincent van Gogh project.