Creating a TEI file

Author

James Tripp

Aims

  • Become more familiar with the TEI XML file structure

  • Practice using the Oxygen XML editor

  • Start creating a TEI XML file for the Orwell Political Diaries 1940-41 found here

Primary source

The British Library has made photos of the Orwell Political Diaries available to us. There are 4 pages including the 17th, 21st, 24th and 27th of September 1940.

For those unfamiliar with George Orwell, there is an extensive wikipedia page about him. In the early 1940s, he was found unfit for any kind of military service, working in the Home Guard and living in Central London. The period is after he published Coming Up for Air (1939) and several years before his famous book Animal Farm (1945).

Source rights and usage

We are going to work on encoding pages from George Orwell’s political diary 1940-41. Several photographs of the diary are available to view on the British Library website. Note the usage terms from the website.

George Orwell: © With kind permission of the estate of the late Sonia Brownell Orwell. You may not use the material for commercial purposes. Please credit the copyright holder when reusing this work.

UCL: © Orwell Archive, UCL Library Special Collections.

We will stick strictly to these terms and only use the image for educational purposes.

Why?

A digital version of Orwell’s diary is available on a WordPress page. Like other Orwell writings available online these follow a blog type format. What information is lost when comparing images of the diary with the blog format?

TEI XML provides us with a way to encode more information. We can include strike through, edits, indicates places, etc. All of this information may be useful for a richer interpretation and understanding of the diary. A better understanding of the diary helps us understand the time, the people and, I daresay, the humanity of the moment.

Oxygen XML

The instructions below are for opening and creating a TEI document using Oxygen XML.

  1. Open the Oxygen XML program. If you are in room R0.39 then open the start menu type in oxygen. You should see the application.

Windows start menu showing Oxygen XML Editor

  1. Once open, select New Document. When you are faced with the document type select TEI P5 All and then click create.

New document interface with TEI P5 All selected

Oxygen will load in a template document ready for you to use the TEI guidelines.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
      <fileDesc>
         <titleStmt>
            <title>Title</title>
         </titleStmt>
         <publicationStmt>
            <p>Publication Information</p>
         </publicationStmt>
         <sourceDesc>
            <p>Information about the source</p>
         </sourceDesc>
      </fileDesc>
  </teiHeader>
  <text>
      <body>
         <p>Some text here.</p>
      </body>
  </text>
</TEI>

XML and TEI

The XML files is a series of tags such as <text>.

This line tells us we are using XML version 1 and the file encoding. This line is needed but it beyond the scope of our session to delve into all the details.

<?xml version="1.0" encoding="UTF-8"?>

Next we specify the rules we want to follow.

<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>

We are specifying the particular xml models we want to use. An XML model has a tree structure - that is, a particular <node> may by within other specific <nodes>.

Wait, what? That may sound confusing but let us look at the teiHeader section of our file.

<teiHeader>
    <fileDesc>
       <titleStmt>
          <title>Title</title>
       </titleStmt>
       <publicationStmt>
          <p>Publication Information</p>
       </publicationStmt>
       <sourceDesc>
          <p>Information about the source</p>
       </sourceDesc>
    </fileDesc>
</teiHeader>

As a tree, this section look like this.

Tree representation of node stucture

Within the XML model, a teiHeader can contain several other nodes including fileDesc. In fileDesc, the titleStml, publicationStmt and sourceDesc are allowed. These links will take you to the TEI documentation which is available online. The remainder of the file contains the text node, within which is the body and in that is a paragraph.

The ruleset (guidelines) may seem restrictive, but it provides a standardised way to encode our text. We can also use the guidelines as prompts to consider what we might want to encode.

Helpful Oxygen XML

The Oxygen XML interface is designed to help you.

  • In the bottom left is a tree, much like the screenshot above. If you click on the a node name, such as fileDesc, all of the child nodes are selected (see below image).

    Oxygen XML editor with the fileDesc node and child nodes highlighted

  • In the top right are two tabs: attributes and model. The model tab shows the rules for the selected node. If we select the fileDesc node then we see a description of the node from the TEI guidelines, the possible child nodes (titleStmt, publicationStmt) and the possible attributes. An attribute is information attached to the node.

The model tab for the fileDesc node displayed in the Oxygen XML editor

  • The attributes tab allows you to easily change an attribute of a node. More on that in the second activity.

  • In the Text tab (where the file text is shown) you can create a new node by opening a < bracket. Oxygen XML will then show you a list of the available nodes (keeping within the TEI rules). For instance, you can create a location node as a child node of p (paragraph).

Activity One

Goal

We want to have a version of Orwell’s diary entries in an XML format. In this first activity, we will create a file containing the Political Diary text with file information in the TEI header and structural information (paragraphs and page numbers).

Steps

  1. Create a TEI P5 file (see above). Add to the file the file title, author, publication authority, source description and details of the project. Save the file.
  2. Add the paragraphs and page numbers.
Tip
  • University of Warwick students can access diary page plain text here. You will need to put this text into a TEI XML file in Oxygen XML.

  • The TEI group have information about the TEI header. There is also a tutorial on the teibyexample.org page which details ‘The TEI Header’.

  • Each TEI element has documentation online. Guidance on interpreting the guidance is below. For this activity, you are encoding the position of paragraphs with the p element and page beginnings with the pb element. Note: the page break for page 55 will be <pb n=“55”/>.