At some stage or another any historian in the twenty-first century will consider embarking on a digitisation project of their own. Back in 2010 I briefly explored the possibility of organising the digitisation of some old school textbooks that I had been researching as part of my work on the Teaching Reading in Australia project. If I were to organise this, I wanted to do it properly and ensure the resulting data could be linked to other similar historical data and be useful for other researchers. I did not want to do another project that merely reproduced pretty pictures of text (PDFs) which were not machine readable.
I was quickly confronted by the sad fact that my ambitions exceeded my skills. From attending THATCamps, reading blogs and following digital humanists on Twitter I knew that I should encode the data in XML using the framework provided by the Text Encoding Initiative (TEI), but I didn’t know how to do that. I don’t like doing something unless I do it properly, and I always have too much to do, so I dropped the idea.
Like all historians I have transcribed many hand-written documents from photos of primary sources I have taken for research purposes in the archives. Each document is idiosyncratic. The relevant items on a page are not restricted to words. There are underlines, crossed-out words (who did the crossing out?), and notes scribbled in the margins, either by the original author at a later date or by someone else. There are arrows, drawings or diagrams. Too often the writing is illegible. Each of these important bits of information needs to be recorded in the transcription. Quite often I will use markup borrowed from HTML or make up my own methods to signal a type of message in a transcription.
Since then I have been fascinated by a project of Dr Melodee Beals, who is a Senior Lecturer in History at Sheffield Hallam University. Beals is marking up her transcriptions of historic documents in TEI. Separating the design from the text is a fundamental principle of web design. TEI enables us to prepare the transcription in a way that can be easily formatted for display on websites via XSLT. Beals’ project makes so much sense for historians. Why not incorporate some basic TEI markup in our transcriptions from the moment we start transcribing documents?
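To make the idea concrete, here is a minimal sketch in Python of what basic TEI markup in a transcription might look like and why it is useful. The sentence being transcribed is invented for illustration and is not taken from Beals' project. The elements used (`del`, `add`, `hi`, `unclear`, `note`) are standard TEI elements, but the helper names `reading_text` and `tidy` are my own assumptions, and a real project would more likely use XSLT for this step.

```python
import xml.etree.ElementTree as ET

# Invented transcription fragment using standard TEI elements:
# <del> crossed-out word, <add> inserted word, <hi> underlining,
# <unclear> hard-to-read word, <note> marginal note.
FRAGMENT = """
<p xmlns="http://www.tei-c.org/ns/1.0">The children read
<del rend="strikethrough">badly</del> <add place="above">well</add>
from the <hi rend="underline">Second</hi> Reader
<unclear>today</unclear>.<note place="margin">Check this.</note></p>
"""


def reading_text(elem, skip=("del", "note")):
    """Collect the text of elem, leaving out any child element named in skip.

    The text that follows a skipped element (its "tail") is still kept,
    because it belongs to the surrounding sentence.
    """
    parts = [elem.text or ""]
    for child in elem:
        local_name = child.tag.split("}")[-1]  # drop the namespace prefix
        if local_name not in skip:
            parts.append(reading_text(child, skip))
        parts.append(child.tail or "")
    return "".join(parts)


def tidy(text):
    """Collapse line breaks and repeated spaces into single spaces."""
    return " ".join(text.split())


paragraph = ET.fromstring(FRAGMENT.strip())

# The text as finally corrected: deletions and marginal notes left out.
final_version = tidy(reading_text(paragraph))

# The text as first written: additions and marginal notes left out.
first_version = tidy(reading_text(paragraph, skip=("add", "note")))

print(final_version)
print(first_version)
```

One marked-up transcription yields both the corrected and the original reading, which is the separation of text from presentation described above: the markup records what is on the page, and the display is decided later.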
I needed to learn more about this mysterious TEI.
Fortune smiled, and one of the workshops offered at this week’s Global Digital Humanities Conference covered basic TEI. For the last day and a half I have been learning about TEI and manipulation of images in the workshop, ‘Introduction to Digital Manuscript Studies’, conducted by Elena Pierazzo, Professor of Italian Studies and Digital Humanities at the University of Grenoble 3 ‘Stendhal’, and Peter Stokes, Senior Lecturer in Digital Humanities at King’s College London. (Have a look at the impressive results of Pierazzo’s TEI transcription work on Proust’s notebook).
I now have the kickstart that I need. Last night I worked on marking up a transcription I had done of a document from my own project to reinforce what I had learned. One thing that has been bothering me about some transcriptions available on the internet is the lack of consistency with date formatting. There are many ways we can write dates, and authors of handwritten documents use all sorts of approaches. Last night I discovered ‘13 Names, Dates, People, and Places’. This is the TEI chapter for me! I discovered how to encode a consistent, searchable date format while preserving the idiosyncratic way it was recorded in the original document. Oh, the potential of this!
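As a rough illustration of that potential, the sketch below uses the TEI `date` element with its `when` attribute, which holds a normalised date while the element's content keeps the wording of the original. The sentence and its dates are invented, the function name `extract_dates` is my own, and I am assuming every `when` value is a full year-month-day date (TEI also allows partial dates such as a year alone, which this sketch does not handle).

```python
import xml.etree.ElementTree as ET
from datetime import date

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented fragment: the original wording is kept as the element content,
# and the normalised form sits in the when attribute.
FRAGMENT = """<p xmlns="http://www.tei-c.org/ns/1.0">Inspected again on
<date when="1923-06-15">June 15th</date>, having first visited on
<date when="1923-03-04">4th March '23</date>.</p>"""


def extract_dates(xml_text):
    """Return (normalised date, original wording) pairs for each <date>.

    Assumes each when attribute is a complete YYYY-MM-DD value.
    """
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter(TEI + "date"):
        when = element.get("when")
        if when is None:
            continue  # a date the transcriber did not normalise
        wording = " ".join("".join(element.itertext()).split())
        found.append((date.fromisoformat(when), wording))
    return found


dates = extract_dates(FRAGMENT)

# Sorting uses the normalised dates, so the order is chronological
# regardless of how each date was written in the document.
chronological = sorted(dates)

for normalised, wording in chronological:
    print(normalised.isoformat(), "was written as:", wording)
```

Because the search and sort work on the `when` values, the transcription itself never has to be tidied up: "4th March '23" stays exactly as the author wrote it.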