The Needle in the Haystack

Photo by Julie-Ann Robson.

Mining Big Data is very useful for producing longue-durée history, as Jo Guldi and David Armitage note in their book, The History Manifesto (read my post on this aspect of their book). However, Big Data mining techniques have been equally productive for recovering the history of the millions of people in the past whose lives have, until now, been largely hidden from view. We can home in on a chance comment about one person who does not appear in any archival indexes and, using a multiplicity of sources, build up a fuller impression of their life. Alternatively, we can collect hundreds of scraps of evidence and gain a greater sense of communities which have previously been ignored.

Over one million men from the Indian subcontinent were involved in World War I, whether as soldiers or in support roles, yet their role has rarely featured in the histories that we have told of this War. This is large-scale history that has been ignored until recently.

Over the last few months I have become interested in the interaction between Indians and Australians in World War I. Alerted to the participation of Indians in the War by Australian historian Peter Stanley, I had another look at the World War I diaries I am working with and found numerous references to them by Australian soldiers. Peter Stanley has recently published a book about the Indians at Gallipoli, Die in Battle, Do Not Despair: the Indians on Gallipoli, which I am looking forward to reading. I have written about Indian soldiers at Gallipoli on my Stumbling in the Past blog in a post titled ‘Indian Soldiers Fought at Gallipoli’.

‘The Needle in the Haystack’: My #OzHA2015 Paper

We are in a time of extraordinary change in research practices in history and are only beginning to discover the potential of even the simplest technical tools. At last week’s Australian Historical Association Conference in Sydney, I delivered a paper in which I discussed the methodology that enabled me to find previously undiscovered references to Indian soldiers at Gallipoli.

I would not have been able to find out about Indians from these diaries, or to make progress in my research on the beliefs of Australian soldiers on the front line, without the generosity of the digital humanities community. Everything that I have learned about using technology in history I have learned from digital humanists on Twitter, in blogs, at THATCamps and most recently at the Global Digital Humanities Conference in Sydney.

In the spirit of giving back, I have attached the paper I delivered at the conference at the bottom of this post. It is not exactly the same as the paper I delivered because I ran out of time and cut some of the comments towards the end. I have added hyperlinks, lots of references and some of the images I included in the slides which accompanied my talk. Continue reading

#DH2015 – Introduction to Digital Manuscript Studies Workshop

Sign saying 'Building EA/DH2015/Global Digital/Humanities' with an arrow behind the writing.

At some stage or another any historian in the twenty-first century will consider embarking on a digitisation project of their own. Back in 2010 I briefly explored the possibility of organising the digitisation of some old school text books that I had been researching as part of my work on the Teaching Reading in Australia project. If I were to organise this I wanted to do it properly and ensure the resulting data could be linked to other similar historical data and be useful for other researchers. I did not want to do another project that merely reproduced pretty pictures of text (PDFs) which were not machine readable.

I was quickly confronted by the sad fact that my ambitions exceeded my skills. From attending THATCamps, reading blogs and following digital humanists on Twitter I knew that I should encode the data in XML using the framework provided by the Text Encoding Initiative (TEI), but I didn’t know how to do that. I don’t like doing something unless I do it properly, and I always have too much to do, so I dropped the idea.

Like all historians I have transcribed many hand-written documents from photos of primary sources I have taken for research purposes in the archives. Each document is idiosyncratic. The relevant items on a page are not restricted to words. There are underlines, crossed-out words (who did the crossing out?), and notes scribbled in the margins by the original author at a later date or by someone else. There are arrows, drawings or diagrams. Too often the writing is illegible. Each of these important bits of information needs to be recorded in the transcription. Quite often I will use markup borrowed from HTML or make up my own methods to signal a type of message in a transcription.

Since then I have been fascinated by a project of Dr Melodee Beals who is a Senior Lecturer in History at Sheffield Hallam University. Beals is marking up her transcriptions of historic documents in TEI. Separating the design from the text is a fundamental principle of web design. TEI enables us to prepare the transcription in a way that can be easily formatted for display on websites via XSLT. Beals’ project makes so much sense for historians. Why not incorporate some basic TEI markup in our transcriptions from the moment we start transcribing documents?

I needed to learn more about this mysterious TEI.

Fortune smiled and one of the workshops offered at this week’s Global Digital Humanities Conference covered basic TEI. For the last day and a half I have been learning about TEI and manipulation of images in the workshop, ‘Introduction to Digital Manuscript Studies’, conducted by Elena Pierazzo, Professor of Italian Studies and Digital Humanities at the University of Grenoble 3 ‘Stendhal’, and Peter Stokes, Senior Lecturer in Digital Humanities at King’s College London. (Have a look at the impressive results of Pierazzo’s TEI transcription work on Proust’s notebook).

I now have the kickstart that I need. Last night I worked on marking up a transcription I had done of a document from my own project to reinforce what I had learned. One thing that has been bothering me about some transcriptions available on the internet is the lack of consistency with date formatting. There are many ways we can write dates and authors of handwritten documents use all sorts of approaches. Last night I discovered ‘13 Names, Dates, People, and Places’. This is the TEI chapter for me! I discovered how to encode a consistent, searchable date format while preserving the idiosyncratic way it was recorded in the original document. Oh, the potential of this! Continue reading
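As a rough illustration of the date encoding described above, the sketch below wraps a date exactly as it was written in a TEI date element and records a normalised value in the element's when attribute. The function name tei_date and the three sample spellings are hypothetical, invented for this example; only the date element and its when attribute come from the TEI Guidelines.

```python
from datetime import date
from xml.sax.saxutils import escape


def tei_date(as_written: str, normalised: date) -> str:
    """Return a TEI <date> element that keeps the original wording as its text."""
    # `when` holds the machine-readable ISO 8601 form (YYYY-MM-DD);
    # the element text preserves whatever the writer actually put on the page.
    return f'<date when="{normalised.isoformat()}">{escape(as_written)}</date>'


# Hypothetical example: three ways a diarist might write the same day.
variants = ["25th April 1915", "Apr. 25/15", "Sunday 25.4.15"]
encoded = [tei_date(v, date(1915, 4, 25)) for v in variants]
for line in encoded:
    print(line)
```

Because every variant carries the same when value, a search for 1915-04-25 would find all three, while the transcription still shows each date as the author wrote it.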

The History Manifesto and Big Data

I published this review of the ‘Big Data’ chapter in The History Manifesto, written by historians Jo Guldi and David Armitage, on my history blog last month. It is now on the reading list of HIST4170, Exploring Digital Humanities, a course offered by the history department at the University of Guelph, Canada.

Stumbling Through the Past

Book cover of The History Manifesto

The History Manifesto by Jo Guldi and David Armitage (Cambridge University Press, 2014).

In my last post I reviewed the provocative book, The History Manifesto. Written by history academics Jo Guldi (Brown University) and David Armitage (Harvard), it is a call to historians to turn their work towards investigating long periods of history (the longue-durée) in order to address the big issues affecting humanity such as inequality and climate change. I set aside one chapter in that review for special attention. In this post I consider chapter four, ‘Big questions, big data’.

There are many ways that technology can be used by the historian.

The ‘Big Data’ chapter in The History Manifesto discusses the use of topic modelling tools to highlight the type of language most often used and the topics most widely discussed in the past. Guldi and Armitage also recognise the potential for digital tools to uncover…

A Closer Look at Newspaper Items via the Trove API

A simple search using the Trove API is quick and easy, but the results are often too broad to provide the information we need. Any insights are lost in the sludge of irrelevant results. In this post I will explain how to narrow the search down so that you can close in on the data that is more helpful for your research. If you have not used the Trove API before and need to learn how to do a simple search using this tool you should first read ‘An Introduction to the Trove API’.

Trove is a huge database of information contributed by over two thousand libraries in Australia as well as other organisations. It is an ever-increasing data mine. Today the Trove website says that it holds 389,961,760 items. The Trove API gives access to items in several zones: book, picture, article, music, map, collection, newspaper, list. This series of posts focuses on the newspaper zone, which allows access to digitised newspapers.
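To make the idea of zones a little more concrete, here is a minimal sketch that assembles an API call for one zone. Treat every detail as an assumption to check against the Trove API Technical Guide: the endpoint address, the parameter names key, zone and q, the helper build_trove_url and the placeholder key are used here for illustration only. The list of zones is the one given in this post. The code only builds the address and does not contact Trove.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint; confirm the address for your API version
# in the Trove API Technical Guide.
BASE_URL = "https://api.trove.nla.gov.au/v2/result"

# Zones as listed in this post.
ZONES = {"book", "picture", "article", "music", "map",
         "collection", "newspaper", "list"}


def build_trove_url(api_key: str, zone: str, query: str) -> str:
    """Assemble a search address for one zone (hypothetical helper)."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    # Assumed parameter names: key (your API key), zone, q (the search terms).
    params = {"key": api_key, "zone": zone, "q": query}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_trove_url("YOUR_API_KEY", "newspaper", "Indian soldiers Gallipoli")
print(url)
```

Restricting the call to a single zone is the first and simplest way of narrowing a search; the address printed here can be pasted into a browser once a real key is substituted.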

This post is designed to be used as a reference when you need to find answers for the particular types of searches that you are most likely to be conducting. Keep in mind that there is a lot more that you can do in the newspaper zone via the Trove API, and that there is also a large amount of data to explore in the other zones. If the answer to your question is not here then you should consult the Trove API Technical Guide. The other essential document you will need to consult regularly is the URL Encoding Reference so you can translate spaces, quotation marks and non-ASCII characters into codes that will be recognised on the Web. Continue reading
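For readers who are comfortable running a few lines of Python, URL encoding can also be done automatically rather than looked up character by character. This is only a sketch using Python's standard library; the search phrase is a made-up example.

```python
from urllib.parse import quote, unquote

# A phrase search: the quotation marks and spaces must be encoded.
phrase = '"Indian troops" Gallipoli'
encoded = quote(phrase)
print(encoded)   # %22Indian%20troops%22%20Gallipoli

# Non-ASCII characters are encoded as their UTF-8 bytes.
print(quote("longue-durée"))   # longue-dur%C3%A9e

# Decoding reverses the process.
print(unquote(encoded) == phrase)   # True
```

The encoded string is what goes into the search part of an API call; the reference table remains useful for checking individual codes by hand.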

Using the Trove API with Excel Spreadsheets

In the last post I explained the basics of using the Trove API by importing the results into the browser window. A more powerful way to use the results returned from the API is to import the results into an Excel spreadsheet. Excel has its limitations but it is one way that people who don’t have any programming skills can store and analyse results gained through the API.

There are two methods of importing the data into Excel:

  1. In a blank workbook in Excel go to the Data tab. Click on the ‘From Web’ button in the ‘Get External Data’ section of the ribbon on the left-hand side. Copy and paste your API call into the ‘Address’ window and click ‘Go’. You will see the same results that you saw in your browser window. Click the ‘Import’ button at the bottom of the window. A message will come up saying “[t]he specified XML source does not refer to a schema…” Click ‘OK’. Specify which cell your table should start from and the data will load into your spreadsheet. Continue reading
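Readers who do have a little programming experience could reach a similar result by converting the XML returned by the API into a CSV file, which Excel opens directly. The sketch below is an alternative to, not a description of, the Excel import above. The sample XML is invented for illustration: a real Trove response has its own element names and many more fields, so the names article, heading and date here are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical response, invented for this sketch; a real response will
# have a different structure and many more fields.
SAMPLE_XML = """<response>
  <article id="1"><heading>Example heading one</heading><date>1915-05-01</date></article>
  <article id="2"><heading>Example heading two</heading><date>1915-06-12</date></article>
</response>"""

FIELDS = ["id", "heading", "date"]


def xml_to_rows(xml_text: str) -> list[dict[str, str]]:
    """Turn each <article> element into one spreadsheet row."""
    root = ET.fromstring(xml_text)
    rows = []
    for article in root.iter("article"):
        rows.append({
            "id": article.get("id", ""),
            "heading": article.findtext("heading", default=""),
            "date": article.findtext("date", default=""),
        })
    return rows


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Write the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Saving csv_text to a file ending in .csv gives a table with one article per row, much like the table Excel builds from the ‘From Web’ import.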

An Introduction to the Trove API

Are you tired of downloading items one by one after doing a successful search in Trove? Do you want to get an overview of what the entire search looks like? Do you want to connect items which are stored in Trove with like items from another archive?

The Trove API (Application Programming Interface) helps you to do these things and more. People who can program can use this in an imaginative way and explore its limits. Yet despite its technical-sounding name this is also a tool which people who have no programming skills can use. This introduction to the Trove API is written for those with no background in programming. It takes Continue reading

Rails Girls Canberra

Hard at work – Rails Girls Canberra.

Women were programming pioneers. Ada Lovelace is recognised as the world’s first programmer.  In the nineteenth century she wrote the instructions which could activate Charles Babbage’s Analytical Engine.  Her feat is recognised today through the Ada Lovelace Day which falls on 15th October in 2013.

It was not until the mid-twentieth century that computing really took off. The ENIAC computer developed during WWII is recognised as one of the first general-purpose electronic computers. The entire original ENIAC programming team were women. Grace Hopper wrote the first automatic compiler and played a leading role in the development of the COBOL programming language, which went on to become one of the world’s most popular business programming languages.

Yet the proportion of women programming has declined over the last couple of decades. Today women are only a small proportion of software developers. There has been much wringing of hands about this, but wringing of hands does not achieve anything other than warming up our hands.

One development community is taking action to increase the numbers of women in website development.  I was fortunate enough to be able to participate in one of their events earlier this month.

Ruby on Rails is a popular development framework that is used to create websites. The Rails community has supported a training program for women called Rails Girls. It started in Finland and is now well established in many countries including Australia.

I was fortunate enough to be able to participate in one of their introductory workshops in Canberra recently. Continue reading