An Introduction to the Trove API

Are you tired of downloading items one by one after doing a successful search in Trove? Do you want to get an overview of what the entire search looks like? Do you want to connect items which are stored in Trove with like items from another archive?

The Trove API (Application Programming Interface) helps you to do these things and more. People who can program can use it imaginatively and explore its limits. Yet despite its technical-sounding name, it is also a tool that people with no programming skills can use. This introduction to the Trove API is written for those with no background in programming. It takes

Constructing a Trove API Call

With any new technology the first thing you need to do is experiment with it. Tim Sherratt has created a Trove API Console which makes it easy for you to do this without having an API key.

Sherratt has written three API calls for you to use in the Trove API Console. When you click on one you receive a page of information from the Trove database in XML format. This is really useful because the structured format helps you to store the information, sort it, search it and do all kinds of other things with it. We will come back to the output later. Let’s look at the first of the queries Sherratt has set up in the Trove API Console:

[Image: the first example call in Sherratt’s Trove API Console]

Try constructing your own Trove API call. Make sure you include the three elements required in a Trove API call:

  1. The root of the API call;
  2. Your search terms: try experimenting using the words ‘AND’, ‘NOT’, ‘OR’;
  3. The zone of Trove you wish to search: Trove is far more than a repository of newspaper articles. Try a search of another zone such as picture, book, music, map, collection, list.
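If you would rather assemble calls with a short script than type them by hand, the three elements can be joined as below. This is a minimal sketch under assumptions: the root `http://api.trove.nla.gov.au/result` is inferred from the `/result?q=...` path that appears in the results later in this post and should be checked against the Trove API documentation, and `build_call` is a hypothetical helper name.

```python
from urllib.parse import urlencode, quote

# Assumed API root; check the Trove API documentation for the current address.
API_ROOT = "http://api.trove.nla.gov.au/result"


def build_call(search_terms, zone, root=API_ROOT):
    """Join the three elements of a call: root, search terms ('q') and zone.

    build_call is a hypothetical helper used for illustration only.
    """
    params = {"q": search_terms, "zone": zone}
    # quote_via=quote writes spaces as %20 instead of '+'.
    return root + "?" + urlencode(params, quote_via=quote)


print(build_call("secular AND education", "picture"))
```

The words AND, OR and NOT pass through unchanged; only the spaces between them are encoded.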

If you want to do anything more sophisticated you will need to apply for a Trove API Key. For very good reasons organisations do not let everybody stroll into their data warehouse to play with it as they please. Just like any business holding items in a physical warehouse, websites have rules which all users of an API must comply with. For security reasons APIs are locked but some, such as the Trove API, allow members of the public to hold keys. All visits are logged. If someone starts messing around or using data in an unauthorised way, the organisation can stop the key from working.

To apply for a Trove API key, log in to Trove and go to the developer tab in your user profile. Your user profile can be accessed via the ‘View user profile’ button in the top right-hand corner of your Trove screen.

To create an API call using your key you will first need to construct a search query again, but this time it is a little more complicated. I want to do a simple search for newspaper articles with the words “secular” and “education”. This is the API call I constructed:

[Image: the API call for a newspaper search on “secular education”]

This API call looks different to the one you used in the Trove API Console for two reasons:

  1. We now have four parts to our API call with the inclusion of the API key. Your key is confidential. Substitute <INSERT KEY> with your own Trove API key.
  2. The API call is effectively a URL, and URLs can only contain a limited set of ASCII characters. A space is not one of them. Using the URL Encoding Reference list we find that the code for a space is ‘%20’, so we replace every space in our API call with this code.
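Python’s standard library can do the look-up in the URL Encoding Reference list for you. This small sketch uses `urllib.parse.quote`, which swaps a space for ‘%20’, and `quote_plus`, which writes it as ‘+’ instead, the form that appears in Trove’s own ‘next’ links later in this post.

```python
from urllib.parse import quote, quote_plus

# quote() replaces characters that are not allowed in a URL with %XX codes.
encoded = quote("secular education")

# quote_plus() writes spaces as '+' rather than %20.
plus_encoded = quote_plus("secular education")

print(encoded)       # secular%20education
print(plus_encoded)  # secular+education
```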

Now paste your API call into your browser and see if it works.

This API call gets us results but are they the correct results? It is a good idea to test your API calls by putting the same search terms in the Trove advanced search page and checking that they are the same. Doing this I find that the Trove Advanced Search query returns 62,744 results. Looking at the fourth line of the results returned through the API call we find the following code:

<records s="0" n="20" total="62744" next="/result?q=secular+education&zone=newspaper&s=20">

This tells us that the total number of results is 62,744. It looks like this call has worked.
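Rather than reading the fourth line by eye, a script can pull the `total` attribute out of the `<records>` element. A sketch using Python’s built-in `xml.etree.ElementTree`; the sample below is a shortened, assumed version of the response layout, and in real XML the `&` characters in the `next` link are written as `&amp;`. `total_results` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened, assumed response; only the <records> element matters here.
SAMPLE = """<response>
  <zone name="newspaper">
    <records s="0" n="20" total="62744"
             next="/result?q=secular+education&amp;zone=newspaper&amp;s=20">
    </records>
  </zone>
</response>"""


def total_results(xml_text):
    """Return the 'total' attribute of the first <records> element as an int."""
    root = ET.fromstring(xml_text)
    records = next(root.iter("records"))  # searches at any depth
    return int(records.get("total"))


print(total_results(SAMPLE))  # 62744
```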

But we have received only 20 of the 62,744 results. To download more we need to change the ‘n’ value. The maximum number of results the Trove API allows in one call is 100. This is because the National Library needs to conserve its bandwidth so that Trove works well for all users. It can’t have a handful of users taking up computing resources with large queries, leaving the majority of people who use the Trove website frustrated at how slow the service has become.

The ‘s’ value is the number of the result you want your download to start from. The first result (or article) is numbered zero. In our example, with n=100, the next 100 results start at s=100, the 100 after that start at s=200, and so on. The same principle applies to other ‘n’ values: if the first group has n=40, the second group starts at s=40 and the third at s=80.
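The rule can be stated as: counting groups from zero, group k starts at s = k × n. This short function lists every starting value needed to cover a given total; `start_values` is a hypothetical name used for illustration.

```python
def start_values(total, n):
    """List the 's' value of each call needed to cover `total` results, n per call."""
    return list(range(0, total, n))


print(start_values(200, 40))  # [0, 40, 80, 120, 160]
```

For the search in this post, `start_values(62744, 100)` gives 628 values, from 0 up to 62700.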

So in our example where n=100 and we want to see the second group of results, we need to construct another API call:

[Image: the amended API call with s=100]

To receive the third page, change this call to s=200 and run it again. To download all 62,744 results with n=100 you need to run 628 API calls (62,744 ÷ 100 = 627.44, rounded up), adding the ‘n’ value to the previous ‘s’ value each time. This is where a program would be useful.
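Here is a sketch of such a program, written under assumptions rather than as an official client: the API root is inferred as above, the `key`, `zone`, `q`, `s` and `n` parameter names follow the calls shown in this post, and `download_all` and `fetch_url` are hypothetical names. A pause between calls is included to go easy on the National Library’s servers, and the `fetch` argument lets you swap in a different downloader, for example a fake one while testing.

```python
import time
import urllib.request
from urllib.parse import urlencode, quote

API_ROOT = "http://api.trove.nla.gov.au/result"  # assumed root


def fetch_url(url):
    """Download one page of results and return it as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def download_all(key, query, zone, total, n=100, delay=1.0, fetch=fetch_url):
    """Make one API call per block of n results, stepping 's' by n each time."""
    pages = []
    for s in range(0, total, n):
        params = {"key": key, "zone": zone, "q": query, "s": s, "n": n}
        url = API_ROOT + "?" + urlencode(params, quote_via=quote)
        pages.append(fetch(url))
        time.sleep(delay)  # pause between calls to avoid hammering the server
    return pages
```

With your own key, `download_all("<INSERT KEY>", "secular education", "newspaper", 62744)` would make the 628 calls described above and return the raw XML of each page.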

The Results

Let’s look closer at the results. You will find that each result returned looks like this:

<article id="136223996" url="/newspaper/136223996">

<heading>SECULAR EDUCATION.</heading>

<category>Article</category>

<title id="356">Newcastle Morning Herald & Miners' Advocate (NSW : 1876 - 1954) </title>

<date>1906-05-22</date>

<page>5</page>

<pageSequence>5</pageSequence>

<relevance score="9.819319">very relevant</relevance>

<snippet>... <strong>SECULAR</strong> <strong>EDUCATION.</strong> LONDON, Monday.-The Yorkshire Liberal Federation, by a large majority, has passed a resolution in favour of <strong>secular</strong> <strong>education.</strong>                     ...</snippet>

<troveUrl>http://trove.nla.gov.au/ndp/del/article/136223996?searchTerm=secular+education</troveUrl>

</article>
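To turn each result into something you can sort or save, a script can read these tags into a dictionary. A sketch with Python’s `xml.etree.ElementTree`; the sample is trimmed from the result above (in real XML the `&` in the newspaper title is written `&amp;`), and `parse_article` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<article id="136223996" url="/newspaper/136223996">
  <heading>SECULAR EDUCATION.</heading>
  <category>Article</category>
  <title id="356">Newcastle Morning Herald &amp; Miners' Advocate (NSW : 1876 - 1954)</title>
  <date>1906-05-22</date>
  <page>5</page>
  <relevance score="9.819319">very relevant</relevance>
</article>"""


def parse_article(element):
    """Collect the main fields of one <article> element into a dict."""
    return {
        "id": element.get("id"),
        "heading": element.findtext("heading"),
        "title": element.findtext("title"),
        "title_id": element.find("title").get("id"),
        "date": element.findtext("date"),
        "page": element.findtext("page"),
        "score": float(element.find("relevance").get("score")),
    }


article = parse_article(ET.fromstring(SAMPLE))
print(article["heading"], article["date"], article["score"])
```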

There are some terms to explain:

  • Each article has a unique ‘article id’ number which can be used to construct a citeable URL.
  • ‘title id’ is the unique number Trove gives to each publication.
  • ‘pageSequence’ is often the same as the page number, but it may have a letter after it indicating that the page is part of a special section or insert in the newspaper.
  • The ‘relevance score’ is the number Trove uses to sort the results, starting with the most relevant article. The calculation of this number is explained here.
  • ‘snippet’ is the brief excerpt from the article that is returned below each search result.
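Two of these fields are handy in a script. The sketch below turns an article id into a persistent link and sorts results by relevance score. The `http://nla.gov.au/nla.news-article` prefix is assumed to be the form of Trove’s persistent newspaper links, so check it against the citation shown on an article page; the helper names and the second, made-up result are for illustration only.

```python
# Assumed prefix for Trove's persistent newspaper article links.
CITE_PREFIX = "http://nla.gov.au/nla.news-article"


def citeable_url(article_id):
    """Build a persistent link from an article id (assumed URL pattern)."""
    return CITE_PREFIX + str(article_id)


def by_relevance(articles):
    """Sort article dicts from most to least relevant using their 'score'."""
    return sorted(articles, key=lambda a: a["score"], reverse=True)


# One real id from this post plus one made-up entry.
results = [{"id": "1", "score": 2.5}, {"id": "136223996", "score": 9.819319}]
print(citeable_url(results[1]["id"]))  # http://nla.gov.au/nla.news-article136223996
print([a["id"] for a in by_relevance(results)])  # ['136223996', '1']
```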

We always want more. The most obvious thing missing is the complete text of the articles, which we can ask the API to include. Our API call now looks like this:

[Image: the amended API call requesting the full text of articles]

Scroll down the results and you will see a new tag, <articleText>, which contains the full text of the article.
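In a script, the extra request and the new tag might be handled as follows. Treat both details as assumptions to check against the API documentation: the sketch asks for full text with an `include=articletext` parameter and reads it from `<articleText>`. The root is inferred as before, and `full_text_call` and `article_text` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, quote

API_ROOT = "http://api.trove.nla.gov.au/result"  # assumed root


def full_text_call(key, query, s=0, n=100):
    """Build a newspaper call that also asks for article text (assumed parameter)."""
    params = {"key": key, "zone": "newspaper", "q": query,
              "s": s, "n": n, "include": "articletext"}
    return API_ROOT + "?" + urlencode(params, quote_via=quote)


def article_text(article_xml):
    """Return the <articleText> of one <article>, or '' if the tag is missing."""
    return ET.fromstring(article_xml).findtext("articleText", default="")


print(full_text_call("<INSERT KEY>", "secular education", s=100))
```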

Adding Some More Precision to Searches

Some words contain another word: ‘schooling’ contains ‘school’. When you search for a word, Trove also returns articles containing longer forms of it. In this example a search for ‘school’ will also return articles with the word ‘schooling’, even if they do not contain the word ‘school’ itself. The same applies to plurals such as ‘schools’.

Sometimes you want to refine your results so they only include the exact word you are searching for. In Advanced Search this can be done with ‘fulltext’. So in the search box we write:

Secular fulltext:school

This reduces our results by nearly ten thousand articles, but these are more relevant articles for us.

We can also limit our results in this way through an API call. First, though, we need to look up the URL Encoding Reference list to find the code for the colon. It is ‘%3A’. So our API call now looks like this:

[Image: the amended API call using fulltext:school]

Scroll down through the results and you will find the full text of the articles.
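If you build calls in a script, `urllib.parse.quote` encodes the colon along with the spaces, so you do not need to look either code up. A small sketch using the query from above and the placeholder key; the API root is again an assumption.

```python
from urllib.parse import quote

API_ROOT = "http://api.trove.nla.gov.au/result"  # assumed root

query = "secular fulltext:school"
encoded_query = quote(query)  # spaces become %20, the colon becomes %3A
url = f"{API_ROOT}?key=<INSERT KEY>&zone=newspaper&q={encoded_query}&n=100"
print(encoded_query)  # secular%20fulltext%3Aschool
```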

There are many refinements that you can introduce to your searches through the API, just as you do in your searches via the Advanced Search page. Full documentation of the API is provided on the Trove website.

Receiving your results in your browser window is nice, but you can do more, once again without doing any programming. In the next post I will explain how you can use Excel to import your search results into a spreadsheet using the Trove API.

References

This is part one of a series about the basics of using the Trove API. The other posts are:

2. Using the Trove API with Excel Spreadsheets

3. A Closer Look at Newspapers via the Trove API

The explanation of the ‘s’ value in this post has been corrected and expanded upon in response to a comment from Travis M Sellars on the next post, Using the Trove API with Excel Spreadsheets.


2 thoughts on “An Introduction to the Trove API”

  1. Yvonne, I read your pieces with great relief, as I have been attempting to work out how to gather a lot of material out of the Trove resource and had taken to spending a couple of hours just hand-gathering. What I can’t work out is how to get the jpg or the pdf files to accompany the lists. I want to use the data for a number of ideas I have, one being the concern amongst farmers for information about such events as the season break or the opening rains (this being for South Australia and Victoria) and about harvest yields, with a view to trying to work out how farmers arrived at these concepts and used them. (Where I live the farmers now claim that the springs begin to run about 3 weeks before the opening rains.) I am also involved in a project about a soldier settlement scheme in SA and want to gather all the newspaper and journal articles in Trove about the scheme, with jpeg or pdf images of the articles. Is it possible?

    • That is an interesting project you have embarked on, Kevin. Unfortunately I am travelling at the moment and much as I would love to explore your question I don’t have the time to fully answer it now. I usually use the articletext field to download the full text of articles. This gives the text of the article in machine-readable form, which makes it more useful for further analysis than a pdf or jpg. I had a quick look at the documentation for the API and found that you can download the entire page that an article is on via the API.

      Keep in mind that you can ask questions about how to use the API on the Trove forums. I have found that the response is good.

      All the best for your research! I am interested to hear how you go. It would be great if you could set up a blog and write an occasional post about how you are going.
