Are you tired of downloading items one by one after doing a successful search in Trove? Do you want to get an overview of what the entire search looks like? Do you want to connect items which are stored in Trove with like items from another archive?
The Trove API (Application Programming Interface) helps you to do these things and more. People who can program can use it imaginatively and explore its limits. Yet despite its technical-sounding name, it is also a tool that people with no programming skills can use. This introduction to the Trove API is written for those with no background in programming.
Constructing a Trove API Call
With any new technology the first thing you need to do is experiment with it. Tim Sherratt has created a Trove API Console which makes it easy for you to do this without having an API key.
Sherratt has written three API calls for you to use in the Trove API Console. When you click on one, you receive a page of information from the Trove database in XML format. This is really useful because the structured format can help you store the information, sort it, search it and do all kinds of other things. We will come back to the output later. Let’s look at the first of the queries Sherratt has set up on the Trove API page:
Try constructing your own Trove API call. Make sure you include the three elements required in a Trove API call:
- The root of the API call;
- Your search terms: try experimenting using the words ‘AND’, ‘NOT’, ‘OR’;
- The zone of Trove you wish to search: Trove is far more than a repository of newspaper articles. Try a search of another zone such as picture, book, music, map, collection, list.
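If you are curious how those three elements fit together, here is a minimal Python sketch. The root address in `API_ROOT` is an assumption (it is the address the API used around the time of this post; Trove has since released newer versions with different roots), and `build_call` is a hypothetical helper, not part of Trove or the console.

```python
from urllib.parse import urlencode

# Assumption: the root of the API call as it stood when this post was
# written; Trove has since released newer API versions with different roots.
API_ROOT = "http://api.trove.nla.gov.au/result"


def build_call(search_terms: str, zone: str) -> str:
    """Hypothetical helper joining the three elements: root, search terms, zone."""
    # urlencode escapes the values; by default it writes each space as '+'.
    query = urlencode({"q": search_terms, "zone": zone})
    return f"{API_ROOT}?{query}"


print(build_call("secular AND education", "picture"))
# http://api.trove.nla.gov.au/result?q=secular+AND+education&zone=picture
```

Swapping the zone argument for "book", "music", "map" and so on is all it takes to search a different part of Trove.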
If you want to do anything more sophisticated, you will need to apply for a Trove API key. For very good reasons, organisations do not let everybody stroll into their data warehouse and play with it as they please. Just as a business holding goods in a physical warehouse controls who may enter, websites set rules that all users of their API must comply with. For security reasons, access to APIs is restricted, but some, such as the Trove API, issue keys to members of the public. All requests made with a key are logged, and if someone starts misusing the service or using data in an unauthorised way, the organisation can stop that key from working.
To apply for a Trove API key, log in to Trove and go to the developer tab in your user profile. You can reach your user profile via the ‘View user profile’ button in the top right-hand corner of your Trove screen.
To create an API call using your key, you first need to construct a search query again, but this time it is a little more complicated. I want to do a simple search for newspaper articles containing the words “secular” and “education”. This is the API call I constructed:
This API call looks different from the one you used in the Trove API Console for two reasons:
- We now have four parts to our API call with the inclusion of the API key. Keys are confidential, so substitute your own Trove API key.
- The API call is effectively a URL, and URLs may contain only a limited set of ASCII characters. The space character is not one of them, so it has to be encoded. Using the URL Encoding Reference list we find that the code for a space is ‘%20’, so we replace every space in our API call with this code.
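You do not have to look codes up by hand if you ever move to a script. This minimal sketch uses Python's standard `urllib.parse` module to do the encoding. `API_ROOT` is an assumed address, `API_KEY` is a placeholder for your own key, and `build_keyed_call` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

# Assumption: the API root at the time of this post (check current docs).
API_ROOT = "http://api.trove.nla.gov.au/result"
# Placeholder: substitute your own confidential Trove API key.
API_KEY = "YOUR_API_KEY"

# quote() percent-encodes each space as '%20'.
print(quote("secular education"))  # secular%20education


def build_keyed_call(search_terms: str, zone: str, key: str = API_KEY) -> str:
    """Hypothetical helper: root, key, zone and encoded search terms."""
    params = {"key": key, "zone": zone, "q": search_terms}
    # quote_via=quote makes urlencode write spaces as '%20' instead of '+'.
    return f"{API_ROOT}?{urlencode(params, quote_via=quote)}"


print(build_keyed_call("secular education", "newspaper"))
```

Both ‘%20’ and ‘+’ are commonly accepted for spaces in a query string; the sketch uses ‘%20’ to match the approach described above.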
Now paste your API call into your browser and see if it works.
This API call gets us results, but are they the correct results? It is a good idea to test your API calls by entering the same search terms on the Trove Advanced Search page and checking that the number of results matches. Doing this, I find that the Trove Advanced Search query returns 62,744 results. Looking at the fourth line of the results returned by the API call, we find the following code:
<records s="0" n="20" total="62744" next="/result?q=secular+education&zone=newspaper&s=20">
This tells us that the total number of results is 62,744. It looks like this call has worked.
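A program can read that total straight from the XML instead of you spotting it by eye. In this minimal sketch, only the `<records>` attributes come from the output shown above. The `<response>` wrapper is an assumption, and in genuine XML the `&` in the `next` link appears as `&amp;`. `read_paging` is a hypothetical helper built on Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical response: the <records> attributes come from the
# API output above; the <response> wrapper is an assumption.
SAMPLE = (
    '<response><records s="0" n="20" total="62744" '
    'next="/result?q=secular+education&amp;zone=newspaper&amp;s=20"/></response>'
)


def read_paging(xml_text: str) -> dict:
    """Return the s, n and total attributes of the first <records> element."""
    root = ET.fromstring(xml_text)
    records = next(root.iter("records"))
    return {name: int(records.get(name)) for name in ("s", "n", "total")}


print(read_paging(SAMPLE))  # {'s': 0, 'n': 20, 'total': 62744}
```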
But we have received only 20 of the 62,744 results. To download more we need to change the ‘n’ value, which sets how many results a call returns. The maximum number of results the Trove API allows in one call is 100. This is because the National Library needs to conserve its bandwidth so that Trove works well for all users. It cannot have a handful of users tying up computing resources with large queries while the majority of people using the Trove website grow frustrated at how slow the service has become.
The ‘s’ value is the number of the result you wish to start your download from. The first result (or article) has a value of zero. In our example, if you want to receive the next 100 results you need to set an ‘s’ value of 100. If you want the following 100 results, the ‘s’ value needs to be 200, and so on. The same principle applies if you use a different ‘n’ value. So to receive the second group of results when the ‘n’ value is 40, you need to set an ‘s’ value of 40; to receive the third group, the ‘s’ value needs to be 80, and so on.
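The rule for ‘s’ is simple arithmetic: each group starts at a multiple of ‘n’, beginning at zero. This small sketch lists those starting values. `start_values` is a hypothetical helper, and `MAX_N` reflects the limit of 100 results per call described above.

```python
MAX_N = 100  # the maximum 'n' per call described in this post


def start_values(total: int, n: int) -> list[int]:
    """List the 's' value for each group of n results, counting from zero."""
    if not 1 <= n <= MAX_N:
        raise ValueError(f"n must be between 1 and {MAX_N}")
    # Each group starts where the previous one ended: 0, n, 2n, ...
    return list(range(0, total, n))


print(start_values(total=200, n=40))  # [0, 40, 80, 120, 160]
```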
So in our example where n=100 and we want to see the second group of results, we need to construct another API call:
To receive the third page, change this call to s=200 and run it again. To download all 62,744 results with n=100 you need to run 628 API calls, setting each new ‘s’ value to the previous ‘s’ value plus the ‘n’ value. This is where a program would be useful.
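To give a taste of what such a program might look like, here is a minimal sketch that builds all 628 calls. It only constructs the URLs and does not download anything. `API_ROOT` is an assumed address, `API_KEY` is a placeholder, and `paged_calls` is a hypothetical helper.

```python
import math
from urllib.parse import quote, urlencode

# Assumptions: the API root at the time of this post, and a placeholder key.
API_ROOT = "http://api.trove.nla.gov.au/result"
API_KEY = "YOUR_API_KEY"


def paged_calls(search_terms, zone, total, n=100):
    """Yield one API call per group of n results, raising s by n each time."""
    for s in range(0, total, n):
        params = {"key": API_KEY, "zone": zone, "q": search_terms, "s": s, "n": n}
        yield f"{API_ROOT}?{urlencode(params, quote_via=quote)}"


calls = list(paged_calls("secular education", "newspaper", total=62744))
print(len(calls), math.ceil(62744 / 100))  # 628 628
print(calls[-1])  # ...&s=62700&n=100
```

A real download script would fetch each URL in turn, ideally pausing between requests out of respect for the bandwidth concerns mentioned earlier.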
Let’s look closer at the results. You will find that each result returned looks like this:
Newcastle Morning Herald & Miners' Advocate (NSW : 1876 - 1954)
... SECULAR EDUCATION. LONDON, Monday.-The Yorkshire Liberal Federation, by a large majority, has passed a resolution in favour of secular education. ...
There are some terms to explain:
- Each article has a unique ‘article id’ number, which can be used to construct a citeable URL.
- ‘title id’ is the unique number Trove gives to each publication.
- ‘pageSequence’ is often the same as the page number, but it may have a letter after it indicating that the page is part of a special section or insert in the newspaper.
- The ‘relevance score’ is the number Trove uses to sort the results, starting with the most relevant article. The calculation of this number is explained here.
- ‘snippet’ is the brief excerpt from the article that is returned below each search result.
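Once the results are structured, a program can pick these fields out for you. In this minimal sketch, the element and attribute names follow the fields listed above, but the record layout is an assumption. The ids, page and score are placeholders rather than real values, and only the title and snippet text come from the example result. `summarise` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: layout assumed, ids/page/score are placeholders.
RECORD = """<article id="ARTICLE_ID">
  <title id="TITLE_ID">Newcastle Morning Herald &amp; Miners' Advocate (NSW : 1876 - 1954)</title>
  <pageSequence>PAGE</pageSequence>
  <relevance score="SCORE"/>
  <snippet>... SECULAR EDUCATION. LONDON, Monday.-The Yorkshire Liberal Federation ...</snippet>
</article>"""


def summarise(article: ET.Element) -> dict:
    """Pull out the fields explained in the list above."""
    return {
        "article_id": article.get("id"),
        "title_id": article.find("title").get("id"),
        "title": article.findtext("title"),
        "page_sequence": article.findtext("pageSequence"),
        "relevance_score": article.find("relevance").get("score"),
        "snippet": article.findtext("snippet"),
    }


print(summarise(ET.fromstring(RECORD)))
```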
We always want more. The most obvious thing that is missing is the complete text of the articles. Our API call now looks like this:
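Full text is requested by adding one more parameter to the call. This minimal sketch assumes the parameter is `include=articletext`, as described in the Trove API documentation for the newspaper zone. `API_ROOT` is an assumed address and `API_KEY` is a placeholder. Check both against the current documentation before relying on them.

```python
from urllib.parse import quote, urlencode

# Assumptions: API root at the time of this post, and a placeholder key.
API_ROOT = "http://api.trove.nla.gov.au/result"
API_KEY = "YOUR_API_KEY"

params = {
    "key": API_KEY,
    "zone": "newspaper",
    "q": "secular education",
    "include": "articletext",  # assumed parameter: ask for each article's full text
}
call = f"{API_ROOT}?{urlencode(params, quote_via=quote)}"
print(call)
```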
Adding Some More Precision to Searches
Some words contain another word: ‘schooling’ contains ‘school’. When you search for a particular word, Trove also returns articles that include a longer word containing the word you are looking for. In this example, a Trove search for ‘school’ will also return articles with the word ‘schooling’, even if they do not contain ‘school’ itself. The same applies to plurals such as ‘schools’.
Sometimes you want to refine your results so they include only the exact word you are searching for. In Advanced Search this problem can be addressed with ‘fulltext’. So in the search box we need to write:
This reduces our results by nearly ten thousand articles, but these are more relevant articles for us.
We can also limit our articles in this way through an API call. However, first we need to look up the URL Encoding Reference List to find the code for the colon. It is ‘%3A’. So our API call now looks like this:
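Python's `quote` function applies the same codes automatically, turning the colon into ‘%3A’ and each space into ‘%20’. The queries below are hypothetical illustrations based on the ‘school’ example, not the exact search used in this post.

```python
from urllib.parse import quote

# quote() percent-encodes the colon as '%3A' and each space as '%20'.
print(quote("fulltext:school"))  # fulltext%3Aschool
print(quote("fulltext:secular fulltext:education"))
# fulltext%3Asecular%20fulltext%3Aeducation
```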
There are many refinements that you can introduce to your searches through the API, just as you do in your searches via the Advanced Search page. Full documentation of the API is provided on the Trove website.
Receiving your results in your browser window is nice, but you can do more, once again without any programming. In the next post I will explain how you can use Excel and the Trove API to import your search results into a spreadsheet.
This is part one of a series about the basics of using the Trove API. The other posts are:
The explanation of the ‘s’ value in this post has been corrected and expanded in response to a comment from Travis M Sellars on the next post, Using the Trove API with Excel Spreadsheets.