A simple search using the Trove API is quick and easy, but the results are often too broad to provide the information we need. Any insights are lost in the sludge of irrelevant results. In this post I will explain how to narrow a search so that you can close in on the data that is most helpful for your research. If you have not used the Trove API before and need to learn how to do a simple search, you should first read ‘An Introduction to the Trove API’.
Trove is a huge database of information contributed by over two thousand libraries in Australia as well as other organisations. It is an ever-increasing data mine. Today the Trove website says that it holds 389,961,760 items. The Trove API gives access to items in several zones: book, picture, article, music, map, collection, newspaper, list. This series of posts focuses on the newspaper zone, which allows access to digitised newspapers.
This post is designed to be used as a reference when you need answers for the types of searches you are most likely to conduct. Keep in mind that there is a lot more that you can do in the newspaper zone via the Trove API, and that there is also a large amount of data to explore in the other zones. If the answer to your question is not here then you should consult the Trove API Technical Guide. The other essential document you will need to consult regularly is the URL Encoding Reference, so you can translate reserved and non-ASCII characters into codes that will be recognised on the Web.
First Things First – Learning About Facets
Each item in Trove has a number of attributes which are called facets. A newspaper item, for example, has a date of publication, a title and a type (e.g. advertising, family notice).
A facet must be preceded by ‘l-’. Now it can be very hard to tell with the naked eye what the character at the beginning of the string is. It could be l, I or 1. In the font that I was using in Word it was clearly not a number, but the two letters looked exactly the same. Is this character the lower case letter that falls between k and m in the alphabet, or is it the upper case letter that falls between h and j?
This is a recurrent issue in computing but one that can be easily resolved. Simply copy the unknown character and paste it into an Excel spreadsheet. In a neighbouring cell, query this letter using the =code function, and a number corresponding to the ASCII code for the letter appears in the cell. Using this method we find that the character is the lower case letter that falls between k and m in the alphabet.
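The same check can be done without a spreadsheet. As a minimal sketch in Python, the built-in ord function plays the role of Excel's =code function; the three candidate characters are simply the look-alikes discussed above:

```python
# Look up the character code of each look-alike, as =code does in Excel.
candidates = ["l", "I", "1"]
codes = {ch: ord(ch) for ch in candidates}
for ch, code in codes.items():
    print(repr(ch), code)

# A facet prefix pasted from documentation can be checked the same way.
prefix = "l-"
prefix_codes = [ord(ch) for ch in prefix]
print(prefix_codes)
```

Paste the character you are unsure of into the list and compare the number with the ones printed for the known characters.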
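The other issue is whether the dash needs to be encoded. The hyphen is an ASCII character, and by consulting the URL Encoding Reference you will find that its percent code is %2D. A plain hyphen is one of the characters that may appear unencoded in a URL, so either form should be understood.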
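If you want to check how the hyphen is treated, here is a small sketch using Python's standard urllib.parse module. The string l%2Dcategory is only an illustration of a facet name written with an encoded hyphen:

```python
from urllib.parse import quote, unquote

# The hyphen's character code in decimal and in hexadecimal.
hyphen_code = ord("-")
hyphen_hex = format(hyphen_code, "02X")
print(hyphen_code, hyphen_hex)

# quote() leaves the hyphen alone because it is safe in a URL.
encoded_hyphen = quote("-")
print(encoded_hyphen)

# Decoding the percent code gives the plain hyphen back.
decoded = unquote("l%2Dcategory")
print(decoded)
```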
So now you have the essential element that is used at the beginning of all calls that specify facets:
Now you are ready to look up the facet that you want to include in your call. Remember to also place the ‘&’ character before this section of your API call.
Category Facets – articles and more
The category facet covers many types of written items in the newspapers:
- Detailed lists, results, guides
- Family notices
I won’t explain how to use each facet, but I will give an explanation of how I constructed successful API calls for three of these facets. By showing you the methods I used to work out how to use them as well as the API calls themselves I hope you can work out how to use the other category facets.
Computing is an arena where the pedant excels. Every space, dot and other blip is significant. You should assume that you are working in a case sensitive world unless advised otherwise. In this case make sure you have an initial capital for the word ‘Article’ or else you will get no results:
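As a rough sketch of how such a call can be assembled, the Python below builds the URL from its parts. The endpoint, the placeholder key and the search term ‘horse’ are my assumptions for illustration, not values taken from this post, so check them against the Trove API Technical Guide before relying on them:

```python
from urllib.parse import urlencode

# Assumed endpoint and placeholder key; substitute your own values.
BASE_URL = "https://api.trove.nla.gov.au/v2/result"
API_KEY = "YOUR_API_KEY"


def build_call(query, **facets):
    """Build a newspaper-zone call, adding the 'l-' prefix to each facet."""
    params = {"key": API_KEY, "zone": "newspaper", "q": query}
    for name, value in facets.items():
        params["l-" + name] = value
    # urlencode joins the pairs with '&' and encodes the values.
    return BASE_URL + "?" + urlencode(params)


# Note the initial capital in 'Article'.
article_call = build_call("horse", category="Article")
print(article_call)
```

Building the URL in code rather than by hand means the ‘&’ separators and the ‘l-’ prefix are added the same way every time.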
Detailed lists, results, guides
The following example is more complicated, so I will only include the compulsory search terms required for a simple search of the word ‘horse’ in this category:
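The complication is that the category name contains spaces and commas, which have to be encoded. Here is a minimal sketch of one way to do that in Python. The endpoint and key are placeholders I have assumed, and encoding spaces as %20 is my choice rather than something the Trove documentation prescribes:

```python
from urllib.parse import quote, urlencode

# Assumed endpoint and placeholder key; substitute your own values.
BASE_URL = "https://api.trove.nla.gov.au/v2/result"
API_KEY = "YOUR_API_KEY"

params = {
    "key": API_KEY,
    "zone": "newspaper",
    "q": "horse",
    "l-category": "Detailed lists, results, guides",
}

# quote_via=quote encodes spaces as %20 (the default would use '+')
# and commas as %2C.
detailed_call = BASE_URL + "?" + urlencode(params, quote_via=quote)
print(detailed_call)
```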
Hint: If you are getting an error message from your API call or it is generally failing to work, do the same search in the Trove Advanced Search facility. Then copy and paste the URL that is returned into a text file or Word document and study how it is constructed. Using this method I found that I had mistakenly used %82 to replace the comma whereas the correct code was %2C. Have a look at these two characters in the URL Encoding Reference and you will see why I made that mistake!
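The two codes can also be compared in Python. This sketch assumes the reference table lists %82 using the Windows-1252 character set, which is my reading of the mix-up rather than something stated in the reference itself:

```python
from urllib.parse import unquote

# %2C decodes to the ordinary ASCII comma.
comma = unquote("%2C")

# Byte 0x82 in Windows-1252 (an assumed character set) is a low
# quotation mark that looks very much like a comma.
lookalike = bytes([0x82]).decode("cp1252")

print(repr(comma), hex(ord(comma)))
print(repr(lookalike), hex(ord(lookalike)))
```

The two characters look almost identical on screen, but they have different codes and only the first is a comma.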
There is way more to this than I expected:
When I was testing this I received only one result using the API but 7676 results using the Advanced Search function. Clearly I had made an error but I could not work out the problem using the Trove API documentation so I studied the URL generated from the Advanced Search:
Hint: You might have an initial pang of anxiety at the sight of this loooong URL. Take a big breath and recall what your maths teacher taught you, “break down the problem into little steps and lay it out neatly”. So I split this URL into what I regard as active elements, the bits that actually generate our results, and the inactive bits. By changing the colour of the ‘active’ elements, the long stream of gibberish starts to make sense:
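The same splitting can be done in code. The URL in this sketch is one I have made up to resemble an Advanced Search result, and its parameter names are hypothetical; it is not the URL from my search. The idea is simply to separate the parameters that carry a value from the ones that are empty:

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical URL, invented for illustration only.
url = (
    "https://trove.nla.gov.au/newspaper/result?q=horse&exactPhrase=&anyWords="
    "&notWords=&l-category=Family+Notices&sortby="
)

# Split the query string into (name, value) pairs, keeping empty ones.
pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)

# 'Active' elements carry a value; 'inactive' ones are empty.
active = [(name, value) for name, value in pairs if value]
inactive = [name for name, value in pairs if not value]

for name, value in active:
    print(name, "=", value)
print("inactive:", inactive)
```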
To my surprise this shows that to generate a list of Family Notices, two elements are required, not one, and that they are separated by a bar character, |. Note that we don’t seem to have to change the bar character into a URL code. However, if you are having problems getting things to work there is always someone online to help. Tim Sherratt pointed out that the second element was not needed and showed it working in his Trove Console, so I was able to update this section of the post.
You will notice that in the Advanced Search screen the check box for illustrated articles is in a different section to the check boxes for the other newspaper items such as those listed above. To restrict your search to illustrated articles you need to use an API call like this:
I worked out the coding for this API call by going straight to the URL provided by the equivalent search through the Advanced Search screen and once again found that the coding was quite different to that suggested by the Trove API Technical Guide. Tim Sherratt then lent a hand and proposed a more succinct way of expressing this API call.
You have to work for your rewards. This post demonstrates that the documentation provided by the National Library of Australia for the Trove API has been correctly named as a guide. It sends you in the right general direction but the best way of working out how to get the API to work is by deconstructing the Trove search URL for your query.
The manager of Trove, Tim Sherratt, has offered some assistance to people working out how to use the API. If your API call fails, enter it in his Trove Console and share the link for the results with him via @wragge on Twitter. He will endeavour to help you with the issue. This is a great offer from Tim, who is the expert in Trove.
This post was a lot more work and a lot longer than I expected. I had hoped to cover all the basic Advanced Search features that researchers would be most likely to want to incorporate into their API calls. Instead I will cover searching for articles within specific dates and sorting and relevance in my next post.
This is the third post in a series exploring the Trove API. The other posts are:
I will be updating these posts in response to comments pointing out corrections needed or further explanations required. This post has been updated thanks to comments made by Tim Sherratt on Twitter and Travis M Sellers in the comments below.