This morning, someone in my Twitter feed commented about the poor quality of the index in the book they were reading. This struck me as a good topic for a blog post because just a few minutes earlier, I was trying to find some very specific information in a book that did not have an index at all. I’m fixing a section of my dissertation about Italian reservists in Canada and I wanted to add a little information about separation allowances provided to the families of Italian reservists by the Canadian Patriotic Fund (CPF). I happen to have a copy of Phillip H. Morris’ official history of the CPF sitting an arm’s length away on my bookshelf. I know the book does not have an index, however, so rather than reach for the hard copy, I looked for an electronic version. Because the book was published in the 1920s and is most likely in the public domain by now, my first stop was The Internet Archive. This is easily the best place to look for any printed primary materials, partly because of the quantity of material contributed by individuals and libraries, partly because its digitized texts are usually OCR’d, and partly because the API makes the site more ‘hackable.’ No need for hacking today, though; I just needed a simple citation.
Luckily, the Internet Archive does have a copy of Morris’ history of the CPF and it is available as a text file, which made it so easy to find references to Italian reservists. Simply opening up the text file in my internet browser and using the ‘search’ function brought up all the relevant passages.
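For anyone who would rather script this step than use the browser's search box, here is a minimal sketch of the same idea in Python. It assumes the full text has already been saved locally; the function name `keyword_in_context` and the sample sentences are my own inventions for illustration, not anything supplied by the Internet Archive.

```python
import re


def keyword_in_context(text, keyword, width=40, whole_word=False):
    """Return (character offset, snippet) pairs for every hit of keyword."""
    pattern = re.escape(keyword)
    if whole_word:
        # \b keeps 'Italian' from also matching 'Italians'.
        pattern = r"\b" + pattern + r"\b"
    hits = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        # Collapse line breaks so each snippet prints on one line.
        snippet = " ".join(text[start:end].split())
        hits.append((match.start(), snippet))
    return hits


# Invented sample text, standing in for a downloaded full-text file.
sample = (
    "The fund assisted the families of Italian reservists.\n"
    "Several italian families applied in the spring.\n"
    "Italians in the city formed a committee.\n"
)

for offset, snippet in keyword_in_context(sample, "Italian"):
    print(offset, snippet)
```

By default the match is a case-insensitive substring search, which is roughly how a browser's find function behaves, so 'Italians' counts as a hit. Passing `whole_word=True` narrows it to the exact word. For a real book, the `sample` string would be replaced by the contents of the saved text file.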
The CPF was by no means a centralized organization, however, and Morris’ history has a chapter devoted to the operations of the fund in each province. The chapter on Manitoba even mentioned exactly how many families of Italian reservists received assistance from the CPF.
Lastly, the book had an appendix that listed specific rates and allowances offered by the CPF to Canadian families.
All these great findings were gathered in a matter of minutes. In total, there were eleven ‘hits’ for the keyword ‘Italian’ and these were spread out over the 348 pages of Morris’ history. I cannot imagine how long it would have taken me to skim through the book in its entirety to find those three short excerpts, but I guarantee that is time better spent doing something else (like writing this post).
Morris’ history, being an official account, is a pretty problematic source. Written immediately after the war and compiled by none other than the Executive Secretary of the CPF, this source isn’t going to divulge any unflattering details regarding the ill-treatment of Italian families. Some historical hindsight is necessary.
The obvious place to look for a more analytic account of the CPF is Desmond Morton’s Fight or Pay. Morton’s book, published by UBC Press in 2004, definitely has a full index, but text searchability is still a nice feature. The work is barely a decade old and definitely not in the public domain, so The Internet Archive will be of little help. Google Books offers a reasonable alternative for copyrighted works. Fight or Pay was available for searching and, sure enough, I was able to find some relevant passages with ease.
The result for page 107 reveals the limitation of Google Books. Because the book is still under copyright, Google Books is only able to reproduce a portion of the complete text online. Google Books is kind enough to tell me that one of my search results is on page 107, but it won’t let me read the full page. No matter, I have a copy of the book on my shelf and can easily turn to that page to read the passage in context. Sure enough, Morton briefly discusses the discrimination felt by Italian families. Because Morton’s book has an index, though, the text search probably didn’t save me as much time as it did with Morris’ text. Still, Mike Del Vecchio and Josh MacFadyen wrote a great blog post a few years ago about creating a personal library in Google Books so that, instead of searching one text, you can search your own personal research corpus for a particular passage or keyword.
But what if you are looking for something in a text that does not have an index, is not on the Internet Archive, and Google Books only offers a few snippets of text? This was the case a few months ago when I was writing about the Scottish diaspora in Australia and New Zealand. I knew that the 5th Battalion of the Australian Imperial Force had a strong Scottish component and I also knew that the AIF discouraged expressions of Scottish identity through official accoutrements such as adopting a kilted uniform. I was hoping that the 5th Battalion’s history would tell me more about how Scottish members of the 5th Battalion displayed their common ancestry, but Google Books only offered a few snippets of the book.
The HathiTrust is the next best available tool. The HathiTrust actually works in partnership with Google Books. All the text digitized by Google Books is compiled into the digital library of the HathiTrust, which is entirely word-searchable but, because of copyright laws, offers only limited access to the public. Just like Google Books, the HathiTrust won’t let you peer into a book, but it will let you do a keyword search and give you the resulting page numbers. It’s less than ideal, but as long as you can get your hands on a hard copy of the text, you can follow the HathiTrust search results as you would an index.
While I enjoyed reading through the history of the 5th Battalion, I was in a bit of a hurry to find those excerpts about Scottishness, so I threw a search term into HathiTrust and the engine told me exactly where to look.
Sure enough, I immediately found some great nuggets about Scottish identity in the 5th Battalion on those pages.
Certainly, this method is of limited use. Any keyword search of a digitized text is going to depend on the quality of the OCR. When looking through printed primary sources that have been scanned and then OCR’d, a quick examination of the text should confirm whether or not the OCR is reliable enough to make a keyword search possible.
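That quick examination is best done by eye, but a rough number can help when comparing several scans. The sketch below is a crude heuristic of my own, not an established measure of OCR accuracy: it counts the share of tokens that look like ordinary words or numbers. The name `plausible_word_ratio` and both sample strings are invented for illustration.

```python
import string

VOWELS = set("aeiouy")


def plausible_word_ratio(text):
    """Share of whitespace-separated tokens that look like words or numbers."""
    tokens = text.split()
    if not tokens:
        return 0.0
    plausible = 0
    for token in tokens:
        # Drop surrounding punctuation, including curly quotes.
        core = token.strip(string.punctuation + "‘’“”")
        # Allow internal apostrophes and hyphens (isn't, key-word).
        core = core.replace("’", "").replace("'", "").replace("-", "")
        if core.isdigit():
            plausible += 1
        elif core.isalpha() and VOWELS & set(core.lower()):
            plausible += 1
    return plausible / len(tokens)


# Invented examples: one clean line and one with typical OCR damage.
clean = "The fund assisted the families of reservists in 1915."
garbled = "Th3 f|_|nd a$$isted tlie fam1lies"

print(plausible_word_ratio(clean))
print(plausible_word_ratio(garbled))
```

A low ratio suggests the scan is too noisy for keyword searching. A high ratio is weaker evidence, because misreadings such as 'tlie' for 'the' still pass as plausible words, so a keyword can be missed even in a text that scores well.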
I also admit that I am not going to make any great discoveries with a keyword search of a single text. I usually rely on these methods when I have a pretty good idea of what I’m looking for and where I can find it, but, like everyone else, I would prefer to save some time. And that’s all I’m selling: an easy time-saving device that will usually work.