Response to Review no. 1096

Thanks for the opportunity to comment on Dr Mussell’s comprehensive review of ProQuest Historical Newspapers (HNP). Here are a few clarifications:

  • This is a continually growing database that began with a focus on key US newspapers. Over the years, however, ProQuest has expanded internationally, most recently with the addition of the Times of India, and we expect to keep adding non-US dailies. Titles are selected on the basis of market interest and the editorially evaluated importance of a paper over the course of its history. Nearly all of the papers are included as full runs so that users can view historical events in context.
  • The source is typically the paper of record’s version of the microfilm copies. The digital reproductions are 300 dpi bitonal TIFF images wrapped in PDFs, a format that enables much faster access and printing than downloading a grayscale image.
  • Dr Mussell suggests that these massive databases are produced with reliance on OCR technology. While ProQuest does use OCR, it is only one part of a very sophisticated manufacturing process that sets ProQuest apart from other newspaper databases. Our manufacturing process includes zoning to the article level and human editing of key OCR results, including article titles, subheadings, author names, the first eight lines of an article and photo captions. As you can imagine, this extremely high level of quality, when applied to more than 25 million pages of content, carries a significant manufacturing and production cost.
  • In addition to browsing by decade, year, month and issue, users can search specific time periods as an entry point into a browsing experience. Conversely, a user can start by browsing a publication and then execute a search. The discovery process is quite robust regardless of how one starts.
  • Dr Mussell notes HNP’s ‘fairly rich set of metadata’. This metadata has been abstracted and integrated into the search engine, which allows faceted browsing of search results.