This past Friday, March 7, at 9:30 am, the Vermont Digital Newspaper Project gave a workshop for librarians on how to use Chronicling America at the Midstate Regional Library in Berlin, VT.
We are pleased to announce that the VTDNP has applied for an additional two-year round of funding from the National Endowment for the Humanities to digitize another 100,000 pages of historic Vermont newspapers from 1836 to 1922. With close to a million pages of newspapers from this era available on master negative microfilm in the state, we have more than enough great material to work with through this upcoming grant and beyond. This potential VTDNP Phase II grant would run from September 1, 2012 to August 31, 2014. The newspaper scans for Phase II, like those from Phase I, will be freely available to the public on the Library of Congress’ Chronicling America website. We should receive news about this grant in mid-2012.
I am often asked questions about the VTDNP–how it works, how we select titles, how long it takes to get new pages online, and others. I would like to take this opportunity to answer some of these questions:
1. I would like to read some more recent newspapers online–why do VTDNP titles stop at 1922?
Current copyright law classifies most content published before 1923 as “public domain.” Working with materials from 1923 and later requires negotiating a copyright release from the owner of the material. The National Digital Newspaper Program (NDNP) wisely decided to avoid the sometimes byzantine realm of copyright law by limiting states’ selections to 1922 and earlier.
2. Where are the colonial and revolutionary era Vermont newspapers–why don’t you have any pre-1836 Vermont newspapers available?
To avoid duplicating newspapers from before 1836 that have already been digitized commercially, the NDNP began selection with that year. One widely available database that specializes in these newspapers is America’s Historical Newspapers. If you are affiliated with a college or university, you may be able to access that database and others of this type through your library’s webpage. If you are not affiliated with a college or university, check with your local public librarian about how you can access these titles. One of the great advantages of Chronicling America is that it is freely available–because you have already paid for it with your tax dollars! For-profit sites limit access and therefore reduce usability. Fortunately, at 4 million pages and growing, Chronicling America dwarfs many of these commercial databases.
3. I would love to see the Bennington Banner from 1904. Why does it take so long for it to become available?
Part of the purpose of the NDNP is to establish good practices and standards for digitizing newspapers. These standards are designed to ensure continued access to high-quality scans of newspapers, but they do take careful work to implement. Quick and dirty scanning may be faster, but the results vary widely and are sometimes illegible. Worse, such scans may become incompatible with new software, or may become corrupted to the point that they are destroyed. The life of a digital object is notoriously short–as little as 5-7 years by some estimates. NDNP standards are designed to avoid such catastrophic loss and to deliver the highest possible quality to users. They are also forward-looking in that the Library of Congress archives master scans and microfilm for each image in the program. This allows for future improvements in rendering and Optical Character Recognition (OCR).
4. What is OCR and why should I care?
OCR is Optical Character Recognition. OCR interprets characters from an image, allowing us to index terms from a newspaper automatically. This gives users the ability to search for terms in the images. On Chronicling America, you can search an issue, a title, a state, or the entire collection of over 4 million pages for your terms. It is a powerful and convenient search tool, all made possible by OCR. As you can imagine, OCR is not 100% accurate. So when you are searching, be aware that some pages that contain your search terms may not show up in the results. OCR is always improving, but image quality is what determines the OCR engine’s ability to “read” the pages. Sometimes the only available copy of a title is in poor condition, but if it is historically important, it is worth including despite its condition issues.
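For readers who are curious about the mechanics, here is a minimal Python sketch of why imperfect OCR can hide a page from a search. It is an illustration only and does not describe how Chronicling America's search is actually built. The page text, the misread word "Bennlngton", the function names, and the 0.85 similarity threshold are all made up for this example.

```python
from difflib import SequenceMatcher

# Hypothetical OCR output for three pages (invented for illustration).
# On page-3 the OCR engine has misread "Bennington" as "Bennlngton".
pages = {
    "page-1": "The Bennington Banner reports on the town meeting",
    "page-2": "Weather and crop notes from Montpelier",
    "page-3": "News from Bennlngton and the surrounding villages",
}

def build_index(pages):
    """Map each lowercased word to the set of pages it appears on."""
    index = {}
    for page_id, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(page_id)
    return index

def exact_search(index, term):
    """Return pages whose OCR text contains exactly this word."""
    return sorted(index.get(term.lower(), set()))

def fuzzy_search(index, term, threshold=0.85):
    """Return pages containing any indexed word that closely resembles the term."""
    term = term.lower()
    hits = set()
    for word, page_ids in index.items():
        if SequenceMatcher(None, term, word).ratio() >= threshold:
            hits |= page_ids
    return sorted(hits)

index = build_index(pages)
print(exact_search(index, "Bennington"))  # finds page-1 only
print(fuzzy_search(index, "Bennington"))  # finds page-1 and page-3
```

The exact search misses page-3 because the index only knows the word as the OCR engine read it. An approximate match recovers it here, but a looser threshold would also start returning unrelated words, so this is a trade-off rather than a cure.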
5. I would like to see my local paper on Chronicling America. How do you choose what titles to digitize?
The NDNP works with an advisory board made up of 11 historians, librarians, museum directors, and journalists who are all well versed in historical Vermont newspapers. We select titles for digitization based on the availability of a title and the recommendations of our advisers. Not all 19th and early-20th century newspapers have survived, and not all that survived are available on master negative microfilm. The NDNP requires that we digitize from existing microfilm. This maintains high image quality at a relatively low cost for digitization.
I hope this has enhanced your understanding of the VTDNP and the work we do. Please contact me at firstname.lastname@example.org if you have more questions or would like clarification on any of these points.
– Tom McMurdo, VTDNP Project Librarian