Monday 17 March 2014

Newspaper transcriptions are often better than digital images - why?

Happy Saint Patrick's Day! Good to be back after having been under the weather for a few days ...

Did you know ... that when a document is scanned and digitized the image data might not be converted correctly - or at all? Let me explain ...

First of all you need to know that much of the imagery appearing online has been produced from microfilm. So the digital image is usually two generations away from the original source.

So, the problems begin with the original microfilm. In many cases the documents were microfilmed 50 or 60 years ago when quality standards were not as high as they are now. As someone who has prepared microfilm for three summers at The Archives of Ontario I can confirm that:
  1. blank / missing pages were not filmed nor indicated
  2. "re-takes" could not be edited out of the film, so if the first shot was out of focus the next page might seem duplicated
  3. if the original documents were in bound volumes, they were not de-bound before being filmed, so information is missing from the "gutter edges" towards the binding. In the case of newspapers, a left-hand page will have lost its furthest right-hand column, and a right-hand page will have lost much of its first column.
Now comes the scanning process ...
  1. In theory, once scanned, the out-of-focus pages that were followed by the re-takes can be edited out - but most often they are not, as this is too labour intensive. So you now have a scan of an image which is not of the best quality.
  2. The good news about the scanning of newspapers and books is that missed pages can be added in to the digital files as they are found
Now comes the "OCR" process - OCR = Optical Character Recognition
  1. The good news is that for more modern documents the quality controls and OCR training are amazing
  2. The bad news is really bad ... in most cases the OCR training cannot "read" the old typeface prints of newspapers
What does this mean for the genealogist? When you go to "x" newspaper file online, you find a "search" field that can be used on a number of queries, e.g. title only, subject, all text, dates of coverage, and the search responds with "No results found". So, you mark down in your notes that you did a search of "The Weekly Gossip" from Anytown for so and so and no results were found .... BUT ....

If you know the date of the event, you may want to do a manual search of the digital image and you could very well be surprised at what you find!
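For the technically curious, here is a minimal sketch of why an exact search can miss a name that is plainly on the page. It assumes the OCR misread a single letter (an "i" read as an "l"); the sample text and the helper name `fuzzy_find` are made up for illustration, and this is not how any particular newspaper site works. It uses Python's standard `difflib` module to look for near matches instead of exact ones.

```python
import difflib
import re


def fuzzy_find(name, ocr_text, cutoff=0.75):
    """Return words in ocr_text that approximately match name (hypothetical helper)."""
    # Split the OCR text into lower-case words, dropping duplicates.
    words = sorted({w.lower() for w in re.findall(r"[A-Za-z']+", ocr_text)})
    # get_close_matches ranks candidates by similarity and drops those below cutoff.
    return difflib.get_close_matches(name.lower(), words, n=5, cutoff=cutoff)


# Made-up OCR output in which "Morris" was misread as "Morrls".
ocr_text = "Died at Toronto, John Morrls, aged 25, son of Mr. John Morrls"

exact_hit = "morris" in ocr_text.lower()   # what a plain search field does
close_hits = fuzzy_find("Morris", ocr_text)

print("Exact match found:", exact_hit)
print("Near matches:", close_hits)
```

The exact search comes up empty while the near-match search still turns up the garbled spelling, which is roughly what your own eyes do when you read the page image yourself.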

This is where the use of newspaper transcriptions comes to play a major role.

Before the Internet and search engines you had to go to a Library and/or an Archives to "read the microfilm" - and I am still convinced that by reading or even skimming whole issues of the masthead you will gain great insight into the lives and times of your ancestors.

Many people have taken the time to prepare transcriptions and/or indexes of genealogical items of interest. Notable among these are the works on the Methodist newspapers by Rev. Donald McKenzie as recently re-published by Global Genealogy. You are given a summary of the entry from the given newspaper and then provided the details of where to locate the notice.

However, I also like the standard set by Michael Harrison in his transcription of Births, Marriages and Deaths (1837 - 1861) from the "Toronto Mirror". By looking at this transcription one can see at a glance what issues are "missing" or contained no genealogy news, e.g. the transcriptions go from Feb. 10, 1843 (Vol. 6, Issue 29) to March 3, 1843 (Vol., Issue 32).
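If you keep your own list of transcribed issues, spotting such gaps can be sketched in a few lines. This assumes the paper came out weekly (my assumption, though the two dates above are exactly three weeks and three issue numbers apart); the function name `missing_issues` is invented for the example.

```python
from datetime import date, timedelta


def missing_issues(first_issue, first_date, last_issue, interval_days=7):
    """List (issue number, expected date) for issues between two transcribed ones.

    Assumes a fixed publication interval, weekly by default.
    """
    return [
        (n, first_date + timedelta(days=interval_days * (n - first_issue)))
        for n in range(first_issue + 1, last_issue)
    ]


# Issue 29 (Feb. 10, 1843) and Issue 32 are the two transcribed issues mentioned above.
gaps = missing_issues(29, date(1843, 2, 10), 32)
for issue, expected in gaps:
    print(f"Issue {issue}: expected around {expected.isoformat()}")
```

Under that weekly assumption the gap covers Issues 30 and 31, which would have been dated about Feb. 17 and Feb. 24, 1843. Those are the dates to check by eye on the film or the digital images.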

You also see how quickly or how slowly items of interest (and tragedy!) were printed:

Published Jan. 20, 1843
"Died - On the 9th September last, on board ship, on her way to New York, Eliza, aged 32 years; on the 9th of December at Stratford, Canada West, Henry, aged 18; on the 25th of the same month at Toronto, John aged 25 and on the 15th inst. and at the same place Thomas, aged 20 years, the daughter and sons of Mr. John Morris, late of Donamore, Queen's County, Ireland".

However it is a genealogical gold mine of information!
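To put numbers on "how quickly or how slowly", here is a small sketch that works out the delay between each death in the Morris notice and the publication date. The years are my reading of the notice, not something it states outright: I take "last" to mean 1842 and "15th inst." to mean the 15th of the month of publication, January 1843.

```python
from datetime import date

published = date(1843, 1, 20)

# Death dates as read from the notice; the years are inferred (an assumption).
deaths = {
    "Eliza": date(1842, 9, 9),     # "9th September last"
    "Henry": date(1842, 12, 9),    # "9th of December"
    "John": date(1842, 12, 25),    # "25th of the same month"
    "Thomas": date(1843, 1, 15),   # "15th inst."
}

# Days between each death and the day the notice was printed.
lags = {name: (published - died).days for name, died in deaths.items()}
for name, days in lags.items():
    print(f"{name}: printed {days} days after death")
```

On that reading, Eliza's death at sea took 133 days to reach print, while Thomas's death in Toronto was printed just 5 days later.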

So good readers ... take heart and remember this is one of the reasons we are RE-searching our family trees ... I need another cuppa green tea ... Till next time!
