Now that Optical Character Recognition (OCR) technology is more widely available, you can use OCR to transcribe family documents, letters, and journals.
Updated 3/21/2024
IBM.com explains:
Optical character recognition (OCR) is sometimes referred to as text recognition. An OCR program extracts and repurposes data from scanned documents, camera images and image-only PDFs. OCR software singles out letters on the image, puts them into words and then puts the words into sentences, thus enabling access to and editing of the original content. It also eliminates the need for manual data entry.
A few years ago, the expense of professional OCR software would have put OCR out of reach for the average family historian or memory collector.
Today, many of us already have access to it. It’s built into programs many of us already have on our computers.
I decided this would be a great time to test the Optical Character Recognition features built into Microsoft’s OneNote, Google Drive, and Adobe Acrobat Pro. In a future post, I’ll look at mobile-only apps like Genius Scan.
Use OCR to transcribe a short document
Cleaning out the basement, I came upon my Aunt Ann’s letter to my Aunt Nancy. She printed out a copy of her letter for my mom.
Side note: Aunt Ann would have been 68 in 1996. We learn a lot about her as she a) uses a word processing program, b) knows the specs of her 386 computer, and c) is still running out to cut the grass. Plus, how adorable is it that when she didn’t have time to write my mom a separate letter, she mailed her a copy of Aunt Nancy’s so mom would know what was new?
My test
I scanned the letter, saving it in both PDF and JPG formats. Then I opened the file in each of the following programs.
OneNote
I use Microsoft 365. Initially, I had trouble finding the OCR feature. It doesn’t work in the web-based OneNote app. You need to use your desktop app.
OneNote uses OCR to transcribe text in images, so I inserted a JPG image into a page. To get the OCR output, you right-click on the image, then select “Copy Text from Image.”
Then, I simply pasted the copied text into Word. The image below shows the result.
Notice the gobbledygook at the beginning. This comes from the handwritten part of the document. The OCR had trouble with the closing quotation mark of “The House of Crymes”. It chose a Courier type font, but the spacing was enlarged.
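If the enlarged spacing shows up as extra space characters in the pasted text (that is an assumption on my part; it could just as easily be a font setting in Word), a few lines of Python can tidy it up. This is only a minimal sketch. The function name normalize_spacing and the sample text are made up for illustration.

```python
import re


def normalize_spacing(text: str) -> str:
    """Collapse runs of spaces or tabs inside each line to a single space."""
    cleaned_lines = []
    for line in text.splitlines():
        # Replace any run of spaces/tabs with one space, then trim both ends.
        cleaned_lines.append(re.sub(r"[ \t]+", " ", line).strip())
    # Blank lines are kept, so paragraph breaks survive the cleanup.
    return "\n".join(cleaned_lines)


# Made-up sample standing in for pasted OCR output.
sample = "Dear   Nancy,\n\nThe  House   of  Crymes "
print(normalize_spacing(sample))
```

You would still need to delete the gobbledygook from the handwritten part by hand.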
Not too shabby.
Adobe Acrobat Pro 2017 OCR
Adobe Acrobat is relatively expensive software and my 2017 version does most of what I need it to (compiling and organizing long documents), so I haven’t updated it.
Comparing this five-year-old OCR with 2022 OCR might be a tad unfair. Still, it’s what I have in my house and may well be what you already own, so I’ll share the result.
For this test, I opened the PDF version of my scan and went to Tools > Edit. It’s while Adobe is attempting to edit a document that their OCR kicks in.
You can see that Adobe did a good job of recognizing the text. If you work in Adobe Acrobat, it is highly editable. The only problem comes in the middle of the page, where Adobe has generated quite a few text boxes, each holding only a few words. If I wanted to copy this out of Adobe and edit it in a different program, this would be inconvenient.
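If you do copy text out of many small text boxes, it often lands in your editor as a stack of short lines. Here is a rough sketch of how those fragments could be stitched back into paragraphs with Python. It assumes each fragment arrives on its own line and that a blank line marks a real paragraph break, which may not hold for every document. The name merge_fragments and the sample text are hypothetical.

```python
def merge_fragments(text: str) -> str:
    """Join consecutive non-blank lines into paragraphs.

    Assumes a blank line marks a paragraph break in the pasted text.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        fragment = line.strip()
        if fragment:
            # Still inside a paragraph: keep collecting fragments.
            current.append(fragment)
        elif current:
            # Blank line: close out the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


# Made-up sample standing in for text copied from several small text boxes.
sample = "I finally\ncut the\ngrass.\n\nLove,\nAnn"
print(merge_fragments(sample))
```

This only rejoins lines. It will not fix words the OCR misread, so a read-through is still worthwhile.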
Google Drive/Google Docs
Google Drive, which you can obtain for free if you don’t already have it, offers OCR when you upload a PDF. You start the machine reading when you tell Google Drive to open a PDF document in Google Docs.
Note: For this to work, you have to have “Convert uploaded files to Google Docs editor format” checked in your settings. Go to the little gear icon at the top right.
After you upload a PDF to your drive, select that file and right-click it for more options. Then click “Open with” > “Google Docs”.
When I first read Google Docs’ output of my document, I thought Google’s OCR was struggling much more with the text.
However, as I paged down, I realized Google had not only “read” the type-written text. It had worked on my Aunt Ann’s cursive handwriting as well.
On the type-written portion, Google Docs performed as well as the others. Google Docs also makes it very easy to copy and paste to another editing program. (I like to do that in Word because I have a ProWritingAid add-on that helps with the editing process.)
The winner for a one-page document? The program you already have and use.
A long document
Recently, I came upon my mom’s journal from her trip to New Zealand. She’d transcribed it herself, after a fashion. She’d typed it up and printed out about 50 pages. Seldom does procrastination pay off. However, after my parents died in 1998, transcribing my mother’s travel journals languished in the “one day when I have time” category.
Why would I want to transcribe and/or edit this?
For one, a digital transcription is much easier to share. Also, there may be small snippets of personality or commentary that I would like to use in other projects.
The Contenders: Adobe Acrobat Pro (2017) and Google Drive
OneNote only works with individual images, and my scanner has a feed that lets it scan multiple pages and save them into a single PDF. For me, that eliminated OneNote.
I’m not a OneNote power user. If I’m missing something on using its OCR, please let me know in the comments below.
Using Adobe Acrobat Pro 2017 OCR to transcribe a long, multi-page document
I really didn’t find too much to complain about in this scan. Adobe matched the font and the spacing. It did not process the document all at once. Rather, I had to click on “Edit” with each new page. Adobe’s OCR rendered most pages in a single text box.
Using Google Drive OCR to transcribe a long, multi-page document
Google did not do as well with matching the font family, font size, and spacing. However, it processed the entire document at once, which I found aided my editing flow.
Bottom Line
Either program’s OCR would work for transcribing a long document. If you’re planning on editing your documents in addition to digitizing them, Google Docs may have the edge. If you’d like a true-to-the-original font transcription, Adobe might be the best to use. If you don’t already own Adobe Acrobat Pro, Google Docs is the winner.
Another Option:
If you’re exploring any of the AI software programs, try out their OCR for transcribing documents, even handwritten ones!
Last Step: Enjoy Reading
Part of the purpose of using OCR to transcribe family documents is to be able to enjoy the past. If you’re not getting eye-strain and carpal tunnel, you may find even more pleasure in romping down your family’s Memory Lane.
To that end, here are a couple of cute excerpts from my mom’s 1997 New Zealand journal:
The state highway was pretty with the trees, hills and pastures and the wild flowers were gorgeous. Sheep dotted the hills in a most unusual arrangement and of course I had to take photographs–which would not have taken so long if they hadn’t all posed. Cows posed some too and they looked like the exact cows and lambs on the brochures.
The rental car has one feature that has been fun for me. There is a little “beep beep” when you exceed the set speed for your car. Saves a wife a lot of nagging.
In a hotel used to film Agatha Christie movies:
Although the hotel was basically empty except for a few employees, you could just about hear the piano playing, people laughing and talking and a brawl breaking out in the bar.
Truth be told, in the years right after my parents’ deaths, I don’t think I would have been emotionally capable of enjoying the process of transcribing the fun they had on their next-to-last trip ever. Now, however, I find it comforting.
Your Turn: Do you use OCR to transcribe family documents or even business documents?
What software works best for you? What short cuts have you found that might save other readers time?
I tend to transcribe directly into my word processor using a split screen (it’s Atlantis Word Processor) or into Ancestral Sources. However, for text from images, I can load a .jpg or .png into Snag-it and use the Grab Text feature as well 🙂
Snag-it is on my list of mobile apps to try. Thanks for pointing out its “grab text” feature.
Laura
Very useful! Thanks for the tip! Google has had it for a long time already, but I did not know that OneNote and Adobe Acrobat Pro do as well. 🙂
I use CamScanner:
1. To crop or clean up the image
2. To make a jpeg or PDF file
3. Extract Text
But I find that OCRs don’t do a really good job of keeping formatting, like tables, intact.
Thanks Wanda. I haven’t tried that one. You’re right about preserving tables. I think for that, Adobe Acrobat Pro does best.
Thank you for this, so helpful! I am in the same boat with journals – boxes of my own and my great grandma’s to transcribe, in cursive! I will be giving google docs a try!
Drop back by and tell us how it’s going. I have tried small samples with ChatGPT, and it’s been pretty good too.