Exercise #1 Part 2: OCR (Optical Character Recognition)

For this activity, I wanted to see what it’s like when a computer tries to read text instead of a human. I used an OCR app on my phone and scanned page 36 of a book I have called Atomic Habits.

Scanning Atomic Habits was easy: the text is clean, printed, and evenly spaced, so the OCR had no problem reading most of it. Once I saved the scan as a searchable PDF, I could highlight sentences, copy text, and search for words without much trouble. There were a few small mistakes, like weird line breaks or missing punctuation, but overall the text looked almost the same as the original page. It felt like the OCR “understood” the book pretty well.


Then I wanted to see whether scanning my handwritten journal would be a completely different experience.

The OCR really struggled with my handwriting. Some words were turned into random letters, others were skipped completely, and a few sentences didn’t make sense at all. Even though I could easily read what I had written, the computer could not. It was clear that OCR expects writing to be neat, consistent, and printed, which my journal definitely is not.

Seeing these two scans side by side made me realize how many assumptions OCR makes. It assumes clean fonts, straight lines, and clear spacing. When those assumptions are met, like in a proper book, OCR works well. When they aren’t, like in my journal, the technology breaks down. 

This really changed how I think about digitized texts: a searchable scan is only as reliable as the OCR behind it.
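For anyone who wants to put a rough number on a side-by-side comparison like this, here is a minimal sketch in Python. It assumes you have typed out the original passage by hand and copied the OCR text out of the scan. The function name `similarity` and all three sample strings are made up for illustration; they are not taken from the actual scans, and this is only one simple way to score the match, not how any OCR app measures accuracy.

```python
from difflib import SequenceMatcher


def similarity(original: str, ocr_output: str) -> float:
    """Return a score from 0 to 1 for how closely the OCR text matches the original."""
    # Collapse runs of whitespace so odd line breaks in the scan
    # do not count against the score.
    a = " ".join(original.split())
    b = " ".join(ocr_output.split())
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical sample strings, for illustration only.
typed_original = "The quick brown fox jumps over the lazy dog."
printed_scan = "The quick brown fox jumps\nover the lazy dog"      # odd line break, missing period
handwritten_scan = "Tbe qvick brwn fx jmps ovr th lzy dg"          # garbled letters, dropped characters

print(f"printed page: {similarity(typed_original, printed_scan):.2f}")
print(f"handwriting:  {similarity(typed_original, handwritten_scan):.2f}")
```

A score close to 1 means the OCR text is nearly identical to the original, and a lower score means more characters were misread or skipped. With the sample strings above, the printed-style text scores higher than the handwriting-style text, which matches the pattern described in the two scans.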

This comparison also made me think about historical sources. Many important historical documents, such as letters, diaries, and personal notes, are handwritten. If OCR struggles with handwriting now, it likely struggles even more with older documents. That means some voices from the past might be harder to find or study simply because computers can’t read them properly.




Overall, this activity showed me that OCR is helpful, but it’s not neutral or perfect. It decides what is easy to find and what gets hidden. Just like with digitized newspapers and born-digital websites, technology plays a big role in shaping how we access and understand the past.
