Bill Holton
There is a new player in the optical character recognition (OCR) space, and it comes from an old friend: Winston Chen, the developer of Voice Dream Reader and Voice Dream Writer, both of which we’ve reviewed in past issues of AccessWorld. In this article we’ll start out with a brief conversation with Chen. Then we’ll take a look at the developer’s latest offering: Voice Dream Scanner. Spoiler alert—it will probably be the best $5.99 you’ll ever spend on a text recognition app!
AccessWorld readers who use their phones to audibly read e-Pub books, PDFs or Bookshare titles are likely already familiar with Voice Dream Reader. It works so well with VoiceOver and TalkBack, it’s hard to believe it wasn’t developed specifically for the access market. But according to Chen, “I just wanted to build a pocket reader I could use to store all my books and files so I could listen to them on the go. No one was more surprised than me when I began receiving feedback from dyslexic and blind users describing how helpful Voice Dream Reader was for their needs and making some simple suggestions to improve the app’s accessibility.”
Chen’s second offering, Voice Dream Writer, was also directed at the mainstream market. “Sometimes it’s easier to proofread your document by listening to it instead of simply rereading the text,” says Chen. At the time, Apple’s VoiceOver cut and paste features and other block text manipulation capabilities were, shall we say, not quite what they are today. The innovative way Chen handled these functions made Voice Dream Writer equally useful to users with visual impairments.
Reinventing the OCR Engine
“I’ve been wanting to add OCR to Voice Dream Reader for a few years now,” says Chen. “It would be useful for reading protected PDFs and handouts and memos from school and work.”
The hurdle Chen kept encountering was finding a useable OCR engine. “There are some free, open source engines, but they don’t work well enough for my purposes,” he says. “The ones that do work well are quite expensive, either as a one-time license purchase with each app sold or with ongoing pay-by-the-use options. Either of these would have raised the price I have to charge too much for my value proposition.”
Last year, however, Chen began experimenting with Apple’s artificial intelligence (AI), called Vision Framework, that’s built into the latest iOS versions, along with Google’s Tesseract, TensorFlow Lite, and ML Kit.
“Instead of using a single standard OCR engine, I combined the best aspects of each of these freely available tools, and I was pleasantly surprised by the results.”
Instead of making OCR a Voice Dream Reader feature, Chen decided to incorporate his discovery into a separate app called Voice Dream Scanner. “I considered turning it into an in-app purchase, only there are a lot of schools that use Reader and they aren’t allowed to make in-app purchases,” he says. As to why he didn’t simply make it a new Reader feature, he smiles, “I do have a family to feed.”
Chen has been careful to integrate the new Voice Dream Scanner functionality into VD Reader, however. For example, if you load a protected PDF file into the app and open it, the Documents tab now offers a recognition feature. You can now also add to your Voice Dream Reader Library not only from Dropbox, Google Drive, and other sources, including Bookshare, but using your device’s camera as well.
To take advantage of this integration you’ll need both Voice Dream Reader and Voice Dream Scanner. Both can be purchased from the iOS App Store. VD Reader is also available for Android, but currently VD Scanner is iOS only.
Of course you don’t have to have VD Reader to enjoy the benefits of the new Voice Dream Scanner.
A Voice Dream Scanner Snapshot
The app installs quickly and easily, and displays with the icon name “Scanner” on your iOS device. Aim the camera toward a page of text. The app displays a real-time video image preview, which is also the “Capture Image” button. Double tap this button, the camera clicks, and the image is converted to text almost immediately. You are placed on the “Play” button; give it a quick double tap and the text is spoken using either a purchased VD Reader voice or your chosen iOS voice. Note: You can instruct Scanner to speak recognized text automatically in the Settings Menu.
From the very first beta version of this app I tested, I was amazed by the speed and accuracy of the recognition. The app is remarkably forgiving as far as camera position and lighting. On envelopes, Scanner read the return addresses, postmarks and addresses. Entire pages of text were voiced without a single mistake. Scanner even did an excellent job with a bag of potato chips, even after it was crumpled and uncrumpled several times. There is no OCR engine to download, and because the recognition is done locally, a network connection is not required. I used the app with equal success even with Airplane mode turned on.
After each scan you are offered the choice to swipe left once to reach the Discard button, twice to reach the Save button. Note: the VoiceOver two-finger scrub gesture also deletes the current text.
Scanner does not save your work automatically. You have the choice to save it as a text file, a PDF, or to send it directly to Voice Dream Reader. You probably wouldn’t send a single page to Reader, but the app comes with a batch mode. Use this mode to scan several pages at once and then save them together: perfect for that 10-page print report your boss dropped on your desk, or maybe the short story a creative writing classmate passed out for review.
Other Scanner features of interest to those with visual impairments are edge detection and a beta version of auto-capture.
Edge detection plays a tone that grows increasingly steady until all four edges are visible, at which time it becomes a solid tone. Auto-capture does just that, but since the AI currently detects any number of squares where there is no text, this feature is only available in beta. However, if you're using a scanner stand it will move along quite nicely, nearly as fast as you can rearrange the pages.
You can also import an image to be recognized. Unfortunately, as of now, this feature is limited to pictures in your photo library. There is currently no way to send an e-mail or file image to Scanner. Look for this to change in an upcoming version.
The benefits of Voice Dream Scanner are by no means limited to the blindness community. Chen developed the app to be used as a pocket player for documents and other printed material he wishes to scan and keep. Low vision users can do the same, then use either iOS magnification or another text-magnification app to review documents. It doesn’t matter in which direction the material is scanned. Even upside-down documents are saved right-side up. Performance is improved by the “Image Enhancement” feature, which attempts to locate the edges of scanned documents and save them more or less as pages.
The Bottom Line
I never thought I’d see the day when I would move KNFB-Reader off my iPhone’s Home screen. Microsoft’s Seeing AI gave it a good run for its money and until now I kept them both on my Home screen. But I have now moved KNFB-Reader to a back screen and given that honored spot to Voice Dream Scanner.
Most of my phone scanning is done when I sort through the mail. Seeing AI’s “Short Text” feature does a decent job helping me sort out which envelopes to keep and which to toss into my hardware recycle bin. But Scanner is just as accurate as any OCR-engine-based app, and so quick that the confirmation announcement of the Play button often voices after the scanned document has begun to read.
This is the initial release. Chen himself says there is still work to be done. “Column recognition is not yet what I hope it will be,” he says. “I’d also like to improve auto-capture and maybe offer users the choice to use the volume buttons to initiate a scan.”
Stay tuned.
This article is made possible in part by generous funding from the James H. and Alice Teubert Charitable Trust, Huntington, West Virginia.
Related articles:
- Envision AI and Seeing AI: Two Multi-Purpose Recognition Apps by Janet Ingber
- An Evaluation of OrCam MyEye 2.0 by Jamie Pauls