ScanSnap OCR with Devonthink Pro Office

Back in 2006 I briefly discussed the use cases for Devonthink Pro ($149). I'm still a fan of Devonthink Pro Office (the newly renamed top-tier version), but I use it less and less for filing documents. I've been leaning more toward application-agnostic file storage.

However, there is one workflow that Devonthink Pro Office excels at: PDF OCR. ABBYY Finereader ($99) is included with the application as a plugin. I've worked with many PDF OCR products, including the grotesquely expensive Adobe Acrobat Pro ($499). Nothing beats the ABBYY Finereader OCR quality. Acrobat may be faster, but if the application supports queuing, speed is irrelevant for most workflows.

Here is my Devonthink Pro Office workflow for document scanning and OCR:

I use the amazing and simple Fujitsu Scansnap for document scanning to PDF. I created a Scansnap Manager profile that shunts the scanned document directly to Devonthink.

Scansnap Manager profile

On the Devonthink end, I have the OCR work automated. This workflow allows me to scan multiple documents without ever having to touch the mouse or keyboard. The files are named with timestamps and dropped into the Devonthink global inbox.

Devonthink OCR Settings
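For anyone who wants the same kind of timestamp naming outside of Devonthink, here is a minimal Python sketch. The exact name format that Devonthink or Scansnap Manager produces is not shown here, so the date pattern below and the function name `timestamped_name` are assumptions for illustration only.

```python
from datetime import datetime


def timestamped_name(when: datetime, ext: str = "pdf") -> str:
    """Build a sortable file name from a timestamp.

    The 'YYYY-MM-DD HH.MM.SS' pattern is an assumed format, not
    necessarily the one Devonthink uses. Periods are used instead of
    colons because colons are awkward in Mac file names.
    """
    return f"{when:%Y-%m-%d %H.%M.%S}.{ext}"


# Example: name a scan using a fixed timestamp.
print(timestamped_name(datetime(2010, 3, 14, 9, 26, 53)))
```

Because the name starts with the year, sorting the files alphabetically also sorts them by scan time.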

The scanning process could not be simpler. Each subsequent scan job is queued in Devonthink and processed in order.

Devonthink OCR Queue

I could keep the documents in Devonthink (that was how I used to manage my scans). Devonthink has almost mystical powers to auto-categorize and file documents. You simply need to create folders and move a few related files. Devonthink "learns" what files go together and will begin to automatically file them for you. However, many of the documents I scan are sensitive, so I prefer to drop them into an encrypted DMG file for long-term storage.
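If you have not made an encrypted DMG before, Disk Utility can do it, and so can the `hdiutil` command line tool that ships with the Mac. The Python sketch below builds and runs an `hdiutil create` command. The image size, the volume name "Scans", and both function names are my own placeholders, not settings taken from my setup.

```python
import subprocess


def encrypted_dmg_command(dmg_path, size="500m", volume_name="Scans"):
    """Return the hdiutil argument list for a new AES-256 encrypted image.

    The size and volume name are placeholder values; adjust to taste.
    """
    return [
        "hdiutil", "create",
        "-size", size,               # capacity of the new image
        "-fs", "HFS+",               # file system inside the image
        "-encryption", "AES-256",    # encrypt the image contents
        "-stdinpass",                # read the passphrase from stdin
        "-volname", volume_name,     # name shown when the image is mounted
        dmg_path,
    ]


def create_encrypted_dmg(dmg_path, passphrase):
    """Create the image (Mac only). The passphrase is sent without a newline."""
    subprocess.run(
        encrypted_dmg_command(dmg_path),
        input=passphrase.encode("utf-8"),
        check=True,
    )


# Usage (Mac only):
# create_encrypted_dmg("/Users/me/Documents/Scans.dmg", "a long passphrase")
```

Passing the passphrase on standard input keeps it out of the process list, which matters for the kind of sensitive documents this is meant to hold.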

Devonthink Search

One benefit of the OCR process is that all scans become searchable text. ABBYY Finereader even recognizes text that is somewhat stylized. It's good, though not quite as good as Evernote. That said, the OCR text is embedded in the PDF itself, whereas Evernote performs OCR only for its own search. If I scan a bill or receipt, I can easily find it again later.

Receipt Document


The one manual step for me is the export from Devonthink. Since I prefer to keep my documents in a manually managed directory system, I need to use the Devonthink export tools. These work well and preserve all of the OCR text (unlike Evernote).

Devonthink Export


If you already own Devonthink Pro Office, then you own some great OCR and PDF management tools. If not, then buy it and thank me later.