Redacted and Encrypted PDFs with Hazel and PDFpenPro

December 26, 2013 by Gabe

I've received some feedback on our scanning episode of Technical Difficulties. Katie Floyd posted a nice suggestion to automatically redact specific sensitive phrases from PDFs using Hazel.

For several versions now, PDFpen has had the ability to search a file for a string of text and redact that text. However, this functionality was not accessible via AppleScript. I begged them to make it so, and they came through in PDFpen version 6.

That's a nice way to programmatically redact pre-chosen phrases. But I own a ScanSnap and it has superpowers (previous review).

In recent versions of the ScanSnap software there is an option for setting keywords on black and white documents using an ordinary highlighter. Wouldn't it be great to be able to just highlight the terms to redact on the original document and have a redacted PDF copy created? This way, I can easily keep sensitive information private but still have it legible on the original.

ScanSnap

To get started, I created a new ScanSnap scanning profile to save a PDF to a folder on my Mac. I also tweaked the resolution a bit to get higher quality text recognition.

Next, I set up the OCR options and enabled the Set Marked Text as Keyword option.

At this point, I can highlight terms[1] on an original document and then run it through the new ScanSnap profile. Out the other end, I get a clean PDF containing some new keywords.

PDFpenPro Redaction

PDFpenPro is a powerful PDF editor and version 6 includes some terrific AppleScript support. As Katie points out, Smile gave me a great head-start on automating redaction.

It should be obvious where I'm going now. ScanSnap creates keywords for me and PDFpenPro can automate redaction. Combining the two took just a few lines of AppleScript.

Automate it with Hazel

Here's the AppleScript that Hazel is running to automate the entire process:

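-- Hazel passes the matched file to the script as theFile
set this_file to theFile
tell application "PDFpenPro 6"
    open this_file as alias
    set my_document to document 1
    tell my_document
        -- The keywords ScanSnap wrote arrive as one comma-separated string
        set key_words to keywords of info
        -- Save the current delimiters and switch to ", " in a single assignment
        set {my_TID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, {", "}}
        set my_list to text items of key_words
        -- Restore the original delimiters
        set AppleScript's text item delimiters to my_TID
        repeat with my_item in my_list
            search string my_item
            -- Wait for the search to finish
            repeat while performing search
                delay 0.1
            end repeat
            -- Redact the search results using the block style
            redact with block
            -- Wait for the redaction to finish
            repeat while performing redaction
                delay 0.1
            end repeat
            -- Brief pause before moving on to the next keyword
            delay 1
        end repeat
    end tell
    save document 1
    quit
end tell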

The script grabs the incoming file from Hazel. It does the typical AppleScript hokey-pokey with text item delimiters to turn the comma-separated keyword string from PDFpenPro into a list. The large repeat block then takes each keyword, performs a search and redacts the term from the PDF.
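The delimiter dance is the least obvious part, so here is a minimal sketch of it on its own, with no PDFpenPro involved. The handler name splitKeywords and the sample keywords are made up for illustration. The one-line swap in the Hazel script works because AppleScript evaluates the right-hand list before assigning, so the old delimiters are captured before the new ones take effect. This version spells that out in separate steps.

```applescript
-- Hypothetical helper: split a comma-separated keyword string into a list
on splitKeywords(key_words)
    -- Remember the current delimiters so they can be restored afterward
    set my_TID to AppleScript's text item delimiters
    -- Keywords are assumed to be separated by a comma and a space
    set AppleScript's text item delimiters to {", "}
    set my_list to text items of key_words
    -- Put the delimiters back the way they were
    set AppleScript's text item delimiters to my_TID
    return my_list
end splitKeywords

-- Made-up sample keywords
splitKeywords("Jane Doe, 123-45-6789")
--> {"Jane Doe", "123-45-6789"}
```

Restoring the delimiters matters because they are global to the running script, and anything later that coerces a list to text would otherwise pick up the ", " separator.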

Here's the basic workflow:

  1. Highlight terms on an original document
  2. Scan the document with the new ScanSnap profile
  3. Get a PDF with the highlighted terms redacted

Automated Encrypted PDFs

Many of my PDFs containing private information are stored in encrypted disk images. I'm considering abandoning the disk images in favor of encrypted PDFs instead. To automate the encryption, this Hazel rule converts any PDF to an encrypted, password-protected PDF.

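-- The password is hard-coded here; anyone who can read the rule can read it
set pw to "some password"
-- Hazel passes the matched file to the script as theFile
set this_file to theFile
tell application "PDFpenPro 6"
    open this_file
    -- Save the document encrypted with AES-256 and the password above
    save document 1 encrypt using AES256 password pw
    quit
end tell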

This is a less secure adaptation of the excellent example on Scrubbs.me. That example uses a password stored in the keychain, rather than hard-coded into the AppleScript.
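For comparison, here is a rough sketch of how a keychain lookup might look. This is not the Scrubbs.me script, just one way to do it with the `security` command-line tool that ships with OS X. The service name "pdf-encryption", the account name "hazel", and both handler names are hypothetical. It assumes you have already created a matching generic password item in Keychain Access.

```applescript
-- Hypothetical helper: build the shell command that asks the keychain for a password
on keychainCommand(service_name, account_name)
    -- quoted form protects names that contain spaces or shell characters
    return "security find-generic-password -s " & quoted form of service_name & " -a " & quoted form of account_name & " -w"
end keychainCommand

-- Hypothetical helper: run the lookup and return the password as text
on keychainPassword(service_name, account_name)
    -- The -w flag makes security print only the password
    return do shell script keychainCommand(service_name, account_name)
end keychainPassword
```

With something like this in place, the hard-coded `set pw to "some password"` line could become `set pw to keychainPassword("pdf-encryption", "hazel")`. Expect OS X to ask for permission the first time the script touches the keychain item.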

The meat of the script is simple. It uses PDFpenPro's AppleScript save command with the encrypt option. It's a convenient way to generate password-protected PDFs directly from ScanSnap output.

Conclusion

This is still a work in progress as I rethink how I handle scanning and archiving of sensitive documents. While I enjoy using Evernote for storing PDFs of instruction manuals, I'm not prepared to save tax or medical documents in an online service.

I suspect that I'll end up with a system that stores redacted documents in Evernote for quick searching and the original encrypted PDFs on a network drive or Dropbox. More experimentation is planned.


  1. I tested several colors of highlighters. Contrary to other reports, pink worked the best for me.