Redacted and Encrypted PDFs with Hazel and PDFpenPro

December 26, 2013 by Gabe

I've received some feedback on our scanning episode of Technical Difficulties. Katie Floyd posted a nice suggestion to automatically redact specific sensitive phrases from PDFs using Hazel.

For several versions now, PDFpen has had the ability to search a file for a string of text and redact that text. However, this functionality was not accessible via AppleScript. I begged them to make it so. They came through in PDFpen version 6.

That's a nice way to programmatically redact pre-chosen phrases. But I own a ScanSnap and it has super powers (previous review).

In recent versions of the ScanSnap software there is an option for setting keywords on black and white documents using an ordinary highlighter. Wouldn't it be great to be able to just highlight the terms to redact on the original document and have a redacted PDF copy created? This way, I can easily keep sensitive information private but still have it legible on the original.

ScanSnap

To get started, I created a new ScanSnap scanning profile to save a PDF to a folder on my Mac. I also tweaked the resolution a bit to get higher quality text recognition.

Next, I set up the OCR options and enabled the Set Marked Text as Keyword option.

At this point, I can highlight terms[1] on an original document and then run it through the new ScanSnap profile. Out the other end, I get a clean PDF containing some new keywords.

PDFpenPro Redaction

PDFpenPro is a powerful PDF editor and version 6 includes some terrific AppleScript support. As Katie points out, Smile gave me a great head-start on automating redaction.

It should be obvious where I'm going now. ScanSnap created keywords for me and PDFpenPro can automate redaction. Combining these two facts was just a few lines of AppleScript.

Automate it with Hazel

Here's the AppleScript that Hazel is running to automate the entire process:

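-- theFile is supplied by Hazel when the rule matches
set this_file to theFile
tell application "PDFpenPro 6"
    open this_file as alias
    set my_document to document 1
    tell my_document
        -- Keywords set by ScanSnap from the highlighted text
        set key_words to keywords of info
        -- Split the comma-separated keyword string into a list, then restore the delimiters
        set {my_TID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, {", "}}
        set my_list to text items of key_words
        set AppleScript's text item delimiters to my_TID
        repeat with my_item in my_list
            -- Skip empty entries (a document with no keywords yields a single empty string)
            if (my_item as text) is not "" then
                search string my_item
                -- Wait for the search to finish before redacting
                repeat while performing search
                    delay 0.1
                end repeat
                redact with block
                -- Wait for the redaction to finish before the next keyword
                repeat while performing redaction
                    delay 0.1
                end repeat
                delay 1
            end if
        end repeat
    end tell
    save document 1
    quit
end tell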

The script grabs the incoming file from Hazel. It does the typical AppleScript hokey-pokey to generate a list from the comma-separated list of keywords from PDFpenPro. The large repeat block takes each keyword, performs a search and redacts the term from the PDF.
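The delimiter hokey-pokey is easier to follow in isolation. Here's a minimal sketch that runs in Script Editor without PDFpenPro. The keyword string is made-up sample data standing in for what `keywords of info` returns, and the names `saved_TID` and `terms_to_redact` are only for illustration.

```applescript
-- Made-up sample data; in the Hazel script this comes from "keywords of info"
set key_words to "Jane Doe, 123-45-6789, Acme Bank"

-- Save the current delimiters, then split on comma-plus-space
set {saved_TID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, {", "}}
set my_list to text items of key_words
-- Restore the delimiters so later string handling is unaffected
set AppleScript's text item delimiters to saved_TID

-- Keep only non-empty terms; an empty keyword string splits into {""}
set terms_to_redact to {}
repeat with my_item in my_list
    set this_term to my_item as text
    if this_term is not "" then set end of terms_to_redact to this_term
end repeat

return terms_to_redact
```

With the sample string above, the result is a three-item list: "Jane Doe", "123-45-6789" and "Acme Bank". Each item is what gets handed to the search and redact steps.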

Here's the basic workflow:

  1. Highlight terms on an original document
  2. Scan the document with the new ScanSnap profile
  3. Get a PDF with the highlighted terms redacted

Automated Encrypted PDFs

Many of my PDFs containing private information are stored in encrypted disk images. I'm considering abandoning the use of the disk images in favor of encrypted PDFs instead. To automate the encryption, this Hazel rule converts any PDF to an encrypted, password-protected PDF.

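-- Hard-coded for simplicity; anyone who can read this rule can read the password
set pw to "some password"
-- theFile is supplied by Hazel when the rule matches
set this_file to theFile
tell application "PDFpenPro 6"
    open this_file
    -- Save with AES-256 encryption, using pw as the password
    save document 1 encrypt using AES256 password pw
    quit
end tell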

This is a less secure adaptation of the excellent example on Scrubbs.me. That example uses a password stored in the keychain, rather than hard-coded into the AppleScript.

The meat of the script is simple. It uses PDFpenPro's AppleScript save method with the encrypt flag. It's a convenient way to generate password-protected PDFs directly from ScanSnap output.
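For comparison, here's a rough sketch of the keychain variant. This is not the Scrubbs.me script, just one way it could be done with the `security` command-line tool that ships with OS X. It assumes a generic password item already exists in the login keychain; the service name "pdf-encryption" is hypothetical, so substitute whatever name the item was created with in Keychain Access.

```applescript
-- Hypothetical keychain item name; create the item in Keychain Access first
set keychain_service to "pdf-encryption"

-- "security find-generic-password -w" prints only the password of the matching item
set pw to do shell script "/usr/bin/security find-generic-password -s " & quoted form of keychain_service & " -w"

-- theFile is supplied by Hazel, as in the hard-coded version
set this_file to theFile
tell application "PDFpenPro 6"
    open this_file
    save document 1 encrypt using AES256 password pw
    quit
end tell
```

The only change from the hard-coded version is where `pw` comes from. If no matching item exists, `security` exits with an error and `do shell script` raises it, so the script stops before anything is saved. The first run may also prompt for permission to read the keychain item.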

Conclusion

This is still a work in progress as I rethink how I handle scanning and archiving of sensitive documents. While I enjoy using Evernote for storing PDFs of instruction manuals, I'm not prepared to save tax or medical documents in an online service.

I suspect that I'll end up with a system that stores redacted documents in Evernote for quick searching and the original encrypted PDFs on a network drive or Dropbox. More experimentation is planned.


  1. I tested several colors of highlighters. Contrary to other reports, pink worked the best for me. 
