Searching without Spotlight

January 03, 2015 by Gabe

Stand up if you ever collected piles of floppy disks. Now raise your hand if you used some kind of disk inventory program to help you figure out which disk contained the Spaceward Ho! installer. Now do the hokey-pokey, because that's what it's all about.

Spotlight on the Mac has now improved to the point where I almost don't know I'm using it. Applications like Alfred rely on Spotlight to do their searching but the Spotlight application itself is pretty good for basic file finding. Spotlight is not sufficient for dealing with external volumes or sifting through large numbers of files if you don't know the exact filename or parent directory. It hangs the Finder with even the most basic search on a large NAS directory.

This is a rundown of things I've tried and what I've learned. I'll start with a brief list of use cases and then dig into each of several options.

Use Cases

These are my specific challenges. Everyone has their own list of requirements but I think mine are somewhat boring and generic. Yay for me.

Offline Disk Scouring

I keep a large amount of stuff on my NAS, various Dropbox and Google Drive accounts and a couple of one-off archive volumes. I don't really want to mount each one just to do a quick search.

Search Metadata

Any disk inventory tool I'd consider needs to search more than just the file name. It needs to index the document text, where possible. That includes PDFs and text files.

It should also index metadata like modification time stamps, OS X tags, creator name, etc.

It's a plus if it can index email archives.

Complex Query Support

I don't need RegEx support but it sure would be a bonus. At the least, I need support for wild-card searching. Am I looking for a file named BBEdit Installer.dmg or is it BBEdit.dmg? I also want to be able to combine file name search and metadata search.
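As a rough illustration of the kind of wildcard match I mean, here is a minimal Python sketch using the standard library's `fnmatch` module. The file names and the `wildcard_filter` helper are made up for the example; none of the tools below necessarily work this way.

```python
from fnmatch import fnmatch

def wildcard_filter(names, pattern):
    """Return the names that match a shell-style wildcard pattern, ignoring case."""
    pattern = pattern.lower()
    return [n for n in names if fnmatch(n.lower(), pattern)]

# Hypothetical file names, for illustration only.
names = ["BBEdit Installer.dmg", "BBEdit.dmg", "bbedit-notes.txt", "TextEdit.dmg"]

# One pattern covers both possible installer names.
matches = wildcard_filter(names, "bbedit*.dmg")
print(matches)
```

A single `*` spares me from remembering which of the two names the installer actually had.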

Rich Result Display

I need to be able to see the details of a search hit. I want more than the file path. Support for QuickLook and display of meta attributes is a must.

Result Prioritization

With around a million files on my NAS, search results should be intelligently organized and prioritized. An alphabetical list of results is almost useless. The ideal tool should decide on good matches for me.

Result Categorization

One thing that Spotlight stinks at is collecting results. Tools like PathFinder have a concept of a result stack for collecting results across multiple search sessions. That's more how I think. Search and collect and then action the results.

Saving Search Queries and Results

This is a feature I could take or leave but I'd prefer to take it. It'd be convenient to build up a precise query and reuse it again later. Or set up a query format that includes things like a filename and modification date and then just change the search terms to something else later. It's not a deal breaker but it's certainly a selling point.

The Contenders

Here's what satisfies most of my requirements:

  • DEVONthink
  • Fox Trot
  • EasyFind
  • HoudahSpot

Before I get into the available options I'll try to cut off as many comments as possible. Here's what doesn't work for me:

Alfred, Launchbar and Quicksilver are great for searching a local volume. While you can index remote volumes, search results and performance get worse as the number of indexed files increases. Indexing massive numbers of files makes these applications almost useless for finding anything, primarily due to the simplified interface. They are also not appropriate for searching unmounted volumes since the search fails.

Spotlight is not useful for indexing offline volumes and doesn't ever automatically index a mounted NAS for me.

Pathological is amazing in its complexity. It's an impressive piece of technology and I'm sure it works well for the right twisted mind. I mostly include it here just to highlight the audacity of an XPath query tool for the Mac file system. By the same mad genius behind Fake and Fluid.

Found is a cool little search tool with some neat tricks. It can search the OS X file system as well as cloud services like Google Drive. But it cannot search network volumes.

DEVONthink

This is an all-around great application for information management. I use DEVONthink Pro Office (DTPO) for gathering information around a specific project. I've written plenty about the incredible diversity and impressive AI of this application so I'll not revisit it here. Instead, I'll highlight the strengths and weaknesses of using it for file indexing and search.

The thing to understand about DEVONthink is that it can index files without loading them into its database. It will crawl a specified directory structure and pull in all of the metadata for searching but the files can be stored elsewhere. This is great for regularly accessed files or at least commonly searched locations. I prefer to create a separate DB file for each location but it's not a requirement. Direct DTPO at a volume and then wait... wait... wait...
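To make the idea of indexing without importing concrete, here is a minimal Python sketch. It is not how DEVONthink works internally; it only assumes that a crawler records each file's path, name, size and modification time in a small SQLite table (the `files` table and both function names are my own invention) and that searches run against that table rather than the disk. Document text is left out entirely.

```python
import os
import sqlite3

def build_index(root, db_path):
    """Walk root and record path, name, size and mtime for every file in SQLite."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, name TEXT, size INTEGER, mtime REAL)"
    )
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # skip files that vanish or cannot be read
            con.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (full, name, st.st_size, st.st_mtime),
            )
    con.commit()
    return con

def search_index(con, pattern):
    """Match stored file names against a SQL LIKE pattern; the volume can be offline."""
    rows = con.execute(
        "SELECT path FROM files WHERE name LIKE ? ORDER BY path", (pattern,)
    )
    return [row[0] for row in rows]
```

Once `build_index` has run, `search_index(con, "%.dmg")` answers from the database alone, which is why this style of tool keeps working after the volume is unmounted.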

Lengthy indexing is to be expected though. DTPO crawls every file and scans for content and metadata. Indexing my "Deep Freeze" archive took too long to watch (at least an hour) but there's 614 GB of data stored in 236,000 files. The DTPO index size is almost half a gigabyte.

What's all this indexing worth? Quite a lot. Search is almost instant in DTPO and includes metadata like modification timestamps and Yosemite tags. DEVONthink also has a couple of advantages over the old standby Spotlight. For one, as long as the files are accessible, hit terms are highlighted in the results.

If the volume is missing, search still works but the file preview is missing.

What I really love about working with indexed search in DEVONthink are all of the options for working with results. I can thumb through the results and see tags, paths, comments or really any metadata on the file. I can also use the contextual menu to organize specific results.

I recommend using the DEVONthink "Replicate" command for pulling matching results into a folder. I create a new DEVONthink folder just to store results. I then replicate to the new folder. Replicants in DTPO are like aliases. They point to the original without duplicating the file. I can then use "Show in Finder" or open the file directly from DTPO.

The other thing DTPO has going for it is that it's a pretty good file browser. The directory structure is recreated during indexing so I can actually browse while the volume is offline.

All of this unexpected functionality isn't worth much if I can't find the files I'm looking for. DTPO really delivers in the power-search features. With support for advanced query syntax like wildcard characters and boolean operators it's simple to create a good search. DEVONthink also supports structured metadata searching.

The major downside to DEVONthink Pro is the index updating. It's slow and manual. If you need to update the index, plan on waiting. In my experience, the entire application was unresponsive during the index updates. There's no direct way to schedule index updates with DTPO. I'm sure it's something that could be solved with AppleScript, but that's something I haven't explored yet.

Price

$80 for Pro

$150 for Pro Office

Pros

  • Fast Search
  • Offline File Search
  • Result Preview
  • Sophisticated Query Syntax
  • Granular Metadata Filters
  • Powerful Result Collection and Filtering
  • Much More than a File Search Tool
  • Powerful AI for Prioritizing Results

Cons

  • Slow Indexing
  • No Automatic Re-Indexing
  • Expensive
  • Complex Product Line

Fox Trot

This application kind of comes out of nowhere for me. Unlike DEVONthink, Fox Trot is purely for searching for files. It exists just to find stuff and it does a pretty good job at it.

Fox Trot does not rely on the Spotlight index. At startup, it creates a few standard indexes like the Documents directory but it's far more flexible than that. Fox Trot supports indexing any volume your Mac can mount and the indexing is reasonably fast. You'll still be exercising some patience with 200,000 files but it's faster than DEVONthink.

The Fox Trot interface is nothing special. There's a primary search field at the top, a list view in the middle and supplemental filters down the left side. There's also an optional preview pane on the right side of the window. It's fairly generic, but then again, file searching is pretty boring.

With support for wildcard and boolean operators, Fox Trot is very capable. Exclude terms by preceding them with a hyphen (-bacon) or search with word proximity ({2} bacon kevin) to limit results with a great deal of control.
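For anyone unfamiliar with exclusion and proximity operators, here is a small Python sketch of the general idea. It is not Fox Trot's implementation, and it assumes that `{2}` means "at most two words between the terms", which is my reading rather than something I've confirmed. The `tokenize`, `within` and `excludes` helpers are made up for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def within(text, a, b, max_gap):
    """True if words a and b occur with at most max_gap words between them."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) - 1 <= max_gap for i in pos_a for j in pos_b if i != j)

def excludes(text, term):
    """True if the term does not appear as a whole word in the text."""
    return term.lower() not in tokenize(text)

# Assumed equivalent of "{2} bacon kevin": the terms sit within two words of each other.
print(within("Kevin loves crispy bacon", "bacon", "kevin", 2))
# Assumed equivalent of "-bacon": keep only text that never mentions the term.
print(excludes("eggs and toast", "bacon"))
```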

Results are returned instantly with Fox Trot, even with huge numbers of indexed files. It also works with offline volumes but, as expected, fails to provide previews.[1]

I like how Fox Trot handles indexing locations independently but combines them in the application. Whereas DEVONthink requires me to load the individual database to search, Fox Trot makes every index location available in one interface. It also makes index setup easy for common files.

Fox Trot also provides a bit of granularity in how each location is indexed.

Once indexed, searching is as easy or complex as I need. For simple text searching, it's fast. If I get too many results returned then I can start tweaking filters along the left pane that include date ranges, tags, file types and even specific sub-directory locations.

Accessing the resulting files is straightforward. Double-click to open a file or right-click to reveal it in the Finder.

For collecting results, Fox Trot is easily beaten by DEVONthink but does provide the most basic functionality of "bookmarking" a specific result. Bookmarks aren't very functional for anything more than locating the file at some later date but it suits most of my needs. Fox Trot also provides a way to save the query as a template or reload it later to repeat on a different set of files.

While bookmarking is unsatisfactory in Fox Trot, the history view is great. It tracks all queries and they can be viewed as a list and results recalled instantly by double clicking.

Similar to DEVONthink, Fox Trot highlights hits. There are also a couple of convenient options for searching within a single file using the same advanced syntax.

Something that seemed one part strange and one part awesome was searching on a URL. After entering a location URL, Fox Trot displays a browser view and a search box. Enter a search term with any of the supported Fox Trot boolean and wildcard options and then get hit highlighting along with a list of matches.

The feature is similar to what I get with DEVONthink or DEVONagent but it's pretty nice to have in a file indexing application.

Price

$33 for Personal

$113 for Pro

Pros

  • Fast Search
  • Fast Indexing
  • Granular Metadata Filters
  • File Preview
  • Works with Offline Files
  • Sophisticated Query Syntax
  • Good Search History

Cons

  • Poor Documentation
  • Expensive
  • Complicated Product Line
  • Dated Design

HoudahSpot

Now we move into a different category of search tool. These next two do not pre-index files. They are real-time search tools. You fire off a query and they scan the location one directory at a time. That works surprisingly well when I'm searching a location I rarely look at. Since neither of these pre-index locations, they do not work with offline volumes.
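The real-time approach can be sketched in a few lines of Python. This is only an illustration of scanning one directory at a time, not how either application is built; `live_search` is a made-up name and it matches on file names only.

```python
import os
from fnmatch import fnmatch

def live_search(root, pattern):
    """Yield matching paths one at a time while walking root directory by directory."""
    pattern = pattern.lower()
    for dirpath, _dirs, filenames in os.walk(root):
        for name in sorted(filenames):
            if fnmatch(name.lower(), pattern):
                yield os.path.join(dirpath, name)

# Search the current directory; each hit is printed as soon as it is found.
for path in live_search(".", "*.dmg"):
    print(path)
```

Because it is a generator, a caller can show each hit as it arrives instead of waiting for the whole scan to finish. Nothing is stored between runs, so the location has to be mounted every time.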

The HoudahSpot interface is the most intuitive of the bunch. The simplified interface hides a huge amount of power when it comes to finding a single file among hundreds of thousands.[2]

While HoudahSpot doesn't support searching offline volumes, it does support ad hoc search of volumes not typically supported by Spotlight. It's simple to add a new volume. I recommend saving the setup as a search template if you think you will be using it regularly.

HoudahSpot supports the typical Spotlight meta attributes but it also provides access to the deep properties for advanced search.

I also appreciate the options for setting view columns in HoudahSpot. I'd love to see a few more options but this is a pretty good start.

HoudahSpot does not support sophisticated collection and bookmarking like DEVONthink or Fox Trot but there are plenty of ways to action a file once it's been identified. In theory, the OS X tags could be used to collect a selection of files across multiple searches. Then create a HoudahSpot template search to find all files with that tag. It's not ideal but it works.

The HoudahSpot menubar helper is also a nice feature if you are a dedicated user. I tend to need it only occasionally so I prefer not to clutter my menu and hotkey shortcuts.

Price

Regularly $30

Pros

  • Easy to Use
  • Flexible Search Options
  • Greater Variety of Searchable File Attributes
  • Inexpensive
  • Flexible Search Templates

Cons

  • Does Not Work With Offline Volumes
  • Slow Search
  • Awkward to Collect Results Across Multiple Searches

EasyFind

EasyFind is also made by DEVONtechnologies and brings some of their powerful search technology to a simple free utility. Like HoudahSpot, EasyFind does not rely on a file index. Instead, it searches in real time against a specified directory or volume. While the support for advanced query syntax is nice, this application just doesn't suit my needs.

EasyFind has the most basic interface of all the apps I've tried. It's one window with a few options. But, that's not why I can't recommend it for searching large data collections. It's just too slow. A single search on my Deep Freeze directory takes over 10 minutes to complete and during that time, no results are displayed until the query is done executing.

I can recommend EasyFind if you're just looking for a Spotlight replacement for a small-ish local volume. For anything large, it's just too slow.

Price

Free

Pros

  • See Price
  • Wildcard, Boolean and Proximity Syntax
  • Search Inside Packages

Cons

  • Very Slow
  • Does Not Work with Offline Volumes
  • No Saved Queries or Search Results
  • No Options for Aggregating Search Results

Conclusion

I already own DEVONthink Pro Office and it has the most features that match what I want to do. Fox Trot is pretty compelling and a very nice application but for the price I can't justify it existing in my Applications folder. I also already own HoudahSpot (and have for a very long time). I think it's ideal for searching the odd disk or rarely used NAS location. For everything else, I'll stick with DEVONthink.


  1. There's an option to include a copy of the file in the index to enable previewing offline files, but that kind of defeats the purpose of a NAS for me. 

  2. I bet you thought I'd use the phrase "Needle in a haystack." How is that expression understandable in the modern era? 
