Wednesday, March 22, 2006

Using Spotlight from the OS X Commandline

One significant productivity-enhancing feature that arrived with Tiger was Spotlight. On its own, it changed the way some (if not most) Mac users use their desktops. A simple command-space pops up the Spotlight window, where you can enter a query string, and in a matter of seconds, get a list of files matching your query.

Spotlight has many advantages over traditional file-searching tools. For one thing, it's not a tool. It is a complete indexing and search framework that is tightly integrated into the Operating System. In addition to filenames and paths, it also indexes by file metadata and content. So Spotlight returns query results based on what's inside the file.

Spotlight's benefits can also be enjoyed on the commandline, and this article explains how to take full advantage of them from inside the OS X Terminal window.

The Old Way

Most commandline users are familiar with the ubiquitous find command.
evil:~/Desktop mohit$ find / -name '*Rails*'
/Applications/TextMate.app/Contents/SharedSupport/Bundles/Rails.tmbundle
/Applications/TextMate.app/Contents/SharedSupport/Bundles/Rails.tmbundle/Syntaxes/HTML (Rails).plist
/Applications/TextMate.app/Contents/SharedSupport/Bundles/Rails.tmbundle/Syntaxes/Ruby on Rails.plist
/Applications/TextMate.app/Contents/SharedSupport/Bundles/Rails.tmbundle/Syntaxes/SQL (Rails).plist

Find is old-school. You give it a search path, and it begins its search by tediously recursing directories and finding matches to the query string. On even an average-sized filesystem, find can take a frustratingly long time.

Then there's the locate tool. Locate is much faster because it maintains a periodically-updated index of filenames and their locations.
evil:~/Desktop mohit$ locate Rails
/Applications/TextMate.app/Contents/SharedSupport/Bundles/Rails.tmbundle
/Applications/TextMate.app/Contents/SharedSupport/Bundles/Rails.tmbundle/Commands
/Applications/TextMate.app/Contents/SharedSupport/Bundles/Rails.tmbundle/Commands/Open test-case.plist
/Applications/TextMate.app/Contents/SharedSupport/Bundles/Rails.tmbundle/info.plist
/Applications/TextMate.app/Contents/SharedSupport/Bundles/Rails.tmbundle/Snippets
/Applications/TextMate.app/Contents/SharedSupport/Bundles/Rails.tmbundle/Snippets/170 eruby forin.plist

The problem with locate is that the index is not updated dynamically. On OS X systems, it is updated weekly by code residing in /etc/weekly. Also, locate does not index the contents of the files, nor does it know anything about file metadata.

Enter Spotlight

Spotlight consists of a metadata-store and a content index that is dynamically updated by various importer plugins within the system.

The metadata that Spotlight maintains can be very application-specific. For example, images can carry metadata such as "Dimensions" and "Color Space", while music files can carry metadata such as "Genre", "Bit-Rate" or "Encoding".

Spotlight indexes data by way of various importer plugins. These plugins know how to handle various kinds of data, such as iChat Transcripts, iTunes Music, e-mail etc.

Below is a snippet of the top processes on my PowerBook.
1352 top 10.4% 0:01.41 1 18 22 620K 416K 1.05M 27.0M
1337 mdimport 0.0% 0:00.45 4 62 55 1.06M 3.98M 3.15M 39.9M
1294 mdimport 0.0% 0:00.21 3 61 46 776K 2.82M 2.28M 38.9M
1283 mdimport 0.0% 0:00.35 3 61 47 748K 3.19M 2.36M 39.4M
1281 lookupd 0.0% 0:00.17 2 34 39 440K 912K 1.20M 28.5M
1258 iTunes 2.4% 2:32.63 4 226 376 17.0M 26.8M 41.7M 227M

Notice the three mdimport processes. The mdimport daemon is responsible for working with the importer plugins and updating the Spotlight index.

Example 1. A Basic Spotlight Query

The commandline version of Spotlight is mdfind. Simply provide your search query as a parameter and let it run.
evil:~/Desktop mohit$ mdfind Rails
/Users/mohit/Documents/Rails4Days.pdf
/Users/mohit/Documents/Agile Development with Rails.pdf
/Users/mohit/Library/Mail/POP-foobar@mail.snip.com/INBOX.mbox/Messages/20455.emlx
/Users/mohit/Local/rails
/opt/local/lib/ruby/gems/1.8/cache/rails-1.0.0.gem
/opt/local/lib/ruby/gems/1.8/gems/rails-1.0.0
/opt/local/lib/ruby/gems/1.8/gems/rails-1.0.0/bin/rails
/opt/local/lib/ruby/gems/1.8/gems/rails-1.0.0/builtin/controllers/rails_info_controller.rb
/opt/local/lib/ruby/gems/1.8/gems/rails-1.0.0/html/index.html
/opt/local/lib/ruby/gems/1.8/gems/rails-1.0.0/html/images/rails.png

The results are not limited to filename matches. They also include files whose content or metadata matches the query expression.

Example 2. Limiting Your Search to a Specific Directory

The -onlyin parameter limits the scope of the search to the directory specified.
evil:~/Desktop mohit$ mdfind -onlyin ~/Desktop Rails
/Users/mohit/Desktop/Downloads/Linux/Documents/Work/Verizon Data/Tekelec/Tekelec_Alarm_Docs.pdf
/Users/mohit/Desktop/Projects/Client/nABLE Event Manager - High Level Architecture.doc
/Users/mohit/Desktop/Projects/Client/nABLE EM.doc
/Users/mohit/Desktop/Projects/Client/to-timesheet-2006-01.pdf
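
If you want the same query scoped to more than one directory, one option is to loop and call mdfind once per directory. The sketch below is only an illustration: search_dirs and the MDFIND variable are names I made up, and the only mdfind feature it relies on is the -onlyin parameter shown above.

```shell
#!/bin/sh
# Hypothetical helper: run one Spotlight query against several directories.
# MDFIND is an assumed override hook (handy for dry runs); it defaults to mdfind.
MDFIND=${MDFIND:-mdfind}

search_dirs() {
    query=$1
    shift
    # One mdfind call per directory, each limited with -onlyin.
    for dir in "$@"; do
        "$MDFIND" -onlyin "$dir" "$query"
    done
}

# Usage (on a machine with Spotlight):
#   search_dirs Rails "$HOME/Desktop" "$HOME/Documents"
```

The results come back grouped by directory, in the order the directories were given.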


Example 3. Displaying File Metadata

Earlier, I mentioned that Spotlight also indexes file metadata. The mdls tool lets you examine the metadata for a specified file.
evil:~/Desktop/Projects/Tierone mohit$ mdls SomeDocument.doc 
SomeDocument.doc -------------
kMDItemAttributeChangeDate = 2006-01-23 08:12:42 -0500
kMDItemAuthors = ("Homer Simpson")
kMDItemContentCreationDate = 2006-01-23 08:12:40 -0500
kMDItemContentModificationDate = 2006-01-23 08:12:40 -0500
kMDItemContentType = "com.microsoft.word.doc"
kMDItemContentTypeTree = ("com.microsoft.word.doc", "public.data", "public.item")
kMDItemDisplayName = "SomeDocument.doc"
kMDItemFSContentChangeDate = 2006-01-23 08:12:40 -0500
kMDItemFSCreationDate = 2006-01-23 08:12:40 -0500
kMDItemFSCreatorCode = 1297307460
kMDItemFSFinderFlags = 0
kMDItemFSInvisible = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSLabel = 0
kMDItemFSName = "SomeDocument.doc"
kMDItemFSNodeCount = 0
kMDItemFSOwnerGroupID = 20
kMDItemFSOwnerUserID = 501
kMDItemFSSize = 92160
kMDItemFSTypeCode = 1463304782
kMDItemID = 2821259
kMDItemKind = "Microsoft Word document"
kMDItemLastUsedDate = 2006-01-23 08:12:40 -0500
kMDItemTitle = "Document:"
kMDItemUsedDates = (2006-01-23 08:12:40 -0500)

The metadata consists of various attributes specific to the file. These attributes can be used with mdfind to limit the scope of your search.

A good reference for these metadata attributes can be found at the Apple Developer Connection site.
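
When scripting, you often want just one attribute out of that listing. Here is a minimal sketch that filters mdls output on stdin, assuming the "name = value" layout shown above. The function name md_attr is my own, and it only handles values that fit on a single line.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one attribute from mdls output.
# Reads mdls-style text on stdin; handles single-line values only.
md_attr() {
    awk -v key="$1" '
        $1 == key && $2 == "=" {
            sub(/^[^=]*= */, "")   # drop the "name = " prefix
            gsub(/^"|"$/, "")      # strip surrounding double quotes, if any
            print
        }'
}

# Usage (on a machine with Spotlight):
#   mdls SomeDocument.doc | md_attr kMDItemKind
```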

Example 4. Finding Files by a Specific Author

This time, we limit our search to all files by a given author. The attribute we use is kMDItemAuthors.
evil:~ mohit$ mdfind "kMDItemAuthors == '*Homer*'"
/Users/mohit/Documents/SomeDocument.doc
/Users/mohit/Documents/Microsoft User Data/AutoRecovery save of SomeDocument.doc

Notice that the query was double quoted, while the text-pattern was single quoted.
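
The nested quoting gets fiddly once the author name comes from a variable. One way to keep it readable is to build the query string in a small function and hand the result to mdfind. This is just a sketch: author_query is a made-up name, and it makes no attempt to escape quote characters inside the name itself.

```shell
#!/bin/sh
# Hypothetical helper: build a kMDItemAuthors query with a wildcard match.
# Assumes the name contains no single or double quotes.
author_query() {
    printf "kMDItemAuthors == '*%s*'\n" "$1"
}

# Usage (on a machine with Spotlight). The outer double quotes keep the
# whole query together as one argument:
#   mdfind "$(author_query Homer)"
```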

Example 5. Finding Music by Artist

Spotlight query expressions can be quite sophisticated. They allow for various kinds of conditional operators and patterns. Below, we search for all music by John Scofield.
evil:~ mohit$ mdfind "kMDItemAuthors == 'John Scofield' && kMDItemContentType == 'public.mp3'"
/Users/mohit/Music/iTunes/iTunes Music/John Scofield/A Go Go/07 Green Tea.mp3
/Users/mohit/Music/iTunes/iTunes Music/John Scofield/A Go Go/06 Kubrick.mp3

Great! But I seem to be missing some files. Where are my AACs?

Spotlight organizes ContentTypes within ContentTypeTrees, so in this case, public.mp3 falls under public.audio.

Knowing this, let's refine our search query to include all audio files.

evil:~ mohit$ mdfind "kMDItemAuthors == 'John Scofield' && kMDItemContentTypeTree == 'public.audio'"
/Users/mohit/Music/iTunes/iTunes Music/John Scofield/A Go Go/07 Green Tea.mp3
/Users/mohit/Music/iTunes/iTunes Music/John Scofield/A Go Go/06 Kubrick.mp3
/Users/mohit/Music/iTunes/iTunes Music/John Scofield/That's What I Say_ John Scofield Plays The Music of Ray Charles/01 Busted.m4a
/Users/mohit/Music/iTunes/iTunes Music/John Scofield/That's What I Say_ John Scofield Plays The Music of Ray Charles/02 What'd I Say.m4a

Much better. But how did I know what to search for?

This is where mdls comes in handy again.

evil:~ mohit$ mdls "/Users/mohit/Music/iTunes/iTunes Music/John Scofield/A Go Go/02 Chank.mp3"
/Users/mohit/Music/iTunes/iTunes Music/John Scofield/A Go Go/02 Chank.mp3 -------------
kMDItemAlbum = "A Go Go"
kMDItemAttributeChangeDate = 2005-11-26 22:00:00 -0500
kMDItemAudioBitRate = 128
kMDItemAudioChannelCount = 2
kMDItemAudioSampleRate = 44100
kMDItemAuthors = ("John Scofield")
kMDItemComment = "Created by Grip"
kMDItemContentCreationDate = 2003-10-28 20:34:30 -0500
kMDItemContentModificationDate = 2003-10-28 20:34:35 -0500
kMDItemContentType = "public.mp3"
kMDItemContentTypeTree = (
"public.mp3",
"public.audio",
"public.audiovisual-content",
"public.data",
"public.item",
"public.content"
)
kMDItemDisplayName = "02 Chank.mp3"
kMDItemDurationSeconds = 406
kMDItemFSContentChangeDate = 2003-10-28 20:34:35 -0500
kMDItemFSCreationDate = 2003-10-28 20:34:30 -0500
kMDItemFSCreatorCode = 0
kMDItemFSFinderFlags = 0
kMDItemFSInvisible = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSLabel = 0
kMDItemFSName = "02 Chank.mp3"
kMDItemFSNodeCount = 0
kMDItemFSOwnerGroupID = 20
kMDItemFSOwnerUserID = 501
kMDItemFSSize = 6511095
kMDItemFSTypeCode = 0
kMDItemID = 208222
kMDItemKind = "MP3 Audio File"
kMDItemLastUsedDate = 2003-10-28 20:34:35 -0500
kMDItemMediaTypes = (Sound)
kMDItemMusicalGenre = "Jazz"
kMDItemRecordingYear = 1998
kMDItemTitle = "Chank"
kMDItemTotalBitRate = 128
kMDItemUsedDates = (2003-10-28 20:34:35 -0500)

Looking at kMDItemContentTypeTree, we can tell that public.mp3 falls under public.audio.
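
If you only care about the type tree, you can pull it out of the mdls listing with a short filter. The sketch below assumes the parenthesized list layout shown above, either on one line or spread over several. The function name md_type_tree is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the content types in kMDItemContentTypeTree,
# one per line. Reads mdls-style text on stdin.
md_type_tree() {
    awk '
        /^kMDItemContentTypeTree/ { in_tree = 1 }
        in_tree {
            line = $0
            # Print every double-quoted string found on this line.
            while (match(line, /"[^"]+"/)) {
                print substr(line, RSTART + 1, RLENGTH - 2)
                line = substr(line, RSTART + RLENGTH)
            }
            # The closing parenthesis ends the list.
            if ($0 ~ /\)/) in_tree = 0
        }'
}

# Usage (on a machine with Spotlight):
#   mdls "02 Chank.mp3" | md_type_tree
```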

We could have also searched by kMDItemMediaTypes, or kMDItemKind, or even a '*mp3' pattern in kMDItemDisplayName.

Example 6. Finding Other Content

You can find images by querying for files with kMDItemContentTypeTree set to public.image.
$ mdfind "kMDItemContentTypeTree == 'public.image'"

How about we refine that to only the images within our iPhoto library?
$ mdfind -onlyin ~/Pictures "kMDItemContentTypeTree == 'public.image'"

Much Better.

Looking for Word documents?
$ mdfind "kMDItemContentType == 'com.microsoft.word.doc'"

Or maybe just PDFs?
$ mdfind "kMDItemContentType == 'com.adobe.pdf'"

Or Both?
$ mdfind "kMDItemContentType == 'com.microsoft.word.doc' || kMDItemContentType == 'com.adobe.pdf'"
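
Typing out long || chains by hand gets old quickly. As a rough sketch, the query can be assembled from a list of content types instead. The function name any_type_query is made up for this example, and the type names are passed through as-is.

```shell
#!/bin/sh
# Hypothetical helper: join several content types into one "||" query.
any_type_query() {
    query=""
    for uti in "$@"; do
        clause="kMDItemContentType == '$uti'"
        if [ -z "$query" ]; then
            query=$clause
        else
            query="$query || $clause"
        fi
    done
    printf '%s\n' "$query"
}

# Usage (on a machine with Spotlight):
#   mdfind "$(any_type_query com.microsoft.word.doc com.adobe.pdf)"
```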

Let's stick to plain-text.
$ mdfind "kMDItemContentTypeTree == 'public.text'"


Example 7. Looking for Source Code

Finding all Ruby scripts.
$ mdfind "kMDItemContentType == 'public.ruby-script'"

Finding all kinds of scripts (Python, Bash, Ruby etc.)
$ mdfind "kMDItemContentTypeTree == 'public.shell-script'"

Finding all scripts except Python scripts.
$ mdfind "kMDItemContentTypeTree == 'public.shell-script' && kMDItemContentType != 'public.python-script'"

Finding Source Code (not scripts).
$ mdfind "kMDItemContentTypeTree == 'public.source-code'"


Example 8. Using "kind" Keywords (Added 24/Mar/06)

Commandline Spotlight also supports the "kind:" keyword. This is simpler than filtering with kMDItemContentType.
evil:/ mohit$ mdfind "kind:pdf Calculus"
/Users/mohit/Documents/Elementary Calculus.pdf
/Users/mohit/.Trash/marktoberdorf.pdf
/Users/mohit/.Trash/FoundInfsmlCalc.pdf

Spotlight "kind" keyword list:

Category        Keywords
Applications    kind:application, kind:applications, kind:app
Contacts        kind:contact, kind:contacts
Folders         kind:folder, kind:folders
Email           kind:email, kind:emails, kind:mail message, kind:mail messages
iCal Events     kind:event, kind:events
iCal To Dos     kind:todo, kind:todos, kind:to do, kind:to dos
Images          kind:image, kind:images
Movies          kind:movie, kind:movies
Music           kind:music
Audio           kind:audio
PDF             kind:pdf, kind:pdfs
Preferences     kind:system preferences, kind:preferences
Bookmarks       kind:bookmark, kind:bookmarks
Fonts           kind:font, kind:fonts
Presentations   kind:presentations, kind:presentation


Example 9. Using "date" Keywords (Added 24/Mar/06)

Files can also be filtered based on date-related information.
evil:/ mohit$ mdfind "kind:pdf date:this week"
/Users/mohit/Desktop/chapter_1a.pdf
/Users/mohit/Documents/Elementary Calculus.pdf
/Users/mohit/Desktop/13.pdf
/Users/mohit/Desktop/Internet_map_labels.pdf

The date ranges that can be specified are:
  • date:this month
  • date:this week
  • date:this year
  • date:today
  • date:yesterday
  • date:tomorrow
  • date:next month
  • date:next week
  • date:next year

Note that the future ranges (tomorrow, next week, etc.) are for Calendar appointments.
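
The "kind:" and "date:" keywords combine freely with plain search terms, so a tiny function can assemble them for you. Treat this as a sketch: keyword_query is a hypothetical name, and it simply joins its arguments in the same order used in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: build a keyword query from a kind, a date range,
# and optional search terms. Pass "" to leave out the kind or the date.
keyword_query() {
    kind=$1
    date=$2
    shift 2
    q=""
    if [ -n "$kind" ]; then
        q="kind:$kind"
    fi
    if [ -n "$date" ]; then
        q="${q:+$q }date:$date"
    fi
    if [ $# -gt 0 ]; then
        q="${q:+$q }$*"
    fi
    printf '%s\n' "$q"
}

# Usage (on a machine with Spotlight):
#   mdfind "$(keyword_query pdf 'this week')"
#   mdfind "$(keyword_query pdf '' Calculus)"
```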

Finally

As you can see, Spotlight is great for commandline junkies too. It is a fast, flexible alternative to the UNIX find command, and in many respects, more powerful than find.

But it is by no means a replacement. There are some things that Spotlight's mdfind just cannot do. UNIX find has a much richer set of options, and when it comes to digging deep into the system, there is no alternative.

For most purposes though, Spotlight works very well. Rewriting your shell scripts to use mdfind instead of find will make them far more responsive (and far less portable). So here's another case where OS X's UNIX underpinnings have made for a useful tool that is usable both from the GUI and from the commandline.
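
One way to soften the portability hit is to use mdfind when it is present and fall back to find everywhere else. The following is a minimal sketch rather than a drop-in replacement: find_by_name and the MDFIND variable are names I invented, and the two code paths are not strictly equivalent, since Spotlight only knows about files it has indexed.

```shell
#!/bin/sh
# Hypothetical helper: search by filename with Spotlight when available,
# otherwise fall back to find. MDFIND is an assumed override hook.
find_by_name() {
    pattern=$1
    dir=${2:-.}
    if command -v "${MDFIND:-mdfind}" >/dev/null 2>&1; then
        # Spotlight path: match on the filename attribute within $dir.
        "${MDFIND:-mdfind}" -onlyin "$dir" "kMDItemFSName == '*${pattern}*'"
    else
        # Portable path: walk the directory tree.
        find "$dir" -name "*${pattern}*" -print
    fi
}

# Usage:
#   find_by_name Rails "$HOME/Documents"
```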

19 comments:

  1. Hey, can Spotlight use a grep-like syntax?

  2. Ron,

    If by grep-like syntax, you mean support for regular expressions, then the answer is no. But it's quite straightforward to filter mdfind results through grep. E.g.,

    $ mdfind -onlyin ~ "kMDItemKind == 'Folder'" | grep '.*board'

    That said, mdfind does support a limited pattern-matching syntax similar to shell file-globbing (e.g., *.*, somefi?e, etc.)

  3. see also: http://toxicsoftware.com/blog/index.php/weblog/mdfind2/

  4. that is an interesting article. good work! i learnt something new.

  5. very nice article, thanks for the tips!

  6. Only when it indexes ALL files will it replace locate or 'find | grep'

  7. Is there a way to make Spotlight ALWAYS override the path filtering rules Apple has set? For me, Spotlight is a step backwards from the old find in that it's only good for user directories.

  8. OK, so I played around with this a bit, and discovered one thing: Almost everything I was looking for was mentioned in many emails, and I want to filter out those results. Is there a way to say:

    "Find all references to 'smith' in my home folder but not if it's found in an email"

    or

    "Find all references to 'thailand' in my home folder but not if it's an image"

  9. Can Spotlight search the full headers of emails?

    I recently was a little disappointed by the apparent answer: no.

    Try pasting in the IP of a suspected spammer or whatnot into Spotlight.

  10. What about mdimport? I would like to use it to evaluate the contents of distinct files in shell scripts. In conjunction with the '-n' parameter it would be a great parser for all Spotlight-supported file formats.

    BUT: mdimport doesn't support stdout for piping results.

    Any known replacement or workaround?

    So far the best article I've read about this topic.

  11. great article, thanks!

  12. Good article, thanks. In response to the kvetching about Spotlight not indexing the entire filesystem, I wrote a short shell script that's available at http://barella.org/Perette/OpenSource/OS-X/pbloc. This script works like 'locate', but combines the data from locate and mdfind so for user files, it's always updated. For OS files, though, things are updated only as frequently as the locate database. The downside to this script is that it's noticeably slower than plain-old locate.

  13. Good summary. All official Apple docs I've seen indicate that mdfind only searches metadata (hence the 'md' in the name.) But you mentioned in your article that mdfind can search on *content* as well, the same way that Spotlight does.

    In fact, your examples show hits that are based on content, since the filenames don't include the search term.

    However, I'm not able to duplicate that on my system. I tested it by searching for a unique string I knew to be inside a PDF (a string which did not occur in the filename.) Spotlight found it, mdfind did not. For me, mdfind only returns files whose name or other metadata matches.

    Strange.

  14. Some Spotlight snippets also here: http://textsnippets.com

  15. Great post. Thanks!

    Do you know any way to set spotlight comments from the command line?

  16. You can use applescript to set the comment of the spotlight field, no problem.

  17. This is great, thanks for the post!

  18. Probably obvious but you can do basic math queries direct into the bar.
    eg,
    ceil(88/3)
    floor(3/2)
    2^9
    sqrt(55)
