Tuesday, December 29, 2009

webloc shortcuts into DEVONthink Web Archives - Python + Automator + AppleScript

Recently I've been moving forward with my year-or-so-long quest to organize my thoughts, readings and writings, and their associated notes, in a searchable format. That format should help me decide the order in which to attack my ideas, and should present me with relevant entries as I work through them. To store and retrieve my documents I've decided to give DEVONthink a shot. A lot of searching, and a little bit of evaluating, went into this decision. I'm not 100% sold that it will meet my needs, but it seems to fit in with them pretty nicely. More on that another time, if I get around to it.

The Problem

In the past year I've accumulated 500 or so URL shortcuts on my Mac OS X desktop (more precisely, files with the extension .webloc that contain URLs in their resource forks). I created them by dragging bookmark-like strings from Safari or Firefox to the desktop, ostensibly for future consideration and collation.


So, I have a collection of 500 files, each of which contains a URL, and I want each of them to become an individual Web Archive entry in DEVONthink, as opposed to one of its other record types that could be appropriate (PDF, URL, others).

A Solution

DEVONthink has no mass import feature that solves this problem exactly. Fortunately, and this is one of its outstanding features, DEVONthink is scriptable through AppleScript. In fact, there is a context-sensitive script, "Add web document to DEVONthink", which appears in the menu bar's Scripts menu when Safari or Firefox is in the foreground. The contents of this script (~/Library/Scripts/Applications/Firefox/Add web document to DEVONthink.scpt) reveal how to send a URL, title and referring URL to DEVONthink so that it creates a new record of the Web Archive type.

I then googled how to get the URL stored in a .webloc file as a string. I'm a Python programmer, and wanted to see how to do it in Python and not AppleScript, so I ended up at http://toxicsoftware.com/webloc-to-pinboard/. The function infoForWebloc() does the trick.
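For illustration only, here is a minimal sketch of one way to pull the URL out of a .webloc file in current Python 3. It is not the infoForWebloc() function from the page linked above. It assumes the file keeps a property list with a URL key in its data fork; a file that stores the URL only in its resource fork, as described earlier, would need a different approach. The name url_from_webloc is made up for this example.

```python
import plistlib


def url_from_webloc(path):
    """Return the URL stored in a plist-style .webloc file.

    Assumption: the file's data fork holds a property list with a
    "URL" key. Resource-fork-only files are not handled here.
    """
    with open(path, "rb") as handle:
        # plistlib.load() works out by itself whether the plist is XML or binary.
        data = plistlib.load(handle)
    return data["URL"]
```

Calling url_from_webloc() with the path of a shortcut returns the stored URL as a string, and raises KeyError if the property list has no URL entry.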

I hadn't used Automator before, so instead of tying together the AppleScript and Python directly I decided to make this my first foray into Automator. I created a Folder Action on a new folder (~/Desktop/url2devonthink/) with two actions: Run Shell Script and Run AppleScript.

The Shell Script portion executes a Python program to extract the URL and retrieve a suitable Title.

Shell: /bin/bash
Pass input: as arguments
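The "retrieve a suitable Title" half of that step can be sketched roughly as below. This is an assumption about how such a step might look in current Python 3, not the script used here: TitleParser and title_from_html are hypothetical names, and fetching the page itself (for example with urllib.request.urlopen) is left out so that the example stays self-contained.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones are ignored.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def title_from_html(html, fallback="Untitled"):
    """Return the page title with whitespace collapsed, or the fallback."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.parts).split())
    return title or fallback
```

The fallback matters in practice: pages without a title element still need some name before they can be handed on.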



The contents of ~/bin/webloc2url.py are as follows. I've removed some optimizations for clarity and brevity:


The AppleScript passes that on to DEVONthink for Web Archive creation. Note that I cut-and-pasted most of this from ~/Library/Scripts/Applications/Firefox/Add web document to DEVONthink.scpt.


Conclusion

Some caveats apply:
  • The above was almost certainly not the best way of accomplishing the goal
  • The triple-pipe Cheap Hack (tm) was my lame workaround for not wanting to work out how multiple arguments are passed between stages in Automator's pipes
  • The actual process was iterative and error-prone. Some URLs had become bad, some I didn't want to include, and so on.
  • Since this was a one-time task I have no need to create a more structured and reliable program than the above.
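The triple-pipe hack from the caveats above amounts to squeezing two values into one string so that a single argument can cross from one Automator stage to the next. A minimal sketch of the idea, with both ends written in Python for illustration (the receiving stage here was AppleScript), and with pack and unpack as made-up names:

```python
DELIMITER = "|||"


def pack(url, title):
    """Join a URL and a title into one string for the next stage."""
    # Assumption: the URL itself never contains the delimiter.
    # Any delimiter inside the title is replaced so the split stays unambiguous.
    return url + DELIMITER + title.replace(DELIMITER, " ")


def unpack(record):
    """Split a packed record back into (url, title)."""
    # partition() splits on the first delimiter only.
    url, _, title = record.partition(DELIMITER)
    return url, title
```

It is fragile by design, which is fine for a one-time job but is the first thing to replace in anything longer-lived.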
Now all I have to do is catalog all those new entries!
