a glob of nerdishness

January 10, 2008

Playing with Mail and Leopard’s Latent Semantic Mapping

written by natevw @ 6:35 pm

While clearing Mail.app’s junk mail folder, I might have accidentally deleted a non-spam message. I’ll never know for sure, but as a result I learned a bit more about Leopard’s cool new Latent Semantic Mapping framework, which I’d been wondering about since the mailing list leaked back in November.

Mail stores its spam information in ~/Library/Mail/LSMMap2. In addition to the Latent Semantic Mapping framework, Apple also provides lsm, a command-line utility that offers the same functionality (with somewhat better documentation, I might add). As described in the man page, you can use lsm dump ~/Library/Mail/LSMMap2 to get a list of all the words that Mail’s spam filter knows about. (Some words are probably NSFW, of course!) The first column is how many times the word has appeared in “Not Junk” messages, and the second is the count in spam messages. The last line gives a few overall statistics: how many zero values there were versus total values, and a “Max Run” value I don’t understand.
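The two count columns can be read programmatically with a small parser. This is a minimal Python sketch under an assumption: that each word line of lsm dump output is whitespace-separated as the not-junk count, then the junk count, then the word (the column order described above; the exact layout hasn't been checked against lsm itself). Anything that doesn't match, such as the closing statistics line, is skipped. The function names are made up for illustration.

```python
import re

# ASSUMED word-line layout: "<not-junk count> <junk count> <word>".
# Lines that don't fit (e.g. the trailing statistics line) are ignored.
WORD_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S.*?)\s*$")


def parse_lsm_dump(text):
    """Return {word: (not_junk_count, junk_count)} from lsm dump output."""
    counts = {}
    for line in text.splitlines():
        m = WORD_LINE.match(line)
        if m:
            counts[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return counts


def junk_fraction(not_junk, junk):
    """Share of a word's occurrences that came from junk mail (None if never seen)."""
    total = not_junk + junk
    return junk / total if total else None
```

To try it on real data, feed it the standard output of lsm dump ~/Library/Mail/LSMMap2, for example via subprocess.run([...], capture_output=True, text=True).stdout.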

Between this and CFStringTokenizer and its language-guessing coolness, Leopard provides some fun tools for playing around with text analysis. Hopefully someday I’ll have a bit more time to dig into it.

Until then (or rather, for my future reference), here’s a bit more information on Latent Semantic Analysis: how it works and how it differs from Bayesian classification.

I’ve also uploaded a really quick and dirty “playground” for testing out the hypotheses the documentation left me with: lsmtest.m

Update: Came across a more “explainy” article about both Mail and Latent Semantic Analysis over on macdevcenter.com.

December 21, 2007

I quit.

written by natevw @ 11:11 am

Yesterday was my last day at my old job. It’s a decision I had been considering for a while now, and one that’s been settled in my head since the end of August. I still wonder, though, how I’ll remember this week looking back.

Over the past year, I came to realise that writing in-house software, regardless of the problem domain, was never going to stimulate my AD/HD-addled brain in a wholly satisfying way. That wasn’t reason enough to quit, though. I had a great boss, comfortable enough pay, camaraderie with some interesting co-workers, and the work still offered good challenges often enough to make it worth doing. Not all was rosy (my job description was in transition from “independent contractor working from home” to “employee commuting to an office building almost an hour away”) but life was good.

In short, the job had its upsides and downsides, like most of life. I didn’t really quit my job because the downsides outweighed the upsides, though there were plenty of days when that’s how I felt. I quit my job because of its opportunity cost. I’ve been wanting to start my own company ever since I was a twelve-year-old, craving augmented buying power but thwarted by those feckless child labor laws. I think my motivations have matured a little, but the dream never died.

As my interests changed — from computers, to music, to photography, to fleeing the torment they call “higher education”, back to computers — I collected a lot of neat software ideas. But ideas are like opinions: valuable, just not in a way that puts food on the table. (Unless you’re a patent troll…I digress.) Out of all my ideas, only a handful seemed to have much feasibility or market potential, and out of those, only one has consistently held my interest.

Since I first latched onto it in the Fall of 2004, I’ve watched the idea slowly move towards the mainstream while I tried to do schoolwork and while I drove to my quasi-cubicle. I tried to fit it in as a hobby, but never got the momentum to bite off any sizeable side-project in my “spare time”. Trying to pursue two nearly-overlapping software lives was distractingly complicated. Now my mission is simple: Create the best software for organising photos geographically. Ever.

December 3, 2007

Hacking Stacks: A Failed Attempt

written by natevw @ 10:18 am

I’m pretty much enamored with optica-optima’s DRAWERS icons for Stacks: the concept, the icons, even the disk image they come in. Imagine my horror when my first subsequent download plopped itself right on top of my wonderful new “drawer”, once again shattering the illusion that I could like Stacks. “Date Added” is not a ‘touchable’ file property; the Dock somehow keeps track of it itself. Googling revealed a folder-action-based fix, but I wanted something that would work automatically for all Stacks, present and future. Poking and prodding revealed that I could run sqlite3 ~/Library/Preferences/com.apple.dock.db and look at the fairly simple database the Dock uses for Stacks.

There’s a “directories” table and a “files” table. The directories table has one row per stack and just one main “path” column. (It also has SQLite’s implicit ROWID, which is used as the directory_id elsewhere; the rest of each stack’s info is in the Dock’s plist.) The files table has what I was looking for: an “ordering” column. So I added a drawers table and inserted a row for each beautiful icon:

CREATE TABLE IF NOT EXISTS drawers (directory_id INTEGER, filesystemid INTEGER);

INSERT INTO drawers (directory_id, filesystemid) SELECT directory_id, filesystemid FROM files WHERE name LIKE ' %'; -- this should be a trigger for future additions, but see results....

Then I added a trigger so that whenever a new file is added to a stack with a drawer icon, the drawer’s icon would still have the highest “ordering” value:

CREATE TRIGGER drawer_defender AFTER INSERT ON files
BEGIN
 UPDATE files SET ordering=NEW.ordering+1 WHERE directory_id = NEW.directory_id AND filesystemid IN (SELECT filesystemid FROM drawers WHERE directory_id  = NEW.directory_id);
END;
-- if BEFORE INSERT, the new row doesn't show up at all for some reason

CREATE TRIGGER drawer_cleanup AFTER DELETE ON files
BEGIN
 DELETE FROM drawers WHERE drawers.filesystemid = OLD.filesystemid AND drawers.directory_id = OLD.directory_id;
END;
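The triggers can be exercised away from the real Dock database with Python's built-in sqlite3 module. This is a minimal sketch against an in-memory mock: the files table here is hypothetical and has only the columns the statements above touch (directory_id, filesystemid, name, ordering), not the Dock's full schema. It shows only that the SQL behaves as intended; it says nothing about whether the Dock rereads the data.

```python
import sqlite3

# HYPOTHETICAL stand-in for the Dock's "files" table: only the columns the
# statements in this post use.
SCHEMA = """
CREATE TABLE files (directory_id INTEGER, filesystemid INTEGER,
                    name TEXT, ordering INTEGER);
CREATE TABLE IF NOT EXISTS drawers (directory_id INTEGER, filesystemid INTEGER);

-- Keep any drawer icon one step above whatever was just added to its stack.
CREATE TRIGGER drawer_defender AFTER INSERT ON files
BEGIN
  UPDATE files SET ordering = NEW.ordering + 1
   WHERE directory_id = NEW.directory_id
     AND filesystemid IN (SELECT filesystemid FROM drawers
                           WHERE directory_id = NEW.directory_id);
END;

-- Forget a drawer once its file leaves the stack.
CREATE TRIGGER drawer_cleanup AFTER DELETE ON files
BEGIN
  DELETE FROM drawers
   WHERE drawers.filesystemid = OLD.filesystemid
     AND drawers.directory_id = OLD.directory_id;
END;
"""


def build_demo_db():
    """Create an in-memory database with the mock table, drawers table and triggers."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def mark_drawers(db):
    """Record every file whose name starts with a space as a drawer icon."""
    db.execute("INSERT INTO drawers (directory_id, filesystemid) "
               "SELECT directory_id, filesystemid FROM files WHERE name LIKE ' %'")
```

Inserting a drawer file, marking it, and then inserting a newer file should leave the drawer's ordering one above the newcomer's; deleting the drawer file should empty the drawers table.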

The bad news is that while this works as far as the database is concerned, the Dock seems to keep track of the ordering itself until you run “killall Dock”, which puts us right back in folder action territory with an even uglier transition. So unless somebody finds a way to get the Dock to reread the database without getting killed first, or Apple’s usability team regains a say in which prominent features get shipped, it looks like the sleight-of-hand folder action is still the best bet for helping Stacks out. That method also has the advantage of not requiring users to tinker with private Dock internals, which is probably a good thing.

November 20, 2007

Objective-C 2.0 Fast Enumeration internals

written by natevw @ 8:03 pm

Although the NSFastEnumeration Protocol reference is pretty opaque, the linked article tells you all you need to know about how fast enumeration is implemented. It’s not really all that complicated (or pretty, for that matter), but it nonetheless allows for (... in ...) loops over any conforming class. All a conforming class has to do is hand out a pointer to its own internal storage when possible or, at worst, copy a handful of pointers at a time into some pre-allocated stack space.
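Since Objective-C can't easily be run inline here, this is a Python simulation of the protocol's shape. Only the roles mirror Cocoa's countByEnumeratingWithState:objects:count: (a state record, a caller-supplied buffer, and a mutations value the loop keeps re-checking); the class and function names are hypothetical, and the collection shown takes the "at worst" path of copying items into the buffer a batch at a time.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class EnumerationState:
    """Loose stand-in for NSFastEnumerationState (state, itemsPtr, mutationsPtr, extra)."""
    state: int = 0                                  # where the collection left off
    items: Optional[List[Any]] = None               # plays the role of itemsPtr
    mutations: Optional[Callable[[], int]] = None   # plays the role of mutationsPtr
    extra: List[int] = field(default_factory=lambda: [0] * 5)


class BatchedList:
    """Hypothetical collection that copies items into the caller's buffer in batches."""

    def __init__(self, values):
        self._values = list(values)
        self._mutation_count = 0

    def append(self, value):
        self._values.append(value)
        self._mutation_count += 1

    def count_by_enumerating(self, st, buffer, length):
        # Copy up to `length` items, starting at st.state, into the caller's buffer.
        # (A collection with contiguous storage could instead point st.items at
        # that storage and return everything in a single call.)
        start = st.state
        chunk = self._values[start:start + length]
        buffer[:len(chunk)] = chunk
        st.items = buffer
        st.mutations = lambda: self._mutation_count
        st.state = start + len(chunk)
        return len(chunk)


def fast_iterate(collection, batch_size=4):
    """Roughly what a for (x in collection) loop does: request batches until 0."""
    st = EnumerationState()
    buffer = [None] * batch_size        # the "pre-allocated stack space"
    initial = None
    while True:
        n = collection.count_by_enumerating(st, buffer, batch_size)
        if n == 0:
            return
        if initial is None:
            initial = st.mutations()    # snapshot after the first call
        for i in range(n):
            if st.mutations() != initial:
                raise RuntimeError("collection was mutated while being enumerated")
            yield st.items[i]
```

fast_iterate() plays the part of the compiler-generated loop: it keeps asking for batches until the collection returns 0, and it raises if the collection changes mid-loop, much as Cocoa throws a mutation exception.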

November 16, 2007

Google IMAP in Mail.app (latecomer version)

written by natevw @ 4:31 pm

[Editor's note: I'd been editing this article for a few days, and then John Gruber sent everybody to a similar article on 5thirtyone instead. I'm not jealous or anything (this glob is too typographically atrocious to merit a link from DF at present), but I still wanted to put this up for my own reference.]


Google recently rolled out free IMAP support to Gmail users. This is a neat gesture, but they twisted the IMAP protocol so that it works in The Way of the One True Algorithm. In their own words, “we’d like to make your IMAP experience match the Gmail web interface as much as possible”. Fortunately, Apple’s Mail provides the tools necessary to work around most of this Google IMAP “experience”.

The skinny

  1. Set Mail.app to work with Gmail
  2. Google has recommended settings. Ignore them. Well, DO uncheck “Store sent messages on the server”, unless you are using a non-Google SMTP server. But don’t uncheck “Store deleted messages on the server” or “Store junk messages on the server”.
  3. Map Gmail IMAP folders to Mail.app’s default folders:
    Use mailbox           For Mail’s…
    [GMAIL]/Trash         Trash
    [GMAIL]/Spam          Junk
    [GMAIL]/Drafts        Drafts
    [GMAIL]/Sent Mail     Sent (only if you are using a non-Gmail SMTP server)
  4. Google has a table showing what actions in your email client do to your Gmail. Read it, but realize that half of its entries are wrong or irrelevant with Mail.app set up this way. Here are some corrections:
    If you want Gmail to…                            Do this in Mail
    Apply a star to a message.                       Flag the message.
    Apply a label to a message.                      Copy the message to the corresponding folder.
    Remove a label from a message.                   Move the message to “[GMAIL]/All Mail”. Don’t delete it.
    Undelete a message.                              Move it to “[GMAIL]/All Mail” or another label.
    Make “All Mail” not match what Mail.app shows.   Delete a message from “[GMAIL]/All Mail”. Don’t do this!
  5. As you use it, things sometimes get out of sync for a bit, since Gmail is changing folder contents behind the scenes in ways that Mail.app doesn’t expect. If you want to make sure that what you see in Mail reflects the way things are in Gmail right away, use the “Synchronize All Accounts” item in the “Mailbox” menu.

The key to understanding how Google changed IMAP is to realize that the folders it presents are never locations in the sense that folders usually are, which is the sense in which Mail.app treats them. Every Gmail IMAP folder is just a “Label”, a tag. Because of this, you may find (or place) copies of the same message in multiple folders. To Mail.app, these each look like individual messages with an identity of their own. To Gmail, they are just differing representations of one underlying object, which can only be deleted via the Trash.

The gory details

So what’s the problem with just doing it the way Google says to? Having folders represent labels means that Mail.app’s “Delete” button won’t work the way it does with normal IMAP accounts. When you “delete” a message in Mail, it removes it from whatever folder it was in and puts it in a deleted-items folder. Since Mail doesn’t know about Gmail’s Trash folder, it creates a new one named “Deleted Messages” and moves your deleted items there. This is a problem, because to Gmail you’re just removing the “Inbox” tag and adding a tag called “Deleted Messages”. You haven’t really deleted the item, and it will still show up in “[GMAIL]/All Mail” and any other Label folders it was in. Then, when you empty Mail’s trash, Gmail just sees you removing the “Deleted Messages” label, and the message lives on, even if orphaned.

Deleting

To actually delete the underlying message, you must place a representation of it into the “[GMAIL]/Trash” folder-aka-label. (That’s what the mapping in step 3 is about.) While it hangs out there, the message will be hidden from all the other label folders it was in. If you move it back to another “folder”, it will reappear in all the Labels it previously had. There is one catch, though.

Not deleting, when we don’t mean to

What if we want to keep our message, but just remove a particular Label? If we hadn’t told Mail to use the “[GMAIL]/Trash” folder for storing deleted messages, we could just delete a message from the corresponding folder to clear that label. But if we do that now, Gmail will get not only a message saying “remove this message from Label” but also “add this message to [GMAIL]/Trash”. This will cause the message to be hidden from ALL labels, and when we empty the trash it will disappear for good. So we can’t do that, despite Google’s suggestion. Instead we move it, which sends two messages to the server: “remove this message from Label” (thus accomplishing our goal) and “add this message to the [GMAIL]/All Mail folder” (where it probably already is anyway). The same trick can be used to undelete a message as well.

When to move, when to copy

When you copy a message within Mail.app, Gmail is smart about maintaining only a single underlying identity. This matters because you can’t add a label by “moving” a message from one folder to another: that would also remove the label you moved it from. So, generally, to add a label to a message, copy it instead of moving it by holding down the command key while you drag. If you do want to remove the Inbox label (for example), then by all means move instead of copy.
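To make the copy/move/delete rules above concrete, here is a toy Python model: each message is one object carrying a set of labels, a copy adds a label, a move adds the destination label and removes the source label, and anything labelled [GMAIL]/Trash is hidden from every other folder. It is a simplification of the behaviour described in this post, not Gmail's actual implementation, and the class and method names are made up.

```python
class ToyGmail:
    """Toy model of Gmail's label-as-folder behaviour as described above."""

    TRASH = "[GMAIL]/Trash"

    def __init__(self):
        self.labels = {}  # message id -> set of label names (one underlying message)

    def receive(self, msg_id, *labels):
        self.labels[msg_id] = set(labels)

    def copy(self, msg_id, dst):
        # Copying in Mail just adds a label; there is still only one message.
        self.labels[msg_id].add(dst)

    def move(self, msg_id, src, dst):
        # A move is "add the destination label" plus "remove the source label".
        self.labels[msg_id].add(dst)
        self.labels[msg_id].discard(src)

    def folder(self, name):
        """Message ids a client would see in `name`; trashed mail is hidden elsewhere."""
        return sorted(
            m for m, labs in self.labels.items()
            if name in labs and (name == self.TRASH or self.TRASH not in labs)
        )
```

In this model, "deleting" without the Trash mapping is just move(msg, "Inbox", "Deleted Messages"), which leaves the message sitting in [GMAIL]/All Mail; deleting with the mapping is move(msg, folder, ToyGmail.TRASH), and moving it back out of the Trash makes it reappear under its remaining labels.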

Regarding “Inbox” and “All Mail”

Both “Inbox” and “All Mail” are just tags. If you remove an item from either, it stays on Google’s server (unless you move a “copy” from any folder into “[GMAIL]/Trash” or “[GMAIL]/Spam”, which we’ve set up Mail to do). There seems to be a discrepancy between the “All Mail” IMAP folder and the “All Mail” view online: if you delete messages from the folder in Mail.app, they still show up in the web interface.

In thinking about wrapping up…

If you’ve got any other questions, tips or corrections feel free to leave them in the comments. Or in the comments on my new arch-nemesis’s article. But keep in mind, when I find out which friend in Omsk told friend in Tomsk the results of my research, there will be great suffering in Guilder. (Kidding, kidding!)

November 14, 2007

Restore IMAP data from Mail’s offline imapmbox backup

written by natevw @ 9:48 am

Let’s say you come into work and find a note on your desk from your boss: “The mail server went belly up.” All the messages on the IMAP server are gone. What to do?

The first step, and this is very important: Do not open Mail.app until you’ve made a copy of the offline mailbox cache. (You can hold Shift while logging in to keep it from automatically opening.) If you let Mail sync to your now-empty IMAP account, it will erase your offline copies lickety-split. As long as this doesn’t happen, it’s pretty easy to restore the server from your local backup.

  1. Find the corresponding IMAP-user@host folder inside of ~/Library/Mail/. Make a copy somewhere safe, like your Desktop.
  2. Rename all the .imapmbox folders inside of your new copy to have the .mbox extension instead.
  3. Now you can open Mail and import the main backup folder. Select “File > Import Mailboxes…”, choose the “Mail for OS X” option, and then select the modified IMAP-user@host folder.
  4. Move folders back onto the IMAP server. You might need to make one new folder (“Add Mailbox”) on the IMAP server so that it shows in the sidebar, and then you can drag the rest from the Import folder. Any sent messages or todos can be moved to those special mailboxes as well.
  5. If Mail complains that the folders you are trying to drag in already exist, one workaround I found is to delete the IMAP account and set it up again. A simple “synchronize” might have also done the trick.
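Steps 1 and 2 can be scripted. This is a minimal Python sketch, assuming the cache layout described above (mailbox folders ending in .imapmbox, possibly nested inside one another); the function name and destination folder are placeholders. It works only on a fresh copy, so the original cache under ~/Library/Mail/ is left alone, but it should still be run before Mail is opened.

```python
import os
import shutil
from pathlib import Path


def make_importable_copy(imap_account_dir, dest_dir):
    """Copy an IMAP account folder and rename every *.imapmbox folder in the copy to *.mbox.

    The source folder is never modified. Returns the path of the new copy.
    """
    src = Path(imap_account_dir).expanduser()
    dest = Path(dest_dir).expanduser() / src.name
    shutil.copytree(src, dest)  # refuses to overwrite an existing copy
    # Walk bottom-up so renaming a parent never invalidates a path still to be visited.
    for root, dirs, _files in os.walk(dest, topdown=False):
        for d in dirs:
            if d.endswith(".imapmbox"):
                old = Path(root) / d
                old.rename(old.with_suffix(".mbox"))
    return dest
```

The returned folder is the one to pick in Mail's import dialog in step 3.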

Once Mail is done uploading the messages, you can delete your “Import” copies of each Mailbox. Then you can get back to seizing the day, whilst hoping you don’t have to do any of this again.

November 5, 2007

Make a link to a Mail.app email message

written by natevw @ 2:53 pm

As reported by Gus Mueller, remembered by Fraser Speirs and reverse engineered by dragging messages from Mail.app into a rich TextEdit document: Mail.app now supports permalink URLs to messages.

The URLs are just “message:” followed by the Message-Id, which should be URL encoded. So if your Message-Id is “<abc%20071105@sender.org>” — the angle brackets are considered part of the Message-Id — the URL becomes “message:%3Cabc%2520071105@sender.org%3E”, although “message:<abc%2520071105@sender.org>” will also work from Safari’s address bar.
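The encoding step is easy to reproduce with Python's standard urllib.parse. This is a minimal sketch; the function names are hypothetical. It leaves @ unescaped to match the example above and percent-encodes everything else that isn't URL-safe, including the angle brackets and any % already in the Message-Id.

```python
from urllib.parse import quote, unquote

PREFIX = "message:"


def message_url(message_id):
    """Build a "message:" URL from a Message-Id (angle brackets included)."""
    # '@' is kept as-is; '<', '>', '%' and other unsafe characters are percent-encoded.
    return PREFIX + quote(message_id, safe="@")


def message_id_from_url(url):
    """Recover the Message-Id from a "message:" URL built as above."""
    if not url.startswith(PREFIX):
        raise ValueError("not a message: URL")
    return unquote(url[len(PREFIX):])
```

quote() never escapes letters, digits or _.-~, so safe="@" is the only setting needed to reproduce the form shown above.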

This is basically just the mid: URL scheme with a different scheme name. Why they used “message” instead of “mid” is strange, especially since on the Flying Meat forum there’s a discussion of links provided by an add-on called MailTags that use a similar URL scheme. These have an extra “//” before the Message-Id, and make Mail grumble that “No associated application could be found”. Update: On second try the extra “//”s seem to work as well, so the real question is why MailTags, not Apple, didn’t use the “mid:” form.

To get the Message-Id, select “Long headers” or “Raw Source” in Mail’s “View > Message” menu. You can also drag a message from Mail into a rich text field to get the hyperlink. Enjoy!

Update: John Gruber has put together a more definitive article on this topic, though he doesn’t make mention of the (seemingly historical) “mid:” URL scheme.

Find the IP address for an SMB share

written by natevw @ 1:30 pm

If you’re able to connect to a folder on a Windows SMB share via smb://beigeboxname, you can find its IP address via nmblookup beigeboxname. Handy, since Remote Desktop Connection doesn’t seem to be able to look up the machine by that same name.

November 3, 2007

Leopard positives

written by natevw @ 5:12 pm

I’ve been using Leopard for just over a week now, sans Time Machine even, and it’s good stuff. For the record, a lot of the glitches seem to disappear after one reboot. I’m getting used to Spaces, though I still wish there were some little onscreen indication of which space I’m currently in. And I might already be attached to the glitz of the Dock/menubar.

  • Top feature so far: Xcode 3.0, hands down.
  • Top little fix: No intermediate .tar file from .tar.gz extraction.
  • Coolest spectacle: iChat switching between local and shared screens.

Over and out.

October 30, 2007

Hannah’s Leopard review

written by natevw @ 5:41 pm

Editor’s note: This is the only Mac or technology blog that my lovely wife reads, but within five minutes of looking at her new desktop, she was begging to make her disgust known publicly. Here are her notes, with only a few edits for clarity. If you thought developers (I’ve added cross-references) were just being curmudgeonly about some of Leopard’s new “features”, take a look at what a customer thinks.


Here’s what bugged me:

-The side dock is ugly. The dark box makes the screen look more cluttered. [I like the curved popping out folders], but they do not do that from the side. cf
-the semi-transparent top bar is unnecessary, sometimes annoying. cf
-there are obvious glitches (Camino didn’t hide when I told it to, etc.) Hopefully these things will be fixed soon.
-the new icons for “Pictures”, “Desktop”, “Documents” folders, etc. are indistinct. What were they thinking? cf

In summary, side dock users have been slighted and it seems that many changes (such as the top bar, and the rounded-edge pop-up menus) are just for the sake of having a change, and do not represent any true improvement.

Time Machine and other things may be great, but hopefully I will not be using Time Machine on a daily basis. I do have to look at the dock on a daily basis. Boo.

What was kind of cool:

I do like the calculator and dictionary in Spotlight (hopefully I’ll remember to use them!) and the make-your-own widget from the web. And that I can use formatting such as bold while in Safari on Blogger. Easily adding phone numbers and appointments to Address Book and iCal [from Mail] was impressive and might be useful.
