« March 2008 | Main | June 2008 »

April 2008 Archives

April 8, 2008

Leopard - Finally!

So, I upgraded to MacOS 10.5 recently (from 10.4). Those of you who know me will doubtless be thinking “my god, man, what took so long?!?”, and that’s a longer story than I want to get into right now. Suffice to say: we’re rocking and rolling now!

My impressions of the new OS are pretty favorable. I’ve read all the complaints about the UI changes, and they have some merit. By the time I upgraded, Apple had already released 10.5.2, which addressed many of the more unfortunate problems for people like me who put /Applications into the Dock.

I really DO like the “Fan” icon display, though, particularly for the new “Downloads” folder. Creating a folder just for downloads is something I could have done years ago, of course, but I hadn’t - everything downloaded to the Desktop, which inevitably became incredibly cluttered. But I love the new approach, and part of what makes it especially useful is that things in the “Fan” display can be dragged to the trash. HA! I love it! It’s the little things that make me happy. :)

The new X11 is a bit of a pain in the butt. I’d become very used to using xterm - or more precisely, uxterm - for all my terminal needs (which is to say, for 90% of what I do with my computer). That’s not so tenable now, particularly since Apple has apparently decided that uxterm was just too useful a shell script to let stand. I am keeping a copy of that shell script (which just runs xterm with all the necessary utf-8 flags and sets the LANG appropriately) handy, just in case, but for the time being, I’ve decided to migrate to using Apple’s Terminal full time now. Undoubtedly, it’s still not as fast as uxterm, but since getting an Intel iMac, I don’t really notice anymore (on the old dual 500Mhz G4, it was definitely noticable).

For migrating, I’ve had to create my own nsterm-16color termcap file (which I keep in ~/.terminfo/n/nsterm-16color ) in order to ensure that all the features I want work properly. I stole the file from ncurses 5.6, and modified it to add correct dual-mode swapping ( smcup=\E7\E[?47h, rmcup=\E[2J\E[?47l\E8 ) and then to support the home and end keys ( khome=\E[H, kend=\E[F ). These are things that the native OSX dtterm/xterm/xterm-color/whatever terminfo settings don’t do correctly. ( WHY???) …And then, of course, I had to fix the key mapping of pageup/shift-pageup and pagedown/shift-pagedown and all the relative keys, but that was easy to do in the Terminal.app’s preferences. The defaults are sensible, just not for folks who are used to xterm’s behavior. I also re-discovered that I hate Terminal.app’s default blue (a dark, almost-midnight blue), and much prefer having a lighter one. Thankfully I’m not the only one - Ciarán Walsh’s update to the TerminalColors plugin is solid and works well.

Other than that, things have been pretty smooth. I haven’t experienced any really strange compatibility problems — in large part, I think, because I keep my system pretty up-to-date, so I already had the “Leopard-compatible” versions of all the software I use (and all the Unix applications seem to work flawlessly without even needing a recompile - huzzah for that!).

The one application that needed SERIOUS fiddling is VirtualBox. They have an OSX version, but only in beta form. I use it mostly so I can provide sensible Windows XP support to relatives who have computer questions (and for doing browser compatibility tests). I had been using Beta 2 (1.4.6), which had worked flawlessly for my needs. Unfortunately, Beta 2 isn’t compatible with Leopard, so an upgrade to the latest (Beta 3) was necessary. THIS beta seems to have a few problems. For one thing, it can’t understand all the old machine definitions (so when upgrading, make sure you don’t have any important system snapshots or saved machine state that you need). However, it does understand the old disk files, so it’s a simple matter to create a new machine definition using the old disk. The new machine still won’t BOOT, though, and it took me an hour or so of fiddling to figure out how to fix it.

There are two major problems that crop up. First: they changed the default IDE controller for Windows XP guests. The old default was PIIX3; the new default is PIIX4. Either one will work, and if you install XP from scratch on a newly created XP host, it will work with the PIIX4 controller just fine. But if you’re booting from an XP that was created with Beta 2 (i.e. a WindowsXP installation that thinks you have a PIIX3 controller), it will blue-screen and reboot immediately after displaying the Microsoft logo: not good. Fixing it is easy, though: just change the IDE controller for your XP machine in the machine settings dialog.

The second problem is that the network doesn’t work. Actually, that’s not true, the network works just fine, it’s DNS resolution that doesn’t work (but one looks a lot like the other when you’re not paying close attention to error messages). For whatever reason, when your XP system uses DHCP to get its network information, the information it receives from VirtualBox is wrong. Specifically, VirtualBox tells it to resolve DNS names by contacting 10.0.2.3; it should be contacting 10.0.2.2 (i.e. the same as the router). Fixing this was just a matter of changing Windows’ network configuration to use a custom DNS server (10.0.2.2) rather than the one supplied by DHCP. Annoying, but nothing terrible.

The only other stumbling block in Leopard that I’ve come across is the iChat-vs-Internet-Sharing problem that other people have discovered. Essentially, if you have enabled Internet Sharing, iChat can’t do video conferencing. Something to do with being able to remap ports… the explanations I’ve read are rather vague. It’s not especially important to me, but came up when I was trying to demonstrate the virtues of Leopard to Emily.

Which reminds me: the new iChat is MUCH better for talking to multiple people at the same time. The “tabbed” chatting interface is terrific. The vaunted “Spaces” (virtual desktops) are nice, and implemented well, but I gotta say that I’ve gotten used to having just one desktop these days (I use Exposé a lot). Getting used to having the extra desktops will probably take a while.

Two more features I noticed were the Quick View (in Finder, press the space bar to quickly view something) and Web Clips (in Safari, you can take a snippet of a webpage and turn it into a Dashboard widget). Quick View is pretty great, especially for folders full of PDFs, because you can leave it up and keep navigating around the Finder (the contents of the Quick View window will track whatever you select in the Finder), but since I don’t spend much time in the Finder, it’s of limited use. If I could integrate it with my ~/.mailcap file, now THAT would be awesome. Web Clips are not quite as great as they could be. For one thing, they don’t refresh quickly (but they DO refresh—at first I didn’t think they did—and in the worst case, you can click on them and press Ctrl-R to force the issue), and for another, they can’t scale — many of the things I want to clip are large graphics that I wish to monitor. If OSX could scale clips down for me, that would make them much more useful.

Which reminds me — one new feature of Leopard that I adore is their new built-in VNC viewer. It may not actually be VNC, but that’s fine by me — it’s blazing fast, and best of all, it scales the screen down so that you can easily control a screen that’s larger than the one you have. Chicken of the VNC used to be a must-have application for me, but Leopard’s built-in screen viewer is much better for what I usually want to do (which is control the iMac upstairs from the laptop down on the couch).

April 24, 2008

YAASI: Yet Another Anti-Spam Idea

Branden and I had an idea to help with the spam problem on our system, and it’s proven particularly effective. How effective? Here’s the graphs from the last year of email on my system. Can you tell when I started using the system?

If you want to see the live images, check here.

The idea is based on the following observations: certain addresses on my domain ONLY get spam. This is generally because they either don’t exist or because I stopped using them; for example, spammers often send email to buy@memoryhole.net. Branden and I also both use the user-tag@domain scheme, so we get a lot of disposable addresses that way. These addresses are such that we know for certain that anyone sending email to them is a spammer. Some of these addresses were already being rejected as invalid; some we hadn’t gotten around to invalidating yet.

By simply rejecting emails sent to those addresses, we were able to reduce the spam load of our domains by a fair bit, and the false-positive rate is nil. But we took things a step further: since spammers rarely send only one message, often they will send spam to both invalid AND valid addresses.

If I view those known-bad addresses as, essentially, honeypots, I can say: aha! Any IP sending to a known-bad address is a spammer, and I can refuse (with a permanent fail) any email from that IP for some short time. I started with 5 minutes, but have moved to an exponentially increasing timeout system. Each additional spam increased the length of the timeout (5 minutes for the first spam, 6 for the second, 8 for the third, and so on). Longer-term bans, as a result of the exponentially increasing timeout, are made more efficient via the equivalent of /etc/hosts.deny. I haven’t gotten into the maintaining-my-spammer-database much yet, but I think this may not be terribly important (I’ll explain in a moment).

One of the best parts of the system is that it is fast: new spammers that identify themselves by sending to honeypot addresses get blocked quickly and without my intervention. So far this has been particularly helpful in eliminating spam spikes. Another feature that I originally thought would be useful, but hasn’t really appeared to be (yet) is that it allows our multiple domains to share information about spam sources. Thus far, however, our domains seem to be plagued by different spammers.

Now, interestingly, about a week after we started using the system, our database of known spammers was wiped out (it’s kept in /tmp, and we rebooted the system). Result? No noticeable change in effectiveness. How’s that for a result? And, as you can see from the graph above, there’s no obvious change in spam blocking over the course of a month that would indicate that the long-term history is particularly useful. So, it may be sufficient to keep a much shorter history. Maybe only a week is necessary, maybe two weeks, I haven’t decided yet (and, as there hasn’t yet been much of a speed penalty for it, there’s no pressure to establish a cutoff). But, given that most spam is sent from botnets with dynamic IPs, this isn’t a particularly surprising behavior.

Forkit.org and memoryhole.net have been using this filter for a month so far. The week before we started using this filter, memoryhole.net averaged around 262 emails per hour. The week after instituting this filter, the average was around 96 per hour (a 60+% reduction!). Before using the filter, forkit.org averaged 70 emails per hour; since starting to use the filter, that number is down to 27.4 per hour (also a 60+% reduction). We have recorded spams from over 33,000 IPs, most of which only ever sent one or two spams. We typically have between 100 and 150 IPs that are “in jail” at any one time (at this moment: 143), and most of those (at this moment 134) are blocked for sending more than ten spams (114 of them have a timeout measured in days rather than minutes).

Now, granted, I know that by simply dropping 60% of all connections we’d get approximately the same results. But I think our particular technique is superior to that because it’s based on known-bad addresses. Anyone who doesn’t send to invalid addresses will never notice the filter.

The biggest potential problem that I can see with this system is that of spammers who have taken over a normally friendly host, such as Gmail spam. I’ve waffled on this potential problem: on the one hand, Gmail has so many outbound servers that it’s unlikely to get caught (a couple bad emails won’t have much of a penalty). Thus far, I’ve seen a few yahoo servers in Japan sending us spam, but no Gmail servers. On the other hand, as long as I simply use temporary failures (at least for good addresses), and as long as ND doesn’t retry in the same order every time, messages will get through.

I’ve also begun testing a “restricted sender” feature to work with this. For example, I have the address kyle-slashdot@memoryhole.net that I use exclusively for my slashdot.org account. The only people who are allowed to send to that email address is slashdot.org (i.e. if I forget my password). If anyone from any other domain attempts that address, well, then I know that sending IP is a spammer and I can treat it as if it was a known-bad address. Not applicable to every email address, obviously, but it’s a start.

It’s been pointed out that this system is, in some respects, a variant on greylisting. The major difference is that it’s a penalty-based system, rather than a “prove yourself worthy by following the RFC” system, and I like that a bit better. I’m somewhat tempted to define some bogus address (bogus@memoryhole.net) and sign it up for spam (via spamyourenemies.com or something similar), but given that part of the benefit here is due to spammers trying both valid and invalid addresses, I think it would probably just generate lots of extra traffic and not achieve anything particularly useful.

Now, this technique is simply one of many; it’s not sufficient to guarantee a spam-free inbox. I use it in combination with several other antispam techniques, including a greet-delay system and a frequently updated SpamAssassin setup. But check out the difference it’s made in our CPU utilization:

Okay, so, grand scheme of things: knocking the CPU use down three percentage points isn’t huge, but knocking it down by 50%? That sounds better, anyway. And as long as it doesn’t cause problems by making valid email disappear (possible, but rather unlikely), it seems to me to be a great way to cut my spam load relatively easily.

About April 2008

This page contains all entries posted to Kyle in April 2008. They are listed from oldest to newest.

March 2008 is the previous archive.

June 2008 is the next archive.

Many more can be found on the main index page or by looking through the archives.

Creative Commons License
This weblog is licensed under a Creative Commons License.
Powered by
Movable Type 3.34