Wednesday, November 15 2006

Spam-guarding email addresses

I’ve been playing with jQuery recently. The major project I’m just about ready to roll out is a significant improvement to my pop-up furigana-izer. If native tooltips actually worked reliably in browsers, it would be fine, but they don’t, so I spent a day sorting out all of the issues, and while I was at it, I also added optional kana-to-romaji conversion.

I’ll probably roll that out this weekend, updating a bunch of my old Japanese entries to use it, but while I was finishing it up I had another idea: effective spam-protection for email addresses on my comment pages. The idea is simple. Replace all links that look like this: <a href="mailto:jgreely@example.com">email</a> with something like this: <a mailto="i4t p5au l@4 h6p 6la 7wbt" href="">email</a>, and use jQuery to reassemble the real address from a hash table when the page is loaded, inserting it into the HREF attribute.

A full working example looks like this:

<head>
<script type="text/javascript" src="jquery.js"></script>
</head><body>
 
<a mailto="i4t p5au l@4 h6p 6la 7wbt" href="">J Greely</a>
 
<script>
var mailto = {"wm46":"fn70", "i07":"ynwq", "ztr4":"@m@", "2zg":"z.d",
"2rg":"2axz", "zib":"3vt", "r33":"c60.", "diun":"ysk", "h6p":"lue",
"u.d0":"qck", "7xi":"ste", "08uj":"q5sz", "0t18":"et", "5bv":"kgd",
"8voa":"40q", "1b9":"egqm", "e49":"crt", "xs@":"jqb7", "3do":"71m",
"9u9v":"bku", "m86":"nx0", "e3v":"rcm", "7jnd":"gtbm", "7wbt":"g",
"6la":".or", "pgj0":"ak1f", "i4t":"jgre", "j0g":"nbe", "tes0":"x5jv",
"tou1":"clg0", "s42":"jjcq", "xpi":"sde", "l@4":"dotc", "y004":"d0@@",
"5yj8":"2o7f", "s84v":"1c66", "r61":"3i2", "bp9b":"i0f", "lf7":"9j3",
"7s.h":"b4@", "nua9":"6wtp", "fst":"zkx", "jir2":"gg8i", "kms":"lzs",
"zo1":"ok@j", "0qh0":"@com", "y0kf":"hjh", "kmgu":"tn9", "i8c":"glb",
"z6f8":"7jav", ".ft":"3btw", "dvs":"1hka", "sw6h":"cas", "tm9":"hwca",
"ks3":"25uo", "25g.":"xdf", "rbgq":"sd0a", "gwra":"4yi",
"mbz5":"89l3", "rjkl":"49h", "8mu":"5c6r", "v2l":"t.n",
"p5au":"ely@"};
 
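$(document).ready(function(){
        // Plain attribute selector; XPath-style selectors such as
        // "//a[@mailto]" only work in early jQuery releases.
        $("a[mailto]").each(function(){
                var hash = $(this).attr("mailto").split(" ");
                var address = "";
                var i;
                // Index loop: for...in would also visit any properties
                // that other scripts add to Array.prototype.
                for (i = 0; i < hash.length; i++) {
                        address += mailto[hash[i]];
                }
                // .attr() replaces the .href() shortcut, which exists
                // only in early jQuery releases.
                $(this).attr("href", "mailto:" + address);
        });
});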
</script>
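
The lookup step can also be written as a plain function, which makes it easy to try outside a browser. This is only a sketch: decodeAddress and the sample table (keys and all) are made up for illustration and are not part of the page above. Unlike the loop in the example, it returns null for an unknown key instead of appending "undefined" to the address.

```javascript
// Hypothetical helper: rebuilds an address from a space-separated
// key list and a lookup table.
function decodeAddress(encoded, table) {
    var keys = encoded.split(" ");
    var address = "";
    for (var i = 0; i < keys.length; i++) {
        var piece = table[keys[i]];
        if (typeof piece !== "string") {
            return null;   // unknown key: refuse rather than emit "undefined"
        }
        address += piece;
    }
    return address;
}

// Made-up table: five fragments of jgreely@example.com plus two decoys.
var sample = {
    "a1b": "jgre", "c2d": "ely@", "e3f": "exam",
    "g4h": "ple.", "i5j": "com",
    "wm46": "fn70", "i07": "ynwq"
};

var decoded = decodeAddress("a1b c2d e3f g4h i5j", sample);
console.log(decoded);   // prints jgreely@example.com
```

In the page itself, the result would go into the link with something like $(this).attr("href", "mailto:" + decoded), skipping the link when the function returns null.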

Seeding the hash with random values makes it hard to pick out the components of real email addresses (there’s another one in this example…). Creating the hash was trivial in Perl, and I can swipe code from my old Movable Type plugins to get it working there. Here’s the Perl:

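use strict;
use warnings;

my %mailto;

# Seed the table with 50 random decoy entries.
foreach (1..50) {
        $mailto{randomstring()} = randomstring();
}
# Print the space-separated key list for each address on the command line.
foreach (@ARGV) {
        print hideaddress($_), "\n";
}
# Emit the finished table as a JavaScript object literal.
print "var mailto = {",
        join(", ", map { sprintf('"%s":"%s"', $_, $mailto{$_}) } keys %mailto),
        "};\n";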
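# Split an address into 3- or 4-character pieces, store each piece under a
# fresh random key, and return the keys as a space-separated string.
sub hideaddress {
        my ($address) = @_;
        my $hidden = "";
        while ($address ne "") {
                my $key = randomstring();
                # Key already in use: restart the loop body and draw another.
                redo if defined $mailto{$key};
                my $n = 3 + int(rand(2));
                my $val = substr($address,0,$n);
                $mailto{$key} = $val;
                substr($address,0,$n) = "";
                $hidden .= "$key ";
        }
        chop($hidden);  # drop the trailing space
        return $hidden;
}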
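# Return a random string of 3 or 4 characters.
sub randomstring {
        # Each character listed once, so none is favored ("0" was doubled).
        my $chars = "abcdefghijklmnopqrstuvwxyz0123456789@.";
        my $n = 3 + int(rand(2));
        my $s = "";
        while ($n-->0) {
                $s .= substr($chars,rand(length($chars)),1);
        }
        return $s;
}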
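
For anyone who would rather build the table in the same language as the page, here is a rough JavaScript port of the Perl above. It is a sketch under assumptions, not the code used on this site: randomString, hideAddress, and CHARS are hypothetical names chosen to mirror the Perl subs, and Math.random stands in for Perl's rand.

```javascript
// Hypothetical JavaScript port of the Perl routines above.
var CHARS = "abcdefghijklmnopqrstuvwxyz0123456789@.";

// Random string of 3 or 4 characters drawn from CHARS.
function randomString() {
    var n = 3 + Math.floor(Math.random() * 2);
    var s = "";
    while (n-- > 0) {
        s += CHARS.charAt(Math.floor(Math.random() * CHARS.length));
    }
    return s;
}

// Split the address into 3- or 4-character pieces, store each one under a
// fresh random key in table, and return the space-separated key list.
function hideAddress(address, table) {
    var keys = [];
    while (address !== "") {
        var key = randomString();
        if (Object.prototype.hasOwnProperty.call(table, key)) {
            continue;   // key already taken; draw another
        }
        var n = 3 + Math.floor(Math.random() * 2);
        table[key] = address.substring(0, n);
        address = address.substring(n);
        keys.push(key);
    }
    return keys.join(" ");
}

// Seed the table with up to 50 decoys, hide one address, then decode it again.
var table = {};
for (var i = 0; i < 50; i++) {
    table[randomString()] = randomString();
}
var hidden = hideAddress("jgreely@example.com", table);
var restored = hidden.split(" ").map(function (k) {
    return table[k];
}).join("");

console.log(hidden);     // key list for the mailto attribute; differs on every run
console.log(restored);   // the original address, rebuilt from the table
```

The round trip at the end is the useful check: whatever keys come out, looking them up in order and concatenating the values has to give back the original address.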