Web

More toying with dictionaries


[Update: the editing form is now hooked up to the database, in read-only mode. I’ve linked some sample entries on it. …and now there’s a link from the dictionary page; it’s still read-only, but you can load the results of any search into the form]

I feel really sorry for anyone who edits XML by hand. I feel slightly less sorry for people who use editing tools that can parse DTDs and XSDs and validate their work, but still, it just strikes me as a bad idea. XML is an excellent way to get structured data out of one piece of software and into a completely different one, but it’s toxic to humans.

JMdict is well-formed XML, maintained with some manner of validating editor (update: turns out there’s a simple text format based on the DTD that’s used to generate valid XML), but editing it is still a pretty manual job, and getting new submissions into a usable format can’t be fun. The JMdictDB project aims to help out with this, storing everything in a database and maintaining it with a web front-end.

Unfortunately, the JMdict schema is a poor match for standard HTML forms, containing a whole bunch of nested optional repeatable fields, many of them entity-encoded. So they punted, and relied on manually formatting a few TEXTAREA fields. Unless you’re new here, you’ll know that I can’t pass up a scripting problem that’s just begging to be solved, even if no one else in the world will ever use my solution.

So I wrote a jQuery plugin that lets you dynamically add, delete, and reorder groups of fields, and built a form that accurately represents the entire JMdict schema. It’s not hooked up to my database yet, and submitting it just dumps out the list of fields and values. It’s also ugly, with crude formatting and cryptic field names (taken from the schema), but the basic idea is sound. I was pleased that it only took one page of JavaScript to add the necessary functionality.

[hours to debug that script, but what can you do?]
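To make the "nested optional repeatable fields" problem concrete, here is a minimal Python sketch (not the jQuery plugin itself) of how a nested entry could be flattened into the kind of field/value list the form dumps on submit. The element names (k_ele, keb, r_ele, reb, sense, pos, gloss) come from the JMdict schema; the indexed naming scheme, the helper names flatten and move_group, and the sample values are assumptions made up for illustration.

```python
def flatten(node, prefix=""):
    """Yield (field_name, value) pairs for a nested entry.

    Dicts become dotted names and lists (repeatable groups) become
    indexed names. This naming scheme is an assumption, not the
    one the real form uses.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            name = f"{prefix}.{key}" if prefix else key
            yield from flatten(child, name)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, node


def move_group(groups, old_index, new_index):
    """Reorder one repeatable group in place, like a form's move up/down control."""
    groups.insert(new_index, groups.pop(old_index))


# Illustrative sample entry; "&n;" stands in for an entity-encoded value.
entry = {
    "k_ele": [{"keb": "辞書"}],
    "r_ele": [{"reb": "じしょ"}],
    "sense": [{"pos": ["&n;"], "gloss": ["dictionary", "lexicon"]}],
}

fields = list(flatten(entry))
for name, value in fields:
    print(name, "=", value)
```

Every list in the structure is a group that can be added to, deleted from, or reordered, and the flat names are regenerated from the current order each time, which is roughly the bookkeeping a dynamic form has to do.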

Dictionaries as toys


There are dozens of front-ends for Jim Breen’s Japanese-English and Kanji dictionaries, both online and offline. Basically, if it’s a software-based dictionary that wasn’t published in Japan, the chance that it’s using some version of his data is somewhere above 99%.

Many of the tools, especially the older or free ones, use the original Edict format, which is compact and fairly easy to parse, but omits a lot of useful information. It has a lot of words you won’t find in affordable J-E dictionaries, but the definitions and usage information can be misleading. One of my Japanese teachers recommends avoiding it for anything non-trivial, because the definitions are extremely terse, context-free, and often “off”.
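As a rough illustration of why the Edict format is easy to parse, here is a minimal Python sketch that handles the common one-line layout: a headword, an optional reading in square brackets, and slash-delimited glosses. It assumes the text has already been decoded to Unicode; the function name parse_edict_line and the sample lines are made up for the example, and real files have details this ignores.

```python
import re

# Common layout:  KANJI [KANA] /gloss/gloss/   or   KANA /gloss/gloss/
EDICT_LINE = re.compile(
    r"^(?P<headword>\S+)"            # headword, no whitespace
    r"(?:\s+\[(?P<reading>[^\]]+)\])?"  # optional reading in brackets
    r"\s+/(?P<glosses>.*)/\s*$"      # everything between first and last slash
)


def parse_edict_line(line):
    """Return a dict for one Edict-style line, or None if it doesn't match."""
    match = EDICT_LINE.match(line)
    if match is None:
        return None
    glosses = [g for g in match.group("glosses").split("/") if g]
    return {
        "headword": match.group("headword"),
        # Kana-only entries have no bracketed reading; reuse the headword.
        "reading": match.group("reading") or match.group("headword"),
        "glosses": glosses,
    }


print(parse_edict_line("辞書 [じしょ] /(n) dictionary/lexicon/"))
```

Note that markers such as "(n)" are just part of the gloss text, so anything beyond headword, reading, and a flat list of glosses has to be dug out of free-form strings.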


Upgrading Movable Type


The machine this site runs on hasn’t been updated in a while. The OS is old, but it’s OpenBSD, so it’s still secure. Ditto for Movable Type; I’m running an old, stable version that has some quirks, but hasn’t needed much maintenance. I don’t even get any comment spam, thanks to a few simple tricks.

There are some warts, though. Rebuild times are getting a bit long, my templates are a bit quirky, and Unicode support is just plain flaky, both in the old version of Perl and in the MT scripts. This also bleeds over into the offline posting tool I use, Ecto, which occasionally gets confused by MT and converts kanji into garbage.

Fixing all of that on the old OS would be harder than just upgrading to the latest version of OpenBSD. That’s a project that requires a large chunk of uninterrupted time, and we’re building up to a big holiday season at work, so “not right now”.

I need an occasional diversion from work and Japanese practice, though, and redesigning this blog on a spare machine will do nicely. I can also move all of my Mason apps over, and take advantage of the improved Unicode support in modern Perl to do something interesting. (more on that later)


Dear Amazon,


I am only interested in women’s dresses as gift wrap. I don’t want to buy any, thanks.

Amazon Dressup

"Then leave the rest to Omakase!"


I just set up an Amazon affiliate account, for those occasional links to cool products. After I completed the application, one of the features on offer was:

Omakase links will show an Associate's visitors what they're most likely to buy based on Amazon's unique understanding of the site, the user, and the page itself.

I don’t know that I’ll use it any time soon (first comes the way-overdue server upgrade and site redesign), but at least it’s thematically appropriate.

A club that will have me as a member...


I would guess that somewhere in the neighborhood of 100,000 people have been qualified to wear this t-shirt. Approximately 99,950 of them paid for the privilege of membership, and the other 50 would have gotten the joke. Of those, perhaps a dozen will still remember it.

Bryant7

[and no, it’s not actually funny unless you’re one of those dozen, so I’m not going to explain]

[Update: Rory suggested a slight modification to improve the reference, and further decrease the number of people who’ll get it]

Dear Amazon Japan,


Please don’t send me the “complete your collection!” email for something when you haven’t yet shipped out the first items in the set. In particular, when I’m waiting for you to ship volumes 1 and 2 of a series, don’t send email about volumes 10 through 15.

Dear Spammers,


Thank you for (still) not learning to make the carpet match the drapes.

I was idly scrolling through my junkmail folder this evening (looking for more entertaining Japanese spam…) and came across the following:

Subject: Cute dogs massacred in Texas

Alex Rodriguez hot steamy adulterous pics with Madonna
http://www.testforum.(removed).de/main.html

--
Using Opera's revolutionary e-mail client: http://(removed)

I understand trying to trick people into reading your message with a “newsworthy” subject line, but you really ought to try to make the body match, or you’ll lose that precious click-through (necessary to infect Windows boxes with your Russian botnet code).

By the way, thanks for sending out English-language spam encoded in KOI8-R; it’s a useful clue for anti-spam tools.

[Update: no, seriously, you’re killing me here. Subject: “Charred bodies found near White House”, Body: “Have a break, have a Kit Kat - free online chocolate bar giveaways”. Also, Subject: “Hilary Clinton vows revenge”, Body: “The best places to shag in the wild, all listed right here.” Do. Not. Want.

By the way, I see Charter Cable still isn’t blocking outgoing SMTP]

“Need a clue, take a clue,
 got a clue, leave a clue”