“When police officers approach us and want to investigate something, it’s ‘yes’ or ‘no, sir’, or somebody can end up dead.”
— Definition of a police state, New Mexico District Court edition

For amusement, I decided that my next Dashboard gadget should be a tool for looking up characters in KANJIDIC using Jack Halpern’s SKIP system.
SKIP is basically a hash-coding system for ideographs that doesn’t rely on extensive knowledge of how they’re constructed. Once you’ve figured out how to count strokes reliably, you simply break the character into two parts according to one of several patterns, and count the number of strokes in each part. It’s not quite that simple, but almost, and it’s a lot more novice-friendly than traditional lookup methods.
Downside? The simplicity of the system results in a large number of hash collisions (only 584 distinct SKIP codes for the 6,355 characters in KANJIDIC). In the print dictionaries the system was designed for, this is handled by grouping together entries that share the same first part. Conveniently, Unicode sorting seems to produce much the same effect, although a program can’t identify the groups without additional information. A simple supplementary index can easily be constructed for the relatively few SKIP codes with an absurd collision count (1-3-8 is the biggest, at 161), so it’s feasible to create a DHTML form that lets you locate any unknown kanji by just selecting from a few pulldown menus.
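Here is a rough sketch of the lookup and the collision count, using only the handful of characters that appear in the sample data later in this post. The names `skipIndex`, `lookup`, and `collisionCounts` are made up for illustration, and both the flat string key and my reading of the sample as pattern-first-second are assumptions, not how the gadget actually stores things.

```javascript
// Hypothetical flat index: SKIP code string -> characters sharing that code.
// Entries are the few shown in this post's sample data, read as
// pattern-first-second; a real table would cover all of KANJIDIC.
const skipIndex = {
  '1-1-1': ['儿', '八'],
  '1-1-2': ['小', '巛', '川'],
  '1-1-3': ['心', '水'],
  '1-1-4': ['必', '旧'],
};

// Return every character filed under a SKIP code (empty array if none).
function lookup(pattern, first, second) {
  return skipIndex[`${pattern}-${first}-${second}`] || [];
}

// List codes by how many characters collide on them, worst first.
function collisionCounts(index) {
  return Object.entries(index)
    .map(([code, chars]) => ({ code, count: chars.length }))
    .sort((a, b) => b.count - a.count);
}

console.log(lookup(1, 1, 2));
console.log(collisionCounts(skipIndex)[0]);
```

Run against the full dictionary, the top of the `collisionCounts` list would presumably be the codes that need a supplementary index.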
For various reasons, it just wasn’t a good idea to attempt to parse KANJIDIC directly from JavaScript (among other things, everything is encoded in EUC-JP instead of UTF-8), so I quickly knocked together a Perl script that read the dictionary into a SKIP-indexed data structure, and wrote it back out as a JavaScript array initialization.
Which didn’t work the first time, because JavaScript, unlike Perl, doesn’t allow trailing commas in array or object literals. That is, this is illegal:
var skipcode = [
  {
    s1:{
      s1:['儿','八',],
      s2:['小','巛','川',],
      s3:['心','水',],
      s4:['必','旧',],
      s7:['承',],
      s8:['胤',],
      s11:['順',],
    },
  },
];
Do you know how annoying it is to have to insert extra code for “add a comma unless you’re the last item at this level” when you’re pretty-printing a complex data structure? Yes, I’m sure there are all sorts of good reasons why you shouldn’t allow those commas to exist, but gosh-darnit, they’re convenient!
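One common way around the "am I the last item?" bookkeeping is to build the pieces first and join them, since a join only puts the separator between items. A minimal sketch, written in JavaScript rather than the Perl I actually used (Perl's `join` works the same way); `toLiteral` is a hypothetical helper, and it assumes every object key is a valid identifier like `s1`.

```javascript
// Emit a nested structure as a JavaScript literal with no trailing commas.
// join() only puts the separator BETWEEN items, so no "last item" check.
function toLiteral(value, indent = '') {
  const inner = indent + '  ';
  if (Array.isArray(value)) {
    const items = value.map((v) => toLiteral(v, inner));
    return '[' + items.join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    // Assumes keys are valid identifiers, so they are written unquoted.
    const lines = Object.keys(value).map(
      (key) => inner + key + ':' + toLiteral(value[key], inner)
    );
    return '{\n' + lines.join(',\n') + '\n' + indent + '}';
  }
  return JSON.stringify(value); // strings and numbers, quoted and escaped
}

const sample = [{ s1: { s1: ['儿', '八'], s7: ['承'] } }];
console.log('var skipcode = ' + toLiteral(sample) + ';');
```

Strings come out double-quoted because `JSON.stringify` handles the escaping; that is just as legal in a JavaScript literal as the single quotes above.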
…so you don’t have to. Great fun.
And, yes, I’m sure the Republican platform is at least as deserving, but this week it should be the Democrats’ turn, especially with the not-quite-front-page news about that suspected al-Qaeda operative sneaking into the US from Mexico.
Hey, didn’t Gray Davis, our man for giving driver’s licenses to illegal aliens, just speak at the Democratic convention?
Bought this stuff on a whim at Mitsuwa Marketplace, and it’s pretty good. 420 calories, for those who follow such things, and I’m sure it has enough sodium to choke a food-faddist, but it’s quite edible. Available online from Asian Munchies.
So I’m ripping the soundtrack album for Hand Maid May, and it’s got 62 tracks on it. Tracks 26 to 35 are short in-character messages by the lead voice actress, which isn’t unusual (I’ve been threatening for some time to put the answering-machine message from the Mahoromatic soundtrack on my voice mail), but tracks 36 to 60 consist of her speaking the complete set of Japanese phonemes, so you can create a “voice collage” that personalizes those messages. That’s new.
I was mostly just amused by it, and then I realized that I now had high-quality recordings of a native speaker pronouncing each phoneme, just the thing for language drills. Obviously I can’t distribute the results, and the truth is that I’m past the need for that particular drill, but I think I’ll build it anyway.
I will not, however, create a voice collage of May telling me goodnight…
Two police officers sent out to help stranded motorists deal with a flash flood were struck by lightning. How did they respond? They got up off the ground and continued with the job, later driving themselves to a hospital to be checked out.
Tip for the day: never argue with a cop in eastern New Mexico.
"She said someone should shoot you for defending the officer and lying to the public [and] that I should be hanging from a tree"
What was this caller angry about? The fact that agents of the Florida Fish and Wildlife Conservation Commission shot and killed a runaway tiger. Apparently, she’s not the only Gore fan in the area, as the agency has received at least five death threats for their actions.
Because, as we all know, tigers are our friends, and want nothing more than peaceful coexistence with their human neighbors. Really, they’re just overgrown kittycats who would never dream of harming a human being. Just ask Roy.
I like my new cellphone. The reception is much better than my old one, the MP3 ringtones are cool, and the built-in camera is… okay, the camera is pretty lame.
My biggest annoyance? Several times now, when setting a ringtone or alarm sound, I’ve paused on an entry in the list, and the phone starts playing it, and won’t stop until it’s finished! When this happens with a four-minute-long MP3 file, it’s a real pain in the ass. Even turning the phone off won’t work, because the music player refuses to shut down when the rest of the phone does, continuing until its buffer is flushed, which takes a while.
Not so bad when it’s, say, Cantina Band or The Shoggoth Song, but quite disturbing when it’s the orgasm scene from When Harry Met Sally.
I mentioned in a recent comment thread that I had developed some sympathy for BabelFish’s entertaining but mostly useless translations from Japanese to English.
It started with Mahoromatic, a manga and anime series that I’m generally quite fond of. The official web site for the anime includes a lot of merchandise, and I was interested in finding out more about some of the stuff that hasn’t been officially imported to the US market. So I asked BabelFish to translate the pages, expecting to be able to make at least a little sense of the results.
It was worse than I expected, and it took me a while to figure out why. At the time, the translation engine left intact everything it had been unable to convert to English, which gave me some important clues. It was also possible to paste Unicode text into the translation window and get direct translations, which helped me narrow down the problems. [Sadly, both of these features disappeared a few months ago, making BabelFish a lot less useful.]
The clues started with the name of the show and its main character, both of which are written in hiragana on the site. BabelFish reliably converted まほろまてぃっく (Mahoromatic) to “ま top wait the ぃっく” and まほろ (Mahoro) to “ま top”. The one that really got me, though, was the live concert DVD, whose title went from 「まほろまてぃっく らいぶ!&Music Clips」 to “ま top wait the ぃっく leprosy ぶ! & in the midst of Music Clips”.
With apologies to Vernor Vinge, it was a case of “leprosy as the key insight”. It was so absurd, so out of place, that it had to be important. Fortunately, the little kana “turds” that BabelFish left behind told me exactly which hiragana characters it had translated as leprosy: らい (rai).
But rai doesn’t mean leprosy. Raibyou (らいびょう or 癩病) does, but on its own, rai is one of “since”, “defeat”, or the English loanword “lie” (which should properly be written in katakana, as ライ). So where did it come from? Rai is the pronunciation of the kanji 癩, which means leprosy. Except that it doesn’t, quite.
Here’s where it gets complicated. Every kanji character has one or more meanings and pronunciations. Some came along for the ride when the character was borrowed from China (the ON-reading); others are native inventions (the KUN-reading). Neither is necessarily a Japanese word. There are plenty of words that consist of a single kanji, such as 犬 (inu, “dog”), but not all single kanji are words.
Our friend rai is one of the latter. It has only a single ON-reading, which means leprosy, but the Japanese word for leprosy is formed by appending another kanji, 病 (byou, “sick”). So while rai really does mean leprosy, it’s not the word for leprosy. BabelFish, convinced that anything written in hiragana must be a native Japanese word, is simply trying too hard.
So what was it supposed to be? That little leftover kana at the end (ぶ) was “bu”, making the complete word “raibu”. Say it out loud, remembering that the Japanese have trouble pronouncing “l” and “v”, and it becomes “live”. The correct translation of the title should be “Mahoromatic Live! & Music Clips”; neither of the words in hiragana should be translated, because they’re not Japanese words.
In fairness to BabelFish, the folks responsible for Mahoromatic have played a dirty trick on it. It’s actually a pretty good rule of thumb that something written in hiragana is Japanese and something written in katakana is not, and, sure enough, if you feed in ライブ instead of らいぶ, it will correctly come back as “live”.
I fell for this, too, when I tried to figure out the full title of the Mahoromatic adventure game 「まほろまてぃっく☆あどべんちゃ」. The part after the star (adobencha) is written in hiragana, so I tried to interpret it as Japanese. I knew I’d gotten it wrong when I came up with “conveniently leftover tea”, but I didn’t realize I’d been BabelFished until I said it out loud.
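If you suspect a hiragana string is really a loanword in disguise, one trick is to convert it to katakana before feeding it to a translator or dictionary. A small sketch, relying on the fact that the Unicode katakana block sits exactly 0x60 code points above the hiragana block; `hiraganaToKatakana` is a name I made up, and whether any given translation engine does better with the result is an open question.

```javascript
// Shift each hiragana character (U+3041..U+3096) up by 0x60 to get the
// matching katakana character; everything else passes through untouched.
function hiraganaToKatakana(text) {
  return text.replace(/[\u3041-\u3096]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) + 0x60)
  );
}

console.log(hiraganaToKatakana('らいぶ'));       // ライブ
console.log(hiraganaToKatakana('あどべんちゃ')); // アドベンチャ
```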
Isn’t Japanese fun? My latest surprise came when my Rosetta Stone self-study course threw up the word ビーだま (biidama, “marble” (the toy kind)). I thought it was a typo at first, this word that was half-katakana, half-hiragana, but switching the software over to the full kanji mode converted it to ビー玉, and, sure enough, that’s what it looked like in my dictionary.
A little digging with JEdict provided the answer. Back in the days when the Portuguese started trading with Japan, their word vidro (“windowpane”) was adopted as the generic word for glass, becoming biidoro (ビードロ). The native word for sphere is tama (たま or 玉), which is voiced as dama (だま) in compounds. Mash them together, and you’ve got a “glass sphere”, or a marble. Don’t go looking for other words based on biidoro, though, because it fell out of fashion a few centuries back; modern loanwords are based on garasu.