Monday, July 19 2004

No wonder BabelFish has problems with Japanese…

I mentioned in a recent comment thread that I had developed some sympathy for BabelFish’s entertaining but mostly useless translations from Japanese to English.

It started with Mahoromatic, a manga and anime series that I’m generally quite fond of. The official web site for the anime includes a lot of merchandise, and I was interested in finding out more about some of the stuff that hasn’t been officially imported to the US market. So I asked BabelFish to translate the pages, expecting to be able to make at least a little sense of the results.

It was worse than I expected, and it took me a while to figure out why. At the time, the translation engine left intact everything it had been unable to convert to English, which gave me some important clues. It was also possible to paste Unicode text into the translation window and get direct translations, which helped me narrow down the problems. [Sadly, both of these features disappeared a few months ago, making BabelFish a lot less useful.]

The clues started with the name of the show and its main character, both of which are written in hiragana on the site. BabelFish reliably converted まほろまてぃっく (Mahoromatic) to “ま top wait the ぃっく” and まほろ (Mahoro) to “ま top”. The one that really got me, though, was the live concert DVD, whose title went from 「まほろまてぃっく らいぶ!&Music Clips」 to “ま top wait the ぃっく leprosy ぶ! & in the midst of Music Clips”.

With apologies to Vernor Vinge, it was a case of “leprosy as the key insight”. It was so absurd, so out of place, that it had to be important. Fortunately, the little kana “turds” that BabelFish left behind told me exactly which hiragana characters it had translated as leprosy: らい (rai).

But rai doesn’t mean leprosy. Raibyou (らいびょう or 癩病) does, but on its own, rai can mean “since” or “defeat”, or it can be the English loanword “lie” (which should properly be written in katakana, as ライ). So where did it come from? Rai is the pronunciation of the kanji 癩, which means leprosy. Except that it doesn’t, quite.

Here’s where it gets complicated. Every kanji character has one or more meanings and pronunciations. Some pronunciations came along for the ride when the character was borrowed from China (the ON-reading), others are native inventions (the KUN-reading), but neither is necessarily a Japanese word on its own. There are plenty of words that consist of a single kanji, such as 犬 (inu, “dog”), but not all single kanji are words.

Our friend 癩 is one of the kanji that isn’t a word. It has only a single ON-reading, rai, and its meaning is leprosy, but the Japanese word for leprosy is formed by appending another kanji, 病 (byou, “sick”). So while rai really does mean leprosy, it’s not the word for leprosy. BabelFish, convinced that anything written in hiragana must be a native Japanese word, is simply trying too hard.

So what was it supposed to be? That little leftover kana at the end (ぶ) was “bu”, making the complete word “raibu”. Say it out loud, remembering that Japanese has no separate “l” or “v” sounds and substitutes “r” and “b”, and it becomes “live”. The correct translation of the title should be “Mahoromatic Live! & Music Clips”; neither of the words in hiragana should be translated, because they’re not Japanese words.

In fairness to BabelFish, the folks responsible for Mahoromatic have played a dirty trick on it. It’s actually a pretty good rule of thumb that something written in hiragana is Japanese and something written in katakana is not, and, sure enough, if you feed in ライブ instead of らいぶ, it will correctly come back as “live”.

I fell for this, too, when I tried to figure out the full title of the Mahoromatic adventure game 「まほろまてぃっく☆あどべんちゃ」. The part after the star (adobencha) is written in hiragana, so I tried to interpret it as Japanese. I knew I’d gotten it wrong when I came up with “conveniently leftover tea”, but I didn’t realize I’d been BabelFished until I said it out loud.
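The hiragana-to-katakana swap described above is easy to do mechanically, because Unicode lays the two syllabaries out in parallel blocks exactly 0x60 code points apart. Here is a minimal sketch in Python; the function name is my own invention, and it has nothing to do with how BabelFish itself works. It just re-spells hiragana as katakana so that text like らいぶ can be retried as a loanword.

```python
# Hiragana letters occupy U+3041..U+3096; the matching katakana letters
# sit exactly 0x60 code points higher (U+30A1..U+30F6).
HIRAGANA_START = 0x3041
HIRAGANA_END = 0x3096
KANA_OFFSET = 0x60


def hiragana_to_katakana(text: str) -> str:
    """Re-spell every hiragana letter as katakana; leave everything else alone."""
    out = []
    for ch in text:
        cp = ord(ch)
        if HIRAGANA_START <= cp <= HIRAGANA_END:
            out.append(chr(cp + KANA_OFFSET))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(hiragana_to_katakana("らいぶ"))        # ライブ
    print(hiragana_to_katakana("あどべんちゃ"))  # アドベンチャ
```

Feeding the katakana spelling to a translator, or just reading it aloud, is a quick sanity check whenever a hiragana word refuses to make sense as Japanese.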

Isn’t Japanese fun? My latest surprise came when my Rosetta Stone self-study course threw up the word ビーだま (biidama, a “marble” of the toy kind). I thought it was a typo at first, this word that was half-katakana, half-hiragana, but switching the software over to the full kanji mode converted it to ビー玉, and, sure enough, that’s what it looked like in my dictionary.
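A mixed-script word like this is easy to spot programmatically. The sketch below is a rough illustration, not a full treatment of Japanese text: the function names are hypothetical, and the ranges are just the basic Unicode Hiragana, Katakana, and CJK Unified Ideographs blocks. It splits a string into runs of the same script.

```python
from itertools import groupby


def script_of(ch: str) -> str:
    """Classify one character by the Unicode block it falls in."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:  # includes the long-vowel mark ー (U+30FC)
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:  # basic CJK Unified Ideographs only
        return "kanji"
    return "other"


def script_runs(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters that share a script."""
    return [
        (script, "".join(chars))
        for script, chars in groupby(text, key=script_of)
    ]


if __name__ == "__main__":
    print(script_runs("ビーだま"))  # [('katakana', 'ビー'), ('hiragana', 'だま')]
    print(script_runs("ビー玉"))    # [('katakana', 'ビー'), ('kanji', '玉')]
```

A word that comes back as more than one run is a hint that it may be a hybrid of a loanword and a native word rather than a typo.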

A little digging with JEdict provided the answer. Back in the days when the Portuguese started trading with Japan, their word vidro (“glass”) was adopted as the generic word for glass, becoming biidoro (ビードロ). The native word for sphere is tama (たま or 玉), which is voiced as dama (だま) when it’s the second half of a compound. Mash them together, and you’ve got a “glass sphere”, or a marble. Don’t go looking for other words based on biidoro, though, because it fell out of fashion a few centuries back; modern loanwords are based on garasu.