“…Kanna Arihara, and on behalf of the Hello!Project costume designers, I’d like to ask you all a few questions.”
No, this is not the cast of the live-action Negima series, although it might be amusing to map one onto the other. This is Tsunku’s Army, also known as Hello!Project, in their mid-2005 lineup. Yes, some of them really were as young as they look.
Most of them are still associated with the organization, at least on paper. Six are gone for good (three quit, two switched agencies, one was kicked out), but at least a dozen only perform at concerts maybe twice a year, and are otherwise not well-supported by H!P. The two teen groups get most of the promotion, with Morning Musume now third on the priority list.
Tsunku’s attention seems to be focused on bringing in new talent before they need training bras, which may be very Japanese, but doesn’t do anything for me. The only over-25 member who gets any significant promotion is Natsumi Abe. She’s also one of the few who occasionally manages to go on stage in costumes that aren’t hideous or unflattering, so I think she must know where the bodies are buried.
[Update: the editing form is now hooked up to the database, in read-only mode. I’ve linked some sample entries on it. …and now there’s a link from the dictionary page; it’s still read-only, but you can load the results of any search into the form]
I feel really sorry for anyone who edits XML by hand. I feel slightly less sorry for people who use editing tools that can parse DTDs and XSDs and validate their work, but still, it just strikes me as a bad idea. XML is an excellent way to get structured data out of one piece of software and into a completely different one, but it’s toxic to humans.
JMdict is well-formed XML, maintained with some manner of validating editor (update: turns out there’s a simple text format based on the DTD that’s used to generate valid XML), but editing it is still a pretty manual job, and getting new submissions into a usable format can’t be fun. The JMdictDB project aims to help out with this, storing everything in a database and maintaining it with a web front-end.
Unfortunately, the JMdict schema is a poor match for standard HTML forms, containing a whole bunch of nested optional repeatable fields, many of them entity-encoded. So they punted, and relied on manually formatting a few TEXTAREA fields. Unless you’re new here, you’ll know that I can’t pass up a scripting problem that’s just begging to be solved, even if no one else in the world will ever use my solution.
So I wrote a jQuery plugin that lets you dynamically add, delete, and reorder groups of fields, and built a form that accurately represents the entire JMdict schema. It’s not hooked up to my database yet, and submitting it just dumps out the list of fields and values. It’s also ugly, with crude formatting and cryptic field names (taken from the schema), but the basic idea is sound. I was pleased that it only took one page of JavaScript to add the necessary functionality.
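For anyone who wants the idea without the jQuery, here is a minimal Python sketch of the data model behind that kind of form: nested lists of repeatable groups, plus add, delete, and reorder operations and a flattener that produces the sort of field/value dump the form currently submits. This is not the plugin itself. The helper names and the dotted field-naming scheme are my own assumptions for illustration; only the element names `sense` and `gloss` come from the JMdict schema.

```python
def add_group(groups, index=None):
    """Insert an empty group at index (or append it) and return it."""
    group = {}
    if index is None:
        groups.append(group)
    else:
        groups.insert(index, group)
    return group


def delete_group(groups, index):
    """Remove the group at index."""
    del groups[index]


def move_group(groups, old_index, new_index):
    """Reorder by pulling one group out and reinserting it elsewhere."""
    groups.insert(new_index, groups.pop(old_index))


def flatten(node, prefix=""):
    """Yield (field_name, value) pairs; repeats get a numbered path.

    The dotted naming (sense.0.gloss.1) is a hypothetical convention.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for position, item in enumerate(node):
            yield from flatten(item, f"{prefix}.{position}")
    else:
        yield prefix, node


# Made-up entry: two senses, the first with two glosses.
entry = {"sense": [{"gloss": ["dictionary", "lexicon"]}, {"gloss": ["glossary"]}]}
move_group(entry["sense"], 1, 0)  # promote the second sense to first
fields = list(flatten(entry))
```

Because the field names are derived from position at flatten time, reordering or deleting a group never requires renumbering anything by hand, which is most of what makes the nested optional repeatable fields tolerable.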
[hours to debug that script, but what can you do?]
There are dozens of front-ends for Jim Breen’s Japanese-English and Kanji dictionaries, online and offline. Basically, if it’s a software-based dictionary that wasn’t published in Japan, the chance that it’s using some version of his data is somewhere above 99%.
Many of the tools, especially the older or free ones, use the original Edict format, which is compact and fairly easy to parse, but omits a lot of useful information. It has a lot of words you won’t find in affordable J-E dictionaries, but the definitions and usage information can be misleading. One of my Japanese teachers recommends avoiding it for anything non-trivial, because the definitions are extremely terse, context-free, and often “off”.
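To show just how easy "fairly easy to parse" is, here is a rough Python sketch that reads one line of the basic Edict layout: a headword, an optional reading in square brackets, and slash-delimited glosses. The function name and the sample lines are my own illustrations, not copied from the actual file, and the sketch ignores everything the later formats added.

```python
import re

# Headword, optional [reading], then /gloss/gloss/.../
EDICT_LINE = re.compile(
    r"^(?P<word>\S+)\s+(?:\[(?P<reading>[^\]]+)\]\s+)?/(?P<glosses>.*)/\s*$"
)


def parse_edict_line(line):
    """Return a dict for one Edict-style line, or None if it doesn't match."""
    match = EDICT_LINE.match(line)
    if match is None:
        return None
    glosses = [g for g in match.group("glosses").split("/") if g]
    return {
        "word": match.group("word"),
        # Kana-only entries have no bracketed reading; the word is the reading.
        "reading": match.group("reading") or match.group("word"),
        "glosses": glosses,
    }


# Illustrative sample lines in the Edict layout.
kanji_entry = parse_edict_line("辞書 [じしょ] /(n) dictionary/lexicon/")
kana_entry = parse_edict_line("じゃんけん /(n) rock-paper-scissors/")
```

Note that the part-of-speech tag just rides along inside the first gloss, which is a fair picture of why the format is compact but loses so much structure.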
When I upgraded to iTunes 8.0 and turned on the new Genius feature, I discovered that the US iTunes Store has acquired a rather large catalog of J-Pop, including a significant subset of the various Hello!Project groups’ albums and singles. The iTunes Genius analyzed my collection and gleefully pointed out all of the songs that would be perfect for me.
All of which I already owned. In many cases, it was pointing to the exact same song, from the exact same album. Why? Because the purchased albums have metadata that’s written with kanji and kana, and the iTunes versions are all romanized. Er, mostly romanized. Okay, inconsistently romanized. Album and song titles are usually romanized, but artist names are all over the map: kana-ized, Hepburn-romanized, Kunrei-romanized, last-name-first, first-name-first, capitalization and white-space optional. Fortunately, they seem to stick with the same version across multiple albums.
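A matcher that wanted to cope with the romanized variants could reduce each name to a comparison key. Here is a rough Python sketch of that idea, covering case, spacing, name order, macrons, and the common Kunrei spellings. The function name and the rule table are my own assumptions, nothing to do with how iTunes works, and it does nothing for kana or kanji names, names run together without a space, or long vowels spelled out as "uu" versus "u".

```python
import re
import unicodedata

# Kunrei-style spellings rewritten to Hepburn. Order matters: the
# palatalized pairs (sy, ty, zy) are handled before the single-vowel rules.
KUNREI_TO_HEPBURN = [
    (r"sy", "sh"),
    (r"ty", "ch"),
    (r"zy", "j"),
    (r"si", "shi"),
    (r"ti", "chi"),
    (r"tu", "tsu"),
    (r"zi", "ji"),
    (r"(?<![cs])hu", "fu"),  # leave the "hu" inside Hepburn "chu"/"shu" alone
]


def match_key(name):
    """Reduce a romanized personal name to an order-insensitive key."""
    # Decompose accented letters, drop the combining marks, lowercase.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    tokens = re.findall(r"[a-z]+", text)
    normalized = []
    for token in tokens:
        for pattern, replacement in KUNREI_TO_HEPBURN:
            token = re.sub(pattern, replacement, token)
        normalized.append(token)
    # Sorting makes last-name-first and first-name-first compare equal.
    return " ".join(sorted(normalized))
```

With this, `match_key("Mari Yaguchi")` and `match_key("YAGUTI  mari")` produce the same key, which is about as far as you can get without a real dictionary of names.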
This makes searching entertaining, but the catalog itself is a big deal, because all of this stuff is at standard iTunes pricing, which is a helluva lot cheaper than import CDs, and just over half the price of the same tracks in the Japanese iTunes Store.
The Japanese store is the source of the peculiar partial romanization, by the way, and in fact when you view it from the US, all of the navigation is translated as well. I remember that when the store first launched, everything was in Japanese, including song titles, so I’m wondering if they’re geographically localizing not just the menus, but also the song metadata. The search system seems to handle pretty much anything you throw at it, so I wonder if Apple was seeing so many American purchases from the Japanese store through gift cards that they went out of their way to accommodate them, first through romanizing the interface, then through importing popular content.
There are some indexing oddities. If you search for “nakazawa yuuko” in the US store, you’ll get her most recent EP and a stub link that should lead to her audiobooks, but that only works if you’re on the Japanese store. I’m guessing that the stores all talk to each other internally, sharing indexes and content, with flags to indicate what content is importable. Given the price difference, new releases are unlikely to show up for a while.
I’m tinkering with a web front-end for my new dictionary lookup tool, and every once in a while I stumble across something entertaining. I’m using full-text indexing (Sphinx) to create a half-assed English-to-Japanese dictionary out of the JMdict data, and one of the words I typed in was “strip”. There were 36 matching records, and one of them caught my eye: 野球拳, “strip version of rock-paper-scissors forfeit game”.
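The reverse-lookup trick is simple enough to sketch without Sphinx. The Python below swaps in a plain in-memory inverted index, which is a stand-in technique and not how Sphinx works internally: index every word in every English gloss, then intersect the matches for each query word. The function names are made up, and the three sample entries are a tiny hand-typed stand-in for the real JMdict data.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(entries):
    """Map each English word to the set of headwords whose glosses use it."""
    index = defaultdict(set)
    for headword, glosses in entries.items():
        for gloss in glosses:
            for token in tokenize(gloss):
                index[token].add(headword)
    return index


def lookup(index, query):
    """Return headwords whose glosses contain every word in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hand-typed sample entries, not the real data set.
entries = {
    "野球拳": ["strip version of rock-paper-scissors forfeit game"],
    "じゃん拳": ["rock-paper-scissors"],
    "野球": ["baseball"],
}
index = build_index(entries)
```

A real full-text engine adds stemming, ranking, and phrase matching on top, which is why this only deserves the "half-assed" label even more than the Sphinx version does.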
Standard rock-paper-scissors is じゃん拳. 野球 means baseball in every other compound word, but in this one case, it ain’t. I have no idea how it got that way, but this isn’t a dictionary error, as can be seen from this promotional video for the PSP game YA-Q-KEN (warning: poorly-subbed dialogue, delivered by really cute AV actresses). It appears to offer a total of 9 girls willing to work with their hands.
Danny Choo found someone who really, really wants to let people know that he’s a Mari Yaguchi fan. I confess, I didn’t even need to read her name on the armband or the “sekushii beemu” on his chest to recognize her instantly, so I can’t really criticize too much…