Sunday, December 10, 2006

Kanji cross-referencing

Jim Breen’s KANJIDIC includes cross-references for various printed kanji dictionaries, and KANJIDIC2 adds more. I’ve imported KANJIDIC into a SQLite database for use by my Perl scripts, and it’s quite handy (and much faster than repeatedly slurping in the original file and parsing it…).
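A minimal sketch of that kind of import, written in Python rather than Perl and not taken from the actual scripts. It assumes the classic KANJIDIC line layout (kanji, JIS code, then letter-coded fields, readings, and {meanings}); the table name xref, its columns, and the helper names are hypothetical, and the sample line is an abbreviated entry used only for illustration.

```python
import re
import sqlite3

# Abbreviated sample entry (illustration only): kanji, JIS code, coded fields,
# readings in kana, then {meaning} groups.
SAMPLE = "亜 3021 U4e9c B1 C7 G8 S7 F1509 N43 V81 H3540 L1809 K1331 ア つ.ぐ {Asia} {rank next}"

# A coded field is one or more ASCII capitals followed by a value, e.g. "G8".
CODED = re.compile(r"([A-Z]+)(\S+)")

def parse_entry(line):
    """Return (kanji, [(code, value), ...]) for one KANJIDIC-style line."""
    head = line.split("{", 1)[0]          # drop the {meaning} groups
    fields = head.split()
    kanji = fields[0]
    pairs = []
    for field in fields[2:]:              # fields[1] is the JIS code
        m = CODED.fullmatch(field)
        if m:                             # kana readings do not match
            pairs.append((m.group(1), m.group(2)))
    return kanji, pairs

def load(conn, lines):
    """Store every coded field as one row in a hypothetical xref table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS xref (kanji TEXT, code TEXT, value TEXT)"
    )
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank and comment lines
        kanji, pairs = parse_entry(line)
        conn.executemany(
            "INSERT INTO xref VALUES (?, ?, ?)",
            [(kanji, code, value) for code, value in pairs],
        )
    conn.commit()

def lookup(conn, kanji, code):
    """Return all values stored for one kanji under one field code."""
    rows = conn.execute(
        "SELECT value FROM xref WHERE kanji = ? AND code = ?", (kanji, code)
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
load(conn, [SAMPLE])
print(lookup(conn, "亜", "G"))
```

One row per (kanji, code, value) keeps the schema flat, so adding a new index such as a JLPT level or a flashcard number would just mean inserting more rows under a new code.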

However, it’s missing two cross-reference indexes that would be quite useful for me: JLPT level and White Rabbit Kanji Flashcards card number.

Most of the online JLPT references predate the 2002 test specifications, so the only reliable source I’ve found is The JLPT Study Page. The creator of that site is working from the latest edition of the test content specs, so apart from the occasional typo in the vocabulary, it’s solid data. It just wasn’t in a form directly useful to me, so I screen-scraped it and generated a simple text file, UTF-8 encoded.

The White Rabbit folks have an online lookup tool for generating your own cross-reference lists, but by the time I found it, I’d already read the forum article explaining their numbering scheme: Unicode sort order within each JLPT level. A few seconds at the shell, and I had another simple text file (extended to include the planned Level 1 card set).
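The numbering scheme is simple enough to sketch in a few lines of Python. This is an illustration, not the shell one-liner mentioned above: the function name and the sample groupings are made up, and it assumes numbering restarts at 1 within each level, which the description does not actually settle.

```python
def card_numbers(kanji_by_level):
    """Map each kanji to (level, position), ordered by code point per level."""
    numbers = {}
    for level, kanji in kanji_by_level.items():
        # Sorting single characters by ord() gives Unicode code point order.
        ordered = sorted(set(kanji), key=ord)
        for position, character in enumerate(ordered, start=1):
            numbers[character] = (level, position)
    return numbers

# Made-up groupings for illustration; not the real JLPT kanji lists.
sample = {4: "日一人", 3: "世不"}

for character, (level, position) in sorted(card_numbers(sample).items()):
    print(character, level, position)
```

If the cards are instead numbered in one continuous run across levels, the same sort works with a single counter carried from one level to the next.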