There are dozens of front-ends for Jim Breen’s Japanese-English and Kanji dictionaries, online and off. Basically, if it’s a software-based dictionary that wasn’t published in Japan, the chance that it’s using some version of his data is somewhere above 99%.
Many of the tools, especially the older or free ones, use the original Edict format, which is compact and fairly easy to parse but omits a lot of useful information. Edict has a lot of words you won’t find in affordable J-E dictionaries, but the definitions and usage information can be misleading. One of my Japanese teachers recommends avoiding it for anything non-trivial, because the definitions are extremely terse, context-free, and often “off”.
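To give a sense of why it’s easy to parse, the basic Edict line layout can be sketched in a few lines of Python. This is my own illustration, not code from any of the actual tools, and the real data has more corner cases than this handles:

```python
import re

def parse_edict_line(line):
    """Parse one line of the classic Edict format.

    Each entry looks like: HEADWORD [READING] /gloss 1/gloss 2/.../
    The bracketed reading is omitted for kana-only words.
    """
    m = re.match(r'^(\S+)(?:\s+\[([^\]]+)\])?\s+/(.+)/\s*$', line)
    if not m:
        raise ValueError("not an Edict entry: %r" % line)
    headword, reading, body = m.groups()
    glosses = [g for g in body.split('/') if g]
    return {'kanji': headword, 'kana': reading or headword, 'glosses': glosses}

entry = parse_edict_line("一か八か [いちかばちか] /(exp) sink or swim/all or nothing/")
print(entry['kana'])     # いちかばちか
print(entry['glosses'])  # ['(exp) sink or swim', 'all or nothing']
```

Everything is crammed into one line per entry, which is exactly why so much of the richer JMdict information has nowhere to go.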
A good example is 一か八か, which is translated as simply “sink or swim”. This is one possible usage, and you’ll find it in printed dictionaries (along with “do or die”), but she insisted it felt wrong. I did some digging, and it turns out that it’s an old gambling expression, meaning “to risk your entire stake on one throw of the dice”. In usage, it appears to retain this gambling flavor, making the poker expression “going all-in” a better match (at least in this modern era where even James Bond is reduced to playing poker).
Edict has been through a lot of revisions, and now that it’s generated from the XML-based JMdict file, it’s improving almost daily. Unfortunately, a lot of the JMdict content simply can’t be represented in the Edict format, so tools based on it are inherently less accurate than ones that parse JMdict directly.
[side note: Kanjidic is also now based on an XML file, but that one’s not actively maintained; much of the XML schema is collecting dust, waiting for someone with a whole bunch of patience to clean up the data]
The wwwjdic site uses the full data (along with some nice supplemental sources), but presents it in an extremely compact form that throws a lot of the detail away. It also doesn’t let you link to search results, something the Animelab dictionary (based on the same data) does.
This meant, of course, that I had to write my own.
Nearly two years ago, I knocked together a set of functional command-line tools that imported and searched Edict and Kanjidic, and made a stab at accurately importing the XML-based JMdict. I wasn’t satisfied with the JMdict-based database, though, because I had made the same mistake the current JMdictDB project is making: directly converting the XML DTD into a SQL schema. Mine was less insane than theirs, but still unworkably complex.
A few weeks ago, I threw all that code away and started from scratch, designing the database schema around searchability. You’re only ever going to print complete entries, so store them as JSON-encoded blobs for portability, and add just enough tables to support “things someone would actually search for”. For now, it’s hosted on an old Shuttle PC at my house that I’m using to prototype my in-progress Movable Type upgrade; eventually, it’ll be moved here.
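The “blob plus search tables” idea looks roughly like this. This is an illustrative sketch using SQLite; the table and column names are mine, not the actual schema:

```python
import json
import sqlite3

# Sketch of the design described above: complete entries live in one table
# as JSON blobs, and a small keyword table covers the searchable fields.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE entry (id INTEGER PRIMARY KEY, data TEXT);       -- JSON blob
    CREATE TABLE keyword (entry_id INTEGER, kind TEXT, key TEXT);
    CREATE INDEX keyword_key ON keyword (key);
""")

def add_entry(entry_id, entry):
    db.execute("INSERT INTO entry VALUES (?, ?)", (entry_id, json.dumps(entry)))
    for kind in ('kanji', 'kana', 'gloss'):
        for key in entry.get(kind, []):
            db.execute("INSERT INTO keyword VALUES (?, ?, ?)",
                       (entry_id, kind, key))

add_entry(1, {'kanji': ['一か八か'], 'kana': ['いちかばちか'],
              'gloss': ['sink or swim', 'all or nothing']})

# Lookups hit the small indexed table; display just decodes the stored blob.
row = db.execute("""SELECT e.data FROM entry e
                    JOIN keyword k ON k.entry_id = e.id
                    WHERE k.key = ?""", ('いちかばちか',)).fetchone()
print(json.loads(row[0])['gloss'])  # ['sink or swim', 'all or nothing']
```

The point of the design is that the search side stays simple no matter how deeply nested the JMdict entry structure gets: the blob preserves everything, and the search tables only hold what someone would actually type into a query.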
I’m using recent versions of both JMdict and Kanjidic2, with support for wildcards and output filters, and everything has a permalink. The entire project comes to about 1300 lines of code, including the XML import scripts, the CGI lookup script, and a custom dictionary sort. Plus a whole bunch of supporting libraries, of course.
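Wildcard support in this kind of lookup usually amounts to translating glob-style patterns into SQL LIKE patterns. A hedged sketch of how that might look (my own illustration, not the actual lookup code):

```python
def glob_to_like(pattern):
    """Translate shell-style wildcards (* and ?) into a SQL LIKE pattern,
    escaping any literal %, _, or backslash already in the input."""
    out = []
    for ch in pattern:
        if ch == '*':
            out.append('%')
        elif ch == '?':
            out.append('_')
        elif ch in ('%', '_', '\\'):
            out.append('\\' + ch)
        else:
            out.append(ch)
    return ''.join(out)

print(glob_to_like('一か*'))  # 一か%
# Used as: ... WHERE key LIKE ? ESCAPE '\' with (glob_to_like(term),)
```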
The formatting needs work. I haven’t done much CSS styling yet; I’ve been focusing on presenting all of the available data (well, everything except some of the Kanjidic cross-references). Also, I’m currently only expanding the abbreviations as tool-tips, and the presentation of the re_restr, stagk, and stagr fields probably only makes sense to me. The frequency-of-use abbreviations don’t even have expanded versions (they’re not entity-encoded in the original data), so I’ll have to write some.