[Update: the editing form is now hooked up to the database, in read-only mode. I’ve linked some sample entries on it. …and now there’s a link from the dictionary page; it’s still read-only, but you can load the results of any search into the form]
I feel really sorry for anyone who edits XML by hand. I feel slightly less sorry for people who use editing tools that can parse DTDs and XSDs and validate their work, but still, it just strikes me as a bad idea. XML is an excellent way to get structured data out of one piece of software and into a completely different one, but it’s toxic to humans.
JMdict is well-formed XML, maintained with some manner of validating editor (update: turns out there’s a simple text format based on the DTD that’s used to generate valid XML), but editing it is still a pretty manual job, and getting new submissions into a usable format can’t be fun. The JMdictDB project aims to help out with this, storing everything in a database and maintaining it with a web front-end.
Unfortunately, the JMdict schema is a poor match for standard HTML forms: it contains a whole bunch of nested, optional, repeatable fields, many of them entity-encoded. So the JMdictDB developers punted, and relied on manually formatted text in a few TEXTAREA fields. Unless you’re new here, you’ll know that I can’t pass up a scripting problem that’s just begging to be solved, even if no one else in the world will ever use my solution.
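To make the mismatch concrete, here’s a rough Python sketch of what “nested optional repeatable fields” looks like once you pull an entry apart. The sample XML is hand-written in the general shape of a JMdict entry (the sequence number is made up, and the single entity is declared inline rather than pulled from the real DTD), and `entry_to_dict` is just a name I picked for illustration; none of this is JMdictDB’s code.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of a JMdict entry.
# The ent_seq value is invented, and the internal DTD subset declares
# only the one entity this sample uses.
SAMPLE = """<!DOCTYPE JMdict [
<!ENTITY n "noun (common) (futsuumeishi)">
]>
<JMdict>
<entry>
<ent_seq>1000000</ent_seq>
<k_ele><keb>辞書</keb></k_ele>
<r_ele><reb>じしょ</reb></r_ele>
<sense>
<pos>&n;</pos>
<gloss>dictionary</gloss>
<gloss>lexicon</gloss>
</sense>
</entry>
</JMdict>"""


def entry_to_dict(entry):
    """Flatten one <entry> into nested lists and dicts.

    Every list here can be empty or hold several items, which is
    exactly what a fixed set of HTML form fields can't express.
    """
    return {
        "seq": entry.findtext("ent_seq"),
        "kanji": [k.findtext("keb") for k in entry.findall("k_ele")],
        "readings": [r.findtext("reb") for r in entry.findall("r_ele")],
        "senses": [
            {
                # The parser expands &n; using the inline declaration.
                "pos": [p.text for p in s.findall("pos")],
                "gloss": [g.text for g in s.findall("gloss")],
            }
            for s in entry.findall("sense")
        ],
    }


root = ET.fromstring(SAMPLE)
entries = [entry_to_dict(e) for e in root.findall("entry")]
print(entries[0])
```

Even this toy entry has lists inside dicts inside lists, so a form that edits it needs to add and remove groups of fields on the fly rather than lay out a fixed set of inputs.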
[hours to debug that script, but what can you do?]