Tuesday, September 23 2008

More toying with dictionaries

[Update: the editing form is now hooked up to the database, in read-only mode, and I’ve linked a few sample entries that open in it. …and now there’s a link from the dictionary page; it’s still read-only, but you can load the results of any search into the form.]

I feel really sorry for anyone who edits XML by hand. I feel slightly less sorry for people who use editing tools that can parse DTDs and XSDs and validate their work, but even then, hand-editing XML strikes me as a bad idea. XML is an excellent way to get structured data out of one piece of software and into a completely different one, but it’s toxic to humans.

JMdict is well-formed XML, maintained with some sort of validating editor (update: it turns out there’s a simple text format, based on the DTD, that’s used to generate valid XML). Even so, editing it is still a largely manual job, and getting new submissions into a usable format can’t be fun. The JMdictDB project aims to help with this by storing everything in a database and maintaining it through a web front-end.

Unfortunately, the JMdict schema is a poor match for standard HTML forms: it contains a whole bunch of nested, optional, repeatable fields, many of them entity-encoded. So the JMdictDB project punted, and relied on users manually formatting the contents of a few TEXTAREA fields. Unless you’re new here, you know that I can’t pass up a scripting problem that’s just begging to be solved, even if no one else in the world will ever use my solution.
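To make "nested, optional, repeatable fields" a bit more concrete, here’s a rough TypeScript sketch of part of an entry. The element names (ent_seq, k_ele, keb, ke_inf, ke_pri, r_ele, reb, re_restr, sense, pos, gloss) come from the JMdict DTD, but the model is heavily cut down, part-of-speech values are stored as bare entity names, and the flatten() helper with its "a[0].b" field-naming scheme is a hypothetical illustration, not something JMdictDB or my form actually uses.

```typescript
// Simplified, hypothetical model of part of a JMdict <entry>.
// Optional elements are marked "?", repeatable ones are arrays.
interface KanjiElement {
  keb: string;         // kanji spelling, exactly one per <k_ele>
  ke_inf?: string[];   // info codes, entity-encoded in the XML (e.g. &iK;)
  ke_pri?: string[];   // priority markers
}

interface ReadingElement {
  reb: string;         // kana reading, exactly one per <r_ele>
  re_restr?: string[]; // limits this reading to particular kanji spellings
}

interface Sense {
  pos?: string[];      // part-of-speech entity names, e.g. "n" for &n;
  gloss?: string[];    // translations
}

interface Entry {
  ent_seq: number;
  k_ele?: KanjiElement[];  // zero or more
  r_ele: ReadingElement[]; // one or more
  sense: Sense[];          // one or more
}

// Hypothetical helper: flatten nested data into (name, value) pairs, the way
// a form full of repeatable groups might submit it. Naming is an assumption.
function flatten(
  value: unknown,
  prefix: string,
  out: Array<[string, string]> = [],
): Array<[string, string]> {
  if (Array.isArray(value)) {
    // Each repetition adds an index to the field name.
    value.forEach((item, i) => flatten(item, `${prefix}[${i}]`, out));
  } else if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    for (const key of Object.keys(obj)) {
      if (obj[key] === undefined) continue; // optional element left out
      flatten(obj[key], prefix ? `${prefix}.${key}` : key, out);
    }
  } else if (value !== undefined && value !== null) {
    out.push([prefix, String(value)]);
  }
  return out;
}

// Cut-down sample entry; ent_seq is a placeholder, not a real sequence number.
const entry: Entry = {
  ent_seq: 1,
  k_ele: [{ keb: "食べる" }],
  r_ele: [{ reb: "たべる" }],
  sense: [{ pos: ["v1", "vt"], gloss: ["to eat"] }],
};

for (const [name, v] of flatten(entry, "")) {
  console.log(`${name} = ${v}`);
}
```

Every level of nesting adds another index to the field names (sense[0].pos[1], and so on), which is why a form with a fixed set of inputs can’t represent an entry with, say, three senses of four glosses each.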

So I wrote a jQuery plugin that lets you dynamically add, delete, and reorder groups of fields, and built a form that accurately represents the entire JMdict schema. It’s not hooked up to my database yet, and submitting it just dumps out the list of fields and values. It’s also ugly, with crude formatting and cryptic field names (taken straight from the schema), but the basic idea is sound. I was pleased that the necessary functionality took only one page of JavaScript.
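The plugin itself isn’t reproduced here, so the following is not it. It’s a DOM-free TypeScript sketch, under my own assumptions, of the bookkeeping any add/delete/reorder widget has to get right: groups live in an ordered list, and field names are generated from each group’s current position, so the submitted indices stay contiguous no matter how the groups get shuffled. The RepeatableGroups class and its "sense[0].gloss" naming are hypothetical.

```typescript
// One repeatable group of fields (e.g. one <sense> block): name -> value.
type Group = Record<string, string>;

class RepeatableGroups {
  private readonly baseName: string;
  private groups: Group[] = [];

  constructor(baseName: string) {
    this.baseName = baseName;
  }

  // Append a group (copied, so later changes to the argument don't leak in).
  add(fields: Group): void {
    this.groups.push({ ...fields });
  }

  // Delete the group at `index`; the groups after it shift up by one.
  remove(index: number): void {
    this.checkIndex(index);
    this.groups.splice(index, 1);
  }

  // Move the group at `from` so that it ends up at position `to`.
  move(from: number, to: number): void {
    this.checkIndex(from);
    this.checkIndex(to);
    const [group] = this.groups.splice(from, 1);
    this.groups.splice(to, 0, group);
  }

  // Names are derived from current positions, so after any add/remove/move
  // the submitted indices run 0, 1, 2, ... with no gaps.
  serialize(): Array<[string, string]> {
    const out: Array<[string, string]> = [];
    this.groups.forEach((group, i) => {
      for (const key of Object.keys(group)) {
        out.push([`${this.baseName}[${i}].${key}`, group[key]]);
      }
    });
    return out;
  }

  private checkIndex(index: number): void {
    if (index < 0 || index >= this.groups.length || Math.floor(index) !== index) {
      throw new RangeError(`no group at index ${index}`);
    }
  }
}

// Usage with placeholder values.
const senses = new RepeatableGroups("sense");
senses.add({ gloss: "first" });
senses.add({ gloss: "second" });
senses.add({ gloss: "third" });
senses.move(2, 0); // drag the last group to the top
senses.remove(1);  // delete "first"
console.log(senses.serialize());
```

Deriving names at submit time, rather than storing an index in each group, means a delete or a drag never leaves a gap like sense[0], sense[2] for the server to cope with.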

[hours to debug that script, but what can you do?]