Suppose you had a big XML file in an odd, complicated structure (such as JMdict_e, a Japanese-English dictionary), and you wanted to load it into a database for searching and editing. You could faithfully replicate the XML schema in a relational database, with carefully chosen foreign keys and precisely specified joins, and you might end up with something like this.
Go ahead, look at it. I’ll wait. Seriously, it deserves a look. All praise to Stuart for making it actually work, but damn.
…
Done? Okay, now let’s slurp the whole thing into MongoDB:
    #!/usr/bin/perl -CADS
    use strict;
    use XML::Twig;
    use MongoDB;

    my $mcoll = MongoDB::Connection->new()
        ->get_database('test')->get_collection('dict');

    my @arrays = qw(k_ele ke_pri ke_inf r_ele re_pri re_inf re_restr sense
        pos field misc dial gloss ant s_inf xref stagk stagr lsource audit
        bibl etym links example pri trans name_type trans_det);

    new XML::Twig(
        keep_encoding => 1,
        twig_handlers => {
            entry => sub {
                my ($twig,$element) = @_;
                $mcoll->insert($element->simplify(forcearray=>\@arrays));
                $element->delete;
            },
            re_nokanji => sub { $_->set_text(1) },
        },
    )->parsefile("JMdict_e");

    $mcoll->ensure_index({'k_ele.keb' => 1});
    $mcoll->ensure_index({'r_ele.reb' => 1});
    $mcoll->ensure_index({'sense.gloss' => 1});
    $mcoll->ensure_index({'trans.trans_det' => 1});
This takes about four minutes to run on my Mac. Now let’s query it, using the mongo shell syntax:
    % mongo
    > use test
    > db.dict.find({ "r_ele.reb" : "みつぼうえき" })
    { "_id" : ObjectId("4c4b3b086b5cd7f958581201"), "k_ele" : [ { "keb" : "密貿易" } ], "r_ele" : [ { "reb" : "みつぼうえき" } ], "sense" : [ { "gloss" : [ "smuggling" ], "pos" : [ "&n;", "&vs;" ] } ], "ent_seq" : "1731560" }
    > db.dict.find({ "k_ele.keb" : /^密貿/ })
    { "_id" : ObjectId("4c4b3b086b5cd7f958581201"), "k_ele" : [ { "keb" : "密貿易" } ], "r_ele" : [ { "reb" : "みつぼうえき" } ], "sense" : [ { "gloss" : [ "smuggling" ], "pos" : [ "&n;", "&vs;" ] } ], "ent_seq" : "1731560" }
    > db.dict.find({ "sense.gloss" : /smuggling/, "k_ele.keb" : /貿/ })
    { "_id" : ObjectId("4c4b3b086b5cd7f958581201"), "k_ele" : [ { "keb" : "密貿易" } ], "r_ele" : [ { "reb" : "みつぼうえき" } ], "sense" : [ { "gloss" : [ "smuggling" ], "pos" : [ "&n;", "&vs;" ] } ], "ent_seq" : "1731560" }
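If you're wondering why a path like "k_ele.keb" works when k_ele is an array: MongoDB's dot notation reaches into arrays of subdocuments and matches if any element satisfies the condition. The forcearray list in the import script helps here, since it keeps those fields as arrays even when an entry has only one of them, so every document has the same shape. Here's a toy emulation in plain JavaScript to make the idea concrete. To be clear, valuesAt and matches are names I made up for this sketch, it only handles exact values and regexes, and it is not how MongoDB actually implements matching.

```javascript
// Toy emulation of how a dotted path such as "k_ele.keb" reaches into arrays
// of subdocuments. Illustration only; this is not MongoDB's implementation.

// Collect every value reachable from `doc` by following `path`,
// fanning out across arrays at each step. (Hypothetical helper.)
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const node of current) {
      // An array contributes each of its elements; anything else contributes itself.
      const items = Array.isArray(node) ? node : [node];
      for (const item of items) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  // Flatten a trailing array (e.g. "sense.gloss" ends in an array of strings).
  return current.flat();
}

// True when every condition in `query` is satisfied by at least one value
// on its path. A condition is either a RegExp or a value compared with ===.
function matches(doc, query) {
  return Object.entries(query).every(([path, cond]) =>
    valuesAt(doc, path).some((v) =>
      cond instanceof RegExp ? typeof v === "string" && cond.test(v) : v === cond
    )
  );
}

// The entry from the shell session above (without its _id).
const entry = {
  k_ele: [{ keb: "密貿易" }],
  r_ele: [{ reb: "みつぼうえき" }],
  sense: [{ gloss: ["smuggling"], pos: ["&n;", "&vs;"] }],
  ent_seq: "1731560",
};

console.log(matches(entry, { "r_ele.reb": "みつぼうえき" }));                        // true
console.log(matches(entry, { "k_ele.keb": /^密貿/ }));                               // true
console.log(matches(entry, { "sense.gloss": /smuggling/, "k_ele.keb": /貿/ }));      // true
console.log(matches(entry, { "k_ele.keb": /^貿/ }));                                 // false
```

The last call fails because the anchored regex wants the kanji form to start with 貿, and this one starts with 密. The real server does the same any-element matching, just with indexes behind it.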
It would be trivial to build a full-featured dictionary tool using any of the languages with a MongoDB library, and you could import the large name dictionary JMnedict and the full multi-language version of JMdict as well.
(Note: you can’t actually run that Perl script as-is on Snow Leopard, because Apple shipped the last version of Perl from before the “-C” bug was fixed. You have to remove the “-CADS” from the first line of the script and set it in the shell instead, with export PERL_UNICODE=ADS. Naturally, I’ve glossed over some other small issues as well…)