When you send me a SQL statement that updates a 600,000-record table based on a join to a 900,000-record table, please make sure the join columns are indexed. Also, please don’t test on a toy database.
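A quick way to check before hitting send is to ask the database for its query plan. Here is a minimal sketch using Python's built-in sqlite3 module; the table, column, and index names are made up for illustration, and other databases have their own EXPLAIN syntax and output.

```python
import sqlite3

# Hypothetical schema: "orders" plays the 600,000-record table being updated,
# "customers" the 900,000-record table it is joined to.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, region TEXT);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    CREATE INDEX idx_customers_customer_id ON customers (customer_id);
""")

update_sql = """
    UPDATE orders
       SET region = (SELECT c.region
                       FROM customers c
                      WHERE c.customer_id = orders.customer_id)
"""

# EXPLAIN QUERY PLAN describes how SQLite would run the statement without
# executing it; the human-readable detail is the fourth column of each row.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + update_sql)]
uses_index = any("idx_customers_customer_id" in step for step in plan)

for step in plan:
    print(step)
print("join lookup uses the index:", uses_index)
```

Without the CREATE INDEX line, the lookup into customers has no named index to use, which on tables of this size means a lot of scanning per updated row.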
Just got a complaint from a user about a Perl script that wasn’t handling regular expressions correctly. Specifically, when he typed:
ourspecial-cat | grep 'foo\|bar'
he got a match on “foo” or “bar”, but when he typed:
ourspecial-grep 'foo\|bar'
he got nothing at all.
My surprise came from the fact that the normal grep worked at all: everyone knows that you need to use egrep for that kind of search, and in any case, since the entire regular expression was in single quotes, the backslash shouldn’t be needed. Removing the backslash made our tool do what he wanted, but broke grep.
Sure enough, if you leave out the backslash, you need to use egrep or grep -E, but if you put it in, you can use plain grep. In GNU grep’s basic syntax, \| means alternation and a bare | is a literal pipe; in the extended syntax it’s the other way around. What makes it really fun is that they’re the same program, GNU Grep 2.5.1, and running egrep should be precisely the same as running grep -E.
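The tool's side of this is easy to reproduce. The sketch below uses Python's re module as a stand-in for the Perl script (an assumption; the real tool isn't shown here), since both treat a bare | as alternation and \| as a literal pipe character, the same as grep -E.

```python
import re

# Sample input lines, invented for illustration.
lines = ["foo", "bar", "foo|bar", "baz"]

# Perl-style (and grep -E style): a bare | means "either side".
bare_pipe = re.compile(r"foo|bar")

# Perl-style: \| is a literal pipe, so this only matches the text "foo|bar".
# Plain GNU grep reads this same pattern as alternation instead.
escaped_pipe = re.compile(r"foo\|bar")

bare_matches = [s for s in lines if bare_pipe.search(s)]
escaped_matches = [s for s in lines if escaped_pipe.search(s)]

print(bare_matches)     # ['foo', 'bar', 'foo|bar']
print(escaped_matches)  # ['foo|bar']
```

So the user's pattern was fine for plain grep and wrong for anything with Perl-style regular expressions, which is why removing the backslash fixed one and broke the other.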
Makes me wonder what other little surprises are hidden away in the tools I use every day…
Three basic rules to keep in mind when trying to index that massive crapload of data you just shoved into MongoDB:
The current generation of the S12 with the ION graphics chipset has been discontinued, with all remaining inventory now in the Lenovo Outlet Store for $399. I’ve been quite happy with mine. The ION gives it decent performance for HD video and light gaming, and it has a full-sized keyboard and bright, crisp screen with decent resolution.
[Update: they also have hundreds of brand-new power supplies for $16. For that price, I can have one at home, one at the office, and one in the trunk of the car, and never carry one around. They also have a hundred or so of the 10-inch netbooks in a major scratch-and-dent sale ($220), and some refurbished 10-inch tablet netbooks.]
Suppose you had a big XML file in an odd, complicated structure (such as JMdict_e, a Japanese-English dictionary), and you wanted to load it into a database for searching and editing. You could faithfully replicate the XML schema in a relational database, with carefully-chosen foreign keys and precisely-specified joins, and you might end up with something like this.
Go ahead, look at it. I’ll wait. Seriously, it deserves a look. All praise to Stuart for making it actually work, but damn.
…
Done? Okay, now let’s slurp the whole thing into MongoDB:
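One way that slurp might look, as a minimal sketch rather than the original script: turn each entry element into a nested dictionary and hand the list to a MongoDB driver. The sample entry is a tiny invented fragment that borrows JMdict's element names, the element_to_doc helper is hypothetical, attributes are ignored, and the driver call is left as a comment because it assumes pymongo and a running server.

```python
import xml.etree.ElementTree as ET

# Invented sample using JMdict-style element names; the real file is far larger.
SAMPLE = """
<JMdict>
  <entry>
    <ent_seq>1000001</ent_seq>
    <k_ele><keb>猫</keb></k_ele>
    <r_ele><reb>ねこ</reb></r_ele>
    <sense><gloss>cat</gloss></sense>
  </entry>
</JMdict>
"""


def element_to_doc(elem):
    """Convert an element to a string (leaf) or a dict of tag -> list of children."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    doc = {}
    for child in children:
        # Every tag maps to a list, since most JMdict elements can repeat.
        doc.setdefault(child.tag, []).append(element_to_doc(child))
    return doc


root = ET.fromstring(SAMPLE.strip())
docs = [element_to_doc(entry) for entry in root.iter("entry")]
print(docs[0])

# Assumption: with pymongo installed and a server running, loading would be
# something like collection.insert_many(docs).
```

The point is that the document keeps the nested shape of the XML, with no foreign keys or joins to design up front.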
Before I ramble, let me sum up my feelings about MongoDB:
Good: easy to install, easy to start using, easy to dump a metric crapload of loosely-structured data into, easy to write basic queries and indexes, rapid high-quality (free!) support from the core development team, phenomenal cosmic power.
Bad: hard to install some drivers (Perl in particular pulls in an amazing number of poorly-specified dependencies), hard to get a metric crapload of data back out quickly, hard to write advanced queries and indexes, easy to reach “here there be dragons” territory outside the scope of documentation and guarantees and available tools, easy to lock up a server process for minutes or hours with simple operations, easy to lose everything unless you replicate to at least one other machine that has independent power, eats RAM like you wouldn’t believe.
It is, if you like, the Perl of databases, immensely powerful and versatile, but extremely dangerous in the hands of amateurs. And, like early releases of Perl, it is a rapidly-moving target that grows new functionality every time you turn your back on it.
The past few weeks have been a heady mix of excitement, frustration, satisfaction, and annoyance, as my first large-scale use of MongoDB has moved from simple prototype, to initial production version, through major bugs and poorly-documented limitations (often “already fixed in next release”), to what I hope is finally “stuff I can just leave running and go work on other projects again”.
“I will not laugh. I will not laugh. I will not laugh.”
At this.
I suppose there’s a chance that he’s being deliberately vague about his actual project and the scale of the data involved, but he sounds so earnest.
“I cried because I had no salt, until I met a man who had no entropy.”