Tools

Things that suck


  1. 10% packet loss on my DSL line.
  2. Three hours diagnosing the problem so that I could convince tech support it wasn't my equipment. And, yes, I even had a spare DSL modem lying around.
  3. At least four hours spent on the phone with support at various levels, mostly spent listening to Muzak and repeating parts of item #2.
  4. Being told that it will be 11 days before someone can physically come out and check the lines, since resetting the DSLAM didn't fix it.
  5. Discovering that every other service provider in the area (cable, wireless, etc.) has at least a 5-day lead time, and juicy up-front costs for the required gear.

Dictionary update


[Update update: I’ve made a small change to add the full JMnedict name dictionary; a lot of things that used to be in Edict/JMdict have been moved over to this much-larger secondary dictionary, and I finally got around to integrating it. The English translations aren’t searchable yet, mostly because I need to rework the form and add the kanji dictionary to Xapian as well, so that I have J↔E, N↔E, and K↔E.]

One downside of moving a lot of stuff onto my new shared-hosting account is that I have to give up a lot of control over what’s running. Not only do I have to work through an Apache .htaccess file instead of reconfiguring the server directly, but I can’t run my own servers on their machine.

So, goodbye Sphinx search engine, hello Xapian (thanks, Pixy). While it suffers from a lack of documentation between “baby’s first search” and “211-page C++ API document”, it has a lot to offer, and doesn’t require a server. One thing it has is a full-featured query parser, so you can create searches like “pos:noun usage:common lunch -keyword:vulgar” to get common lunch-related nouns that don’t include sexual slang (such as the poorly-attributed usage of ekiben as a sexual position). That allows me to use the same tagging for the E-J searches that I use in SQLite for the J-E searches. [Note: everything’s just filed under “keyword:” in this first pass, and the valid values are the same as the advanced-search checkboxes.]
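For anyone who hasn't met prefixed queries before, here is a rough Python sketch of what that example string expresses. It is not Xapian's query parser (the real one also handles stemming, phrases, boolean operators, and more), and the real site goes through Search::Xapian in Perl; split_query is a made-up name used purely for illustration.

```python
def split_query(query):
    """Split a query into required and excluded (prefix, value) pairs.

    Hypothetical helper, not part of Xapian. Bare words get prefix None.
    """
    required, excluded = [], []
    for token in query.split():
        target = required
        if token.startswith("-"):
            # A leading minus means "must not match".
            target = excluded
            token = token[1:]
        prefix, sep, value = token.partition(":")
        if sep:
            target.append((prefix, value))
        else:
            target.append((None, token))
    return required, excluded


required, excluded = split_query("pos:noun usage:common lunch -keyword:vulgar")
print(required)
print(excluded)
```

The point is only that each prefix names a field to match against, bare words hit the main text, and a leading minus excludes a term.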

I need a full-text search to do English-Japanese, because the JMdict data isn’t really designed for it. There are hooks in the XML schema, but they’re not used yet. As a result, my search results are a bit half-assed, which makes the new query support useful for refining the results. I can also split out the French, German, and Russian glosses into their own correctly-stemmed searches; with Sphinx, there was one primary body field to search, so all the glosses were lumped together. With a small code change, I can tag each gloss with the correct ISO language code and index them correctly.

The new version is now live on jgreely.net/dict, which means I should be able to move that domain over to the shared-hosting account soon.

Once I figured out how to use Xapian (through the Search::Xapian Perl module, of course), replacing Sphinx and adding the keyword support took a few minutes and maybe half a page of code, total. In theory, I could use it for the J-E searches as well, but I’d lose the ability to put wildcards anywhere in the search string, which comes in handy when I’m trying to track down obscure or obsolete words.
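To show what "wildcards anywhere" buys you on the SQLite side, here is a minimal Python sketch using the standard sqlite3 module. The table name, columns, and three sample rows are assumptions made up for the example; the real dictionary schema isn't shown here, and the real code is Perl.

```python
import sqlite3

# Hypothetical, tiny stand-in for the dictionary table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (word TEXT, gloss TEXT)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?)",
    [
        ("駅弁", "station box lunch"),
        ("弁当", "box lunch"),
        ("駅", "station"),
    ],
)


def search(pattern):
    """Return words matching a LIKE pattern: % is any run, _ is one character."""
    rows = conn.execute(
        "SELECT word FROM entry WHERE word LIKE ? ORDER BY word", (pattern,)
    )
    return [row[0] for row in rows]


print(search("%弁%"))  # wildcard on both sides of a known kanji
print(search("_弁"))   # exactly one unknown character, then 弁
```

A leading wildcard like this forces a table scan rather than an index lookup, which is a fair trade when you only half-remember the word.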

One thing I haven’t figured out is why I can’t use add_term with kanji arguments; both Xapian and Perl are working entirely in Unicode, but passing non-ASCII arguments to add_term throws an error. The workaround is to set the stemmer to “none” and use index_text, and that’s fast enough that I don’t need to worry about it right now.

The most annoying thing about the Xapian documentation is how well-hidden the prefix support is. The details aren’t in the API at all; you can learn how to add them to a term generator or query parser, but the really useful explanation is over in the Omega docs.

Yeah, we're geeks; deal with it.


An Ooma Christmas

Dear Open Source community,


This is the sort of attitude that makes me want to bitch-slap some sense into the lot of you.


Three points define a plane...


…four points define a wobble. Some months back, I left myself a note to buy the Manfrotto Modo Pocket camera stand when it finally reached the US. I had taken their tabletop tripod with me to Japan, but hadn’t used it much because of the overhead: pull it out of the bag, find a dinner-plate-sized surface to set it up on, take the shot.

I didn’t bother buying any of the other “quickie” mini-tripods that are out there, because most of them struck me as gimmicks first, stabilizers second. The Modo Pocket, though, looked eminently practical:

Small enough to be left on the camera while it’s in your pocket, with a passthrough socket to mount on a larger tripod or monopod. Usable open or closed. Solidly constructed, like most Manfrotto products. A design that derives its cool looks directly from its functionality. It’s even a nice little fidget toy.

What it isn’t is a tripod. If you put a three-legged camera stand down on a surface, it might end up at an odd angle, or even fall over if there’s too much height variation between the legs, but it’s not going to wobble. A four-legged stand is going to wobble on any surface that’s not perfectly flat, and is also going to be subject to variations in manufacture.

The legs on my shiny new Modo Pocket are about two sheets of paper off from being perfectly aligned, which means that it can wobble a bit during long exposures. Adjusting it to perfection is trivial, but even once it’s perfectly aligned on perfectly flat surfaces, it won’t be that way out in the real world.

It can’t be, because it has four fixed-length legs. This is a limitation, not a flaw. Just as it’s not designed to work with an SLR and a superzoom (it would fall over in a heartbeat), it’s not designed to replace a tripod. It’s designed to help the camera in your pocket grab a sharp picture quickly, before you lose the chance. I expect to get some very nice, sharp pictures with this gadget, and I don’t regret the $30 in the least.

Make More People!


I’m doing some load-testing for our service, focusing first on the all-important Christmas Morning test: what happens when 50,000 people unwrap their presents, find your product, and try to hook it up. This was a fun one at WebTV, where every year we rented CPUs and memory for our Oracle server, and did a complicated load-balancing dance to support new subscribers while still giving decent response to current ones. [Note: it is remarkably useful to be able to throw your service into database-read-only mode and point groups of hosts at different databases.]

My first problem was deciphering the interface. I’ve never worked with WSDL before, and it turns out that the Perl SOAP::WSDL package has a few quirks related to namespaces in XSD schemas. Specifically, all of the namespaces in the XSD must be declared in the definitions section of the WSDL to avoid “unbound prefix” errors, and then you have to write a custom serializer to reinsert the namespaces after wsdl2perl.pl gleefully strips them all out for you.

Once I could register one phony subscriber on the test service, it was time to create thousands of plausible names, addresses, and (most importantly) phone numbers scattered around the US. Census data gave me a thousand popular first and last names, as well as a comprehensive collection of city/state/zip values. Our CCMI database gave me a full set of valid area codes and prefixes for those zips. The only thing I couldn’t find a decent source for was street names; I’m just using a thousand random last names for now.

I’m seeding the random number generator with the product serial number, so that 16728628 will always be Elisa Wallace on W. Westrick Shore in Crenshaw, MS 38621, with a number in the 662 area code.
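The seeding trick is easy to sketch. This Python version is only an illustration under loud assumptions: the name and place lists are tiny stand-ins for the census and CCMI data, the 555 exchange is a placeholder rather than a real prefix, phony_subscriber is a made-up name, and it makes no attempt to guarantee unique phone numbers. It will not reproduce Elisa Wallace; it only shows that the same serial number always yields the same person.

```python
import random

# Tiny stand-in samples (assumptions), not the real census/CCMI data.
FIRST_NAMES = ["Elisa", "James", "Maria", "Robert"]
LAST_NAMES = ["Wallace", "Smith", "Johnson", "Garcia"]
# (city, state, zip, area code)
PLACES = [
    ("Crenshaw", "MS", "38621", "662"),
    ("Springfield", "IL", "62701", "217"),
    ("Beverly Hills", "CA", "90210", "310"),
]


def phony_subscriber(serial):
    """Build a repeatable fake subscriber from a product serial number."""
    # A private generator seeded with the serial: same serial, same output,
    # regardless of what any other code does with the global generator.
    rng = random.Random(serial)
    city, state, zip_code, area_code = rng.choice(PLACES)
    return {
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        # Street names are just last names, as in the post.
        "street": f"{rng.randint(1, 9999)} {rng.choice(LAST_NAMES)} St.",
        "city": city,
        "state": state,
        "zip": zip_code,
        # 555 is a placeholder exchange, not a real prefix for the zip.
        "phone": f"{area_code}-555-{rng.randint(0, 9999):04d}",
    }


print(phony_subscriber(16728628))
```

Because nothing is stored, a failed test run can be replayed from the list of serial numbers alone.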

Over the next few days, I’m going to find out how many new subscribers I can add at a time without killing the servers, as well as how many total they can support without exploding. It should be fun.

Meanwhile, I can report that Preview.app in Mac OS X 10.5.4 cheerfully handles converting a 92,600-page PostScript file into PDF. It took about fifteen minutes, plus a few more to write it back out to disk. I know this because I just generated half a million phony subscribers, and I wanted to download the list to my Sony Reader so I could scan through the output. I know that they all have unique phone numbers, but I wanted to see how plausible they look. So far, not bad.

The (updated! yeah!) Sony Reader also handles the 92,600-page PDF file very nicely.

[Update: I should note that the “hook it up” part I’m referring to here is the web-based activation process. The actual “50,000 boxes connect to our servers and start making phone calls” part is something we can predict quite nicely based on the data from the thousands of boxes already in the field.]

Sony Reader firmware update, finally!


[Update: sample picture of a PDF with kanji and furigana below the fold]

Quite a while ago, Sony promised to update their e-ink reader (the 505 model, at least; owners of the original 500 are SOL) to support Adobe Digital Editions (emerging DRM ebook standard), as well as fix a lot of bugs and in general support the product. People have been wondering if it would ever happen, or if it would be a new model. The recent UK release of the 505 was a head-scratcher as well, since it came without any announcement about the overdue update.

It took a while, but it’s here (more precisely, it’s linked from here; there’s no direct download link). Lots of other improvements, including SDHC compatibility and… (wait for it)… kanji in PDF files! You still need to use one of the hacks to see Chinese and Japanese text in text files and menus, but now that there’s a real firmware installer for the 505, you can recover from bad hacks.

Looks good so far.

[Update: the PDF reflow works pretty well for straightforward text-heavy PDFs with sensible internal layout. That is, the order the text was generated in the PDF file is the order it will appear; it doesn’t understand “columns” as such. Unfortunately, the Microsoft Word equation editor violates this constraint, and furigana in Word is implemented as an equation. Net result: Japanese PDFs may turn into crap when you ask the reader to reflow them, so you should format them for its page size.

This also means that graphics-heavy PDF files can’t be resized at all. Maps and complex diagrams must be converted to JPG to be useful, because the PDF viewer still doesn’t scroll, and the resize button is always a reflow button now.

Generally, the UI is much faster (except the date-entry screen, which is glacial), and page-turning is slightly faster. The only EPUB-format document I’ve tried turned out to be very graphics-heavy, which basically locked up the device during rendering. I haven’t tried an SDHC card, but people are reporting very mixed results. I’m loving the kanji support in PDFs, and look forward to trying an updated version of the Unicode font hack to get kanji working in text files as well.]


Ooma goes retail!


We’ve been taking it slow, but we’ve finally got a retail trial running. If you’re in the Los Angeles area, you can finally see our product before buying one, at Best Buy.

"Now throw the switch and let us begin the battle for the planet."
--- The Brain

“Need a clue, take a clue,
 got a clue, leave a clue”