February 2011

One from the trenches...

"I'm going to write out a log entry every time I see this sort of packet, and put it at WARNING level. This will help me solve a serious problem!"
    -- Anonymous Developer

1,000 packets/second later on N devices…

Barsoomian Haiku

Evil men often
held Dejah Thoris for weeks;
did they get any?

Warlord John Carter,
always present when villains
say “As you know, Bob”.

Barsoom’s nude beauties:
inadequately described,
yet worth dying for.

Design is not engineering, lesson 692

Words fail me. Windmills. Solar panels. Greenhouses. Only people with advanced degrees could come up with such a stupid bridge design.

Eco-Bridge of Doom

(via Gizmodo, whose writer seems to be about as technically adept as the designers themselves)

Y'know, just in case


Now I just need an appropriate illustration…

Spoiling my laptop, and myself

A while back, I upgraded my laptop by replacing the DVD drive with a 240GB SSD. This has been very, very nice, and gave me just shy of 750GB of disk space, a third of it silly-fast.

So naturally I couldn’t resist replacing the 500GB Seagate hybrid with a Western Digital 750GB 7200rpm drive, giving me just shy of a terabyte. And I carry another terabyte around in the form of a WD hardware-encrypted USB drive.

If this future we live in had flying cars and catgirls, it would be perfect.

Amusingly, despite the fact that this laptop (and its daily backups…) is the center of my electronic universe, I will likely not be taking it to Japan with me at the end of March. My sister and I are only going to be in Kyoto for a week, and time spent in the hotel is time wasted. I’ll take my little Win7 netbook (can VPN to work in an emergency) and my Kindle (v3 has kanji support, and 3G Whispernet works all over Japan), but the bulk of the weight in my carry-on will consist of cameras and lenses.

The Kindle is the reason for several of my seemingly-unrelated recent entries and sidebar links, by the way, including an upcoming discussion of my grand kit-bashing project that mixes Aozora Bunko, MeCab, JMdict, MongoDB, pLaTeX, dviasm, and pdftk, welded together with a few hundred lines of Perl to produce ebooks with personalized levels of furigana and matching per-page vocabulary lists. More on that soon.

In addition to Aozora’s out-of-copyright literature, it’s easy to find much more contemporary work marked up in their format. One should of course only download such things if one is already in legitimate possession of the printed book, but once that hurdle is cleared, my version will be much easier to work through. The scripts can take a complete light novel from raw text to completed PDFs in about 10 seconds, and it only takes a few passes to find all of the unusual vocabulary and definitions, so my reading speed will be improving quite a bit soon.

Which is good, because, as I said, I’m finally going back to Japan!

I am intrigued by her ideas...

…and wish to subscribe to her newsletter.

Ootani Masae: buy my singles or else!

In addition to releasing indie singles as “Himawari”, Masae Ootani has been getting some decent theatre roles recently. This one looks like it makes good use of her distinctive style. Honestly, except for the sword, it’s like she just walked out of her apartment. Maybe she leaves it at home; Tokyo cops are so sensitive about that sort of thing.

Microsoft Arc Touch Mouse

This thing. Travel version of the Arc Mouse. Replaces middle button/wheel with solid-state slide control that includes scroll, page up/down, and middle-click.

Except I lied there. It doesn’t actually support middle click. In his infinite wisdom, designer Young Kim made a click at the top of the strip, where your finger naturally falls, send a page-up keystroke. Clicking at the bottom of the strip sends a page-down.

Clicking the middle of the strip does nothing at all. You double-click the middle of the strip to generate a single middle-click. Middle-click-and-hold activates the annoying drag-scroll mode that I’ve never seen anyone use deliberately. Usually they end up trying to figure out why their mouse stopped working normally.

And why do I know the designer’s name? Because the only two things on the product support web site are an interview with him and a “lifestyle video”.

And why did I buy one? Because the last several MS mice I’ve bought had poorly-engineered scrollwheels that simply stopped working after a while, and I thought the solid-state version might be a step up. It looked nice at the company store, and didn’t suffer from the same heavy-spring problem that the right mouse button on the standard Arc has.

So, if you’re one of those two-buttons-is-enough Windows people, and you don’t mind risking the loss of the little USB dongle (held in place on the completely-flat underside of the mouse by a strong magnet), it looks like an excellent lifestyle accessory, and a decent mouse.

Dear Microsoft,

I replaced my secondary hard drive over the weekend. Today I discover that Microsoft Office 2011 is demanding an activation key. Not “you need to go online to reactivate”, but rather “you can no longer use this product until you drive home, find the box, and re-enter the key”.

Permit me to describe my feelings about this.

I’ll keep it simple.


Defense Against The Dog Arts

Unrelated: A shrine maiden, a Buddhist nun, and a “Catholic” nun walk into a bar, and…

No, wait, that’s not a bar, it’s a porn novel. My mistake.

Dear Amazon,

Because I bought a Kindle, your recommendation system now fills the first several pages of results with random ebooks that either I already own in print, or else are not even plausibly related to anything I’ve ever purchased, owned, or searched for.

"Hey, you need to fill your Kindle with books! This is a book! It has a cover and a title page and words inside! You like words, right? Of course you do!"

Here’s one of the least stupid suggestions:

Things that go bump in your kitchen

Because I bought Cook’s Illustrated’s Italian Favorites, I really, really want to read about a monster-hunter who’s in over her head.

Other Kindle-fied “recommendations” include one called Blink, subtitled “The power of thinking without thinking”, because I own Beard on Food.

I can’t blame it all on the Kindle, though; that 750GB laptop drive I just bought led to a recommendation for a Gillette single-blade disposable razor. And there are some actual relevant recommendations, such as Shogun because I bought Exploring Kyoto, and James Beard’s New Fish Cookery because of the aforementioned Beard on Food.

And I really can’t complain about the DVD of Xanadu, recommended because my wishlist contains the Flash Gordon Blu-ray release. That’s just common sense.

Dear Google,

Auto-correcting my already-correct spelling of a search (even when I’m refining a search by adding additional keywords after already overriding your miscorrection) is annoying, but auto-correcting it to something that you have no search results for at all? STUPID.

The most reliable miscorrection I’ve found is the Japanese version of the LaTeX document processing system, which goes by the name “pLaTeX”. This is always corrected to “playtex”, even if the other words in the search are so specific that there are no Playtex associations possible, such as 傍点, the marginal dots that are used for emphasis in Japanese text.

Searches for LaTeX often include rubber gloves and fetish items, but the TeX community has been online for so long that they don’t dominate. In fact, I suspect horny rubber-lovers are often frustrated to find themselves receiving advice on pagination, hyphenation, and “how to make your font bigger”.


I am compelled to make the following observations about the first novel in the Asobi ni iku yo! series.

  1. The series title is given a wonderful Engrish translation as furigana: “Us It goes to play in Your house”.

  2. The compound noun 食料合成機 (literally “food synthesis machine”) has the following pronunciation as furigana: ソイレント・グリーン.

For the kana-impaired, instead of shokuryou-gouseiki, it’s to be read as soirento gureen. Not having seen the anime (yet), I do not know if this joke was carried over.

Parsing Japanese with MeCab

This is a public braindump, to help out anyone who might want to parse Japanese text without being sufficiently fluent in Japanese to read technical documentation. Our weapon of choice will be the morphological analysis tool MeCab, and for sanity’s sake, we’ll do everything in UTF-8-encoded Unicode.

The goal will be to take a plain text file containing Japanese text, one paragraph per line, and extract every word in its dictionary form, with the correct reading, with little or no manual intervention.
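
To make that goal concrete, here is a minimal Python sketch of pulling dictionary forms and readings out of MeCab's text output. It assumes the default IPADIC-style layout (surface form, a tab, then nine comma-separated features, with the dictionary form seventh and the katakana reading eighth); other dictionaries arrange their fields differently. The sample output is hand-written for illustration rather than captured from a real run, and `parse_mecab_output` is a made-up name, not part of MeCab.

```python
# Hand-written sample in the assumed IPADIC layout (not captured output).
SAMPLE = (
    "猫\t名詞,一般,*,*,*,*,猫,ネコ,ネコ\n"
    "を\t助詞,格助詞,一般,*,*,*,を,ヲ,ヲ\n"
    "見\t動詞,自立,*,*,一段,連用形,見る,ミ,ミ\n"
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ\n"
    "EOS\n"
)


def parse_mecab_output(text):
    """Return (dictionary_form, reading, part_of_speech) for each token."""
    words = []
    for line in text.splitlines():
        # "EOS" marks the end of a sentence; skip it and any blank lines.
        if not line or line == "EOS":
            continue
        surface, _, features = line.partition("\t")
        fields = features.split(",")
        # Unknown words may carry fewer fields or "*" placeholders,
        # so fall back to the surface form when no dictionary form exists.
        base = fields[6] if len(fields) > 6 and fields[6] != "*" else surface
        reading = fields[7] if len(fields) > 7 else ""
        words.append((base, reading, fields[0]))
    return words


for base, reading, pos in parse_mecab_output(SAMPLE):
    print(base, reading, pos)
```

Note that in this layout the reading field describes the inflected surface form (ミ for 見), not the dictionary form (見る), so getting a reading for the dictionary form takes an extra step.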


Frog not included

GelaSkins makes very nice, colorful skins (with high-quality 3M materials) for a wide assortment of gadgets, and if they don’t currently support yours, you can measure it and make a custom order for the same price. I liked the idea, I just didn’t really like any of the pictures. Fortunately, they let you upload your own, or use any image that you have rights to.

I’ve taken a few photos that I’m quite happy with, so I went searching through my Aperture archives to find one that would work. To my surprise, the image I ended up liking the most was a test shot.

Right after I bought my latest camera body, the Sony Alpha 850, I went up to San Francisco to spend a day trying it out at the Asian Art Museum and Golden Gate Park.

I spent a lot of time fiddling with the silliest and most glorious lens Minolta ever made, the 135mm STF, which I was delighted to dust off after several years where its long focal length made it difficult to use with APS-sized sensors. Out at the park, though, I pulled out an old standby, the long-discontinued Minolta 70-210mm f4. This is a consumer-grade lens, but made back when that meant “serious amateur”, not “cheapest plastic crap we can throw into a bundle”. It’s a terrific walkaround lens, despite its striking resemblance to a 24-ounce canned beverage.

One of the photos I shot while walking around the Japanese Tea Garden in the park jumped out at me as perfect for a laptop skin and wallpaper. [210mm, f4, 1/250, ISO 640]

"That's not the way mommy tells it!"

“Shut up, kid; that’s the way I tell it.”

There’s a fresh manga adaptation of the original Dirty Pair SF novels running in Japan (via The Leaning Tower of Damocles). I will cheerfully confess that I didn’t like the illustration style used for the novels, but I’m not sure this is an improvement. I’ll take the Eighties anime & comic versions, please; these are a bit over the top.


PDF metadata on Kindle

PDF version 1.5 doesn’t work for metadata (apparently because it compresses objects to reduce the output size); save as 1.3 for it to be parsed correctly. You’ll still need to set the filename to the title you want displayed in the main book listing, even though the device actually parses the title out of the file to display on the detail page. Blech.

You can insert the metadata with pdftk as per bloovis, or some other tools (the full version of Adobe Acrobat works great, but is not exactly free…). LaTeX users can use a sledgehammer to swat this fly with the hyperref package, but you’ll need to use dvipdfmx -V3 to downrev the PDF output to 1.3.
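
For the pdftk route, a minimal Python sketch of the two pieces involved: the text file that `update_info` reads, and the command line itself. The `InfoBegin`/`InfoKey`/`InfoValue` layout is what recent pdftk versions print from `dump_data`; older releases may omit the `InfoBegin` line, so check your own version's output before relying on it. The function names and file names are hypothetical, values are assumed to be plain ASCII, and nothing here actually runs pdftk or changes the PDF version.

```python
def build_info_text(title, author):
    """Build the text that pdftk's update_info operation reads."""
    lines = []
    for key, value in (("Title", title), ("Author", author)):
        lines += ["InfoBegin", f"InfoKey: {key}", f"InfoValue: {value}"]
    return "\n".join(lines) + "\n"


def build_pdftk_command(src, info_file, title):
    """Build the argument list; the output file is named after the title,
    since the main book listing shows the filename."""
    # Assumes the title contains no characters that are illegal in filenames.
    return ["pdftk", src, "update_info", info_file, "output", f"{title}.pdf"]


print(build_info_text("My Book", "Some Author"), end="")
print(" ".join(build_pdftk_command("FOO5419.pdf", "info.txt", "My Book")))
```

The argument list is meant to be handed to something like `subprocess.run` once the info text has been written to `info.txt`.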

Sony got their PDF software from Adobe (for the DRM, mostly), so their Readers don’t have this problem. Sadly, this means that a file generated for the Kindle will display much more slowly on the Sony, since the object-compression is quite useful.

Bridging The Gulf of Kanji

Imagine that you’re reading something in a foreign language that you’ve been studying for a while, and you hit an unfamiliar word. You know how to pronounce it, so you can often tell if it’s a place or a person’s name, and you’re pretty sure how many words you’re looking at, so if you need to look them up, you can.

When studying Japanese, the most frustrating thing about trying to graduate from reading “student material” to “real stuff” is not being able to do that. You’re reading along, feeling pretty good about yourself, and you run smack into a wall of kanji. Maybe it’s someone’s name, maybe it’s a city you’d recognize if you could pronounce it, or maybe it’s something like 厳重機密保持体制.

Taken individually, you know most or all of the characters, but together, wtf? Is it safe to skip over and work out from context, or do you need to carefully look up each character, crossing your fingers and hoping that it’s a straightforward collection of two-character nouns? (It is, by the way: literally “strictly-classified-preservation-system”, or, more loosely, “seriously top-secret”.) Every time you stop to look something like this up, you lose continuity, and instead of reading, you’re deciphering.

I am far from the first person to notice this, and there are some well-developed tools for helping you read a Japanese web site, of which perhaps the best-known is Rikai. I don’t use it. I will occasionally use the built-in pop-up J-E dictionary on my Mac, which is a simpler version of the same thing, but what I really want to do is read books and short stories.


Visit Beautiful Sidzd!

I have recently developed some sympathy for 19th-century cartographers. In 1815, Japan was a rather mysterious place, and making a detailed map with romanized placenames can’t have been easy.

Japan in 1815

Sidzd was apparently Settsu Province.

Map detail extracted from this item at the Library of Congress’ American Memory site. (you’ll need JPEG2000 and MrSID decoders to work with the largest available images; yes, they chose encumbered, poorly-supported formats to store everything…)

Dear Amazon,

Presenting some random guy’s download from Project Gutenberg as if it were the Kindle edition of an in-copyright book by a respected translator is not acceptable. I can’t safely order anything from your listing of Jay Rubin’s translation of Rashomon and Seventeen Other Stories, because there are clearly several different books here. Even the featured editorial review refers to someone else’s translation. Worse, the page changes slightly each time I reload it, apparently due to different indexing on different servers.

If I ignore the official link from his author page, and search by his name or by its ISBN, I find the real book here. But if I trust your listings, I’m screwed.

(yes, I sent in feedback; hopefully someone will clean up the mess)

Someone really didn't like Gakuen Kino...

In Kino’s Journey, the title character is a teenage girl who travels the wilderness on her talking motorcycle, stopping only briefly in each isolated city-state she finds, observing life while reserving judgement, surviving unpleasant encounters using her wits and pistols.

Gakuen Kino is a parody spin-off, featuring magical girl Kino and her talking cellphone strap, fighting monsters in a not-so-ordinary high school.

I’ve read several of the Kino stories, and have finished about 98% of the first novel (up from 33%, then 68%), but I found the mere existence of Gakuen Kino so amusing that I bought it on sight, and hope to read it at some point. Sadly, while it has been scanned in, no OCR’d, proofread edition is available, so I can’t run it through my scripts to speed up the reading experience. It will have to wait.

I grabbed the scans to get the interior illustrations, but I noticed something a bit unusual about them. The zip file correctly lists it as 学園キノ by 時雨沢恵一, but when you unpack it, the directory claims to contain 面白くないキノ by オナニ沢ケーイチ.

For the kana-deprived, the title has been changed to Omoshirokunai Kino (Boring Kino), by Onani-sawa Keiichi instead of Sigusawa Keiichi. Onani means masturbation. The scans match my copy of the book, so it’s just editorial commentary rather than vandalism, but still a bit of a surprise.

Anyway, here’s one of the color plates from inside the book. No onani, please!


Automagic JIS/ShiftJIS/EUC to UTF8

Finally got sick of constantly dealing with the variety of encoding schemes used for Japanese text files. I still convert everything to UTF-8 before any serious use, but for just looking at a random downloaded file, I wanted to eliminate a step.

less supports input filters with the LESSOPEN environment variable, but you need something to put into it. Turns out the Perl Encode::Guess module works nicely for this, and now I no longer care if a file is JIS, ShiftJIS, CP932, EUC-JP, or UTF-8. Code below the fold.


Duck Tours

No, seriously. I suspect we may have to try out the Osaka version.

Appending metadata to a PDF file

The Kindle has generally excellent support for reading PDF files, but absolutely terrible support for displaying embedded metadata. If FOO5419.pdf contains properly-specified Title and Author fields, it will appear on your Kindle as, you guessed it, FOO5419. It might show the Author on the right-hand side of the screen, and it might show Title and Author on the detail screen, but likely not.

It will work if you generate PDF version 1.3 with a self-contained Info dictionary (that is, “/Title(My Book)”, but not “13 0 obj (My Book) … /Title 13 0 R”). It will work if you do an append-only update to a v1.3 file in Adobe Acrobat Pro. It will work if you do a rewrite of a v1.3 file with pdftk.

What should work, for all PDF files, is an append-only update that uses only v1.3-ish features to create a self-contained Info dictionary. I hadn’t hacked PDF by hand since 1993, but I dusted off my reference manuals and wrote a script that correctly implements the spec.

It doesn’t work on a Kindle. Acrobat sees my data, Mac OS X Preview sees it, pdftk sees it, and every other tool I’ve tried agrees that my script generates valid PDF files with updated metadata. However, if I use my tool and then ask pdftk to convert the append-only update into a rewrite, the Kindle can see it (but only if it started out as v1.3).

I therefore declare their parser busted. The actual PDF viewer works fine, but whatever cheesy hack they’re using to quickly scan for metadata, it ain’t the good cheese.
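
For anyone wondering what “self-contained” means in practice, here is a minimal Python sketch that formats an Info dictionary with its strings written inline as literals. It only builds the object text; the cross-reference section and trailer that a real append-only update also needs are left out, and this is not the script described above. It assumes ASCII-only values, and both function names are hypothetical.

```python
def pdf_literal(text):
    """Format ASCII text as a PDF literal string, e.g. (My Book)."""
    text.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    # Backslashes first, then parentheses, so escapes are not doubled.
    escaped = (
        text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    )
    return f"({escaped})"


def info_object(obj_num, title, author):
    """Return an indirect object holding a self-contained Info dictionary."""
    return (
        f"{obj_num} 0 obj\n"
        f"<< /Title {pdf_literal(title)} /Author {pdf_literal(author)} >>\n"
        "endobj\n"
    )


print(info_object(13, "My Book", "A. Writer"), end="")
```

The point of the inline form is that the title sits directly inside the dictionary, rather than in a separate object that the dictionary merely points to.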

Cover story

One of the oddest limitations of the Kindle is that you need to jailbreak it to change the screensaver images. There’s a small set of images supplied by Amazon, some nice, some hideous, and you’re stuck with them. Replacing them is probably the single most common reason for Kindle-hacking.

I could use images from my collection of Naughty Novel Cover Art, but people have a tendency to pick up your Kindle and turn it on, and even limiting the selection to safe-for-work images still leaves it a bit spicy.

So, I went digging through my shelves for Paperbacks That Have Known The Touch Of A Lover. That is, battered old books that someone, not necessarily me, made extensive use of. I quickly assembled a stack about three feet high, and whittled it down to some particularly interesting ones. Boosting the contrast and brightness about 25% before downsampling to 16-color grayscale produces decent results, and I’m sure I’ll expand the collection over time.
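
The tool and exact settings aren't specified above, so the following is just one plausible reading of that recipe, as a minimal pure-Python sketch on 8-bit grayscale pixel values: scale brightness by 1.25, stretch contrast by 1.25 around mid-gray, then snap each pixel to one of 16 evenly spaced gray levels. The formulas and function names are assumptions of this sketch; a real workflow would more likely use an image editor or an imaging library, and would also resize the image.

```python
def boost(value, brightness=1.25, contrast=1.25):
    """Brighten, then stretch contrast around mid-gray (128); clamp to 0..255."""
    brightened = value * brightness
    stretched = (brightened - 128) * contrast + 128
    return max(0, min(255, round(stretched)))


def to_16_levels(value):
    """Snap an 8-bit gray value to one of 16 levels (0, 17, 34, ... 255)."""
    return (value // 16) * 17


def process(pixels):
    """Apply the boost and the 16-level reduction to a list of gray values."""
    return [to_16_levels(boost(v)) for v in pixels]


print(process([0, 64, 128, 192, 255]))
```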

Small color versions of the current set below:


Into every giant robot's life...

…a little Eineus must fall.


Dear Recaptcha,

This goes way beyond “not funny”, all the way to “incredibly stupid”. Does someone do even basic quality control on your source images? I’m thinking the answer is a rather firm No.

Recaptcha from Hell

[Update: Just saw one go by where one word was in Cyrillic and the other in Hebrew; sadly, I clicked refresh before I could stop to grab the screenshot.]

“Need a clue, take a clue,
 got a clue, leave a clue”