June 2009

Thighs, Ponzu, Spam, Steak, and BiscuitPeachGingerCrack

My sister’s in town for business, so…

No, wait, let me start again.

My lovelytalentedarticulatestylisheducatedsensiblesuccessful sister’s in town for business, and arranged to come in early so we could spend Saturday together in San Francisco, and Sunday down at my house.

Friday, while working from home, I prepared for her visit by lighting up the smoker and preparing a double batch of spicy smoked chicken thighs. I think she’d have disowned me if I’d shown up at the airport without them.

Saturday, I picked her up at SFO and handed over the chicken, then we bummed around Japantown and Chinatown for a few hours (praising the heavens that our mother was not along to see the everything-must-go final-auction-starts-at-noon Chinese antique shop), sat impatiently in the bar for several hours while the hotel prepared our rooms, and then headed out for dinner and Spamalot. Since both hotel and theater were in the theater district (which should be renamed the theater&bum district), all we needed was a good place to eat, and a Zvents search turned up Ponzu, an Asian fusion place that has some delicious food. Whatever else you get there, order the kalbi beef and the fried chickpeas, and eat them together. Trust us on this one; we ordered a second helping of the beef to use up the leftover chickpeas.

After that, it was off to Spamalot, which Ticketmaster shamelessly lied about the cast of, but the touring cast was by no means a disappointment. It’s a terrific show, very Python but hip, and I wouldn’t be surprised if it came back to SF for a longer run in the future.

Inexplicably, the rows in front of and behind us emptied out completely at intermission, and we heard one of the groups complaining about John O’Hurley’s inauthentic British accent. In Spamalot. Monty Python. Farce. They just couldn’t get past it. Either they were season-ticket-holding Serious Theatre Patrons™, or they inhaled a bit too much of the pot smoke that was drifting in from the nearby exit door, and were just friggin’ high.

Sunday morning, it was off to my house, which, for a change, was quite clean in the rooms that weren’t sealed off. More chicken was consumed, and for dinner, giant juicy Costco steaks, coated with rub and tastefully incinerated on my nuclear grill at a safe and comfortable 725°. Served with cheesy toasts and wine, life was good. Also surprisingly grownup-like, with candles and music and a centerpiece and both of our laptops shoved firmly to the side. Not at all like my usual combination of a frozen dinner and a web browser.

Dessert was the fresh peaches she brought from Chicago, sliced, sugared, and milked, on freshly-baked canned biscuits, topped with crushed Shouga Tsumami (aka “Ginger Pinch”, aka “Ginger Crack”, aka “Ohmygodthesearegoodgivememore”).

Das Limpet

When we got down to my house after seeing Spamalot, Nellie wanted to see Holy Grail. As I dug through the DVDs piled on my shelves, she was alternately amused and surprised by the contents of my collection, until I reached the one that turned out to be right above Holy Grail. I showed it to her, and got the reaction I expected: IT MUST BE MINE! (for a brief loan, at least).

When I told the story to Dave at lunch today, his blank stare reminded me how crucial a few years can be when it comes to pop culture. It had simply never occurred to me that someone in my usual circle of friends would never have heard of this:


Engrish Pop Quiz

I saw something at Daiso a while back that I thought would make an amusing gift for my sister. On the back was found this label:

Caution: Engrish In Use

Now, what’s the product?


Whatever happened to Stone Clouds?

Every once in a while, I’d visit the old Radioactive Panda site and see if there was any word on Eric Johnson’s next comic. The answer was always no (in the form of deafening silence, unless you visited the forums), but he has now returned with an official update, revealing a new start date and the reason for his three-year absence: respectively, “August 2009” and “World of Warcraft”.

Yeah, I can see that.

Riddle me this…

So, in a story about a well-placed State Department official on trial for spending the last 30 years spying for Cuba, what sort of direct quote do they lead off with?

"We were all appalled by the Bush years"

Because, y’know, that puts everything in perspective. If proven guilty, what we have here is someone who turned traitor because he started hating America during the Carter administration, but somehow, it’s still all about Bush. Fits the established narrative better, y’see.

Dear Apple,

Why does Mail.app keep segfaulting in this method call:

[MetadataManager getAllCalendarStoreData]

I’ve turned off data detectors, rebuilt my iCal database, rebuilt my Mail indexes, and pretty much everything else I can think of, and it still crashes anywhere between 2 minutes and one hour after I start it up.

Mind you, I have no idea why your email client is importing all of my calendars in the first place…

[Update: various forum posts suggest that this is tied to Leopard’s merger of iCal to-do list functionality into Mail, which works by syncing your local to-do lists up with your IMAP server. Except that I don’t use iCal for to-do lists, and wouldn’t want them on my mail servers if I did. So, a feature I’ve never used that does something I don’t want has inexplicably started causing my email to crash at random intervals, and since the bug has been around since at least 10.5.2, it’s unlikely to be fixed deliberately. One can only hope that there’s enough mail-related cleanup in Snow Leopard that it starts working there…]

The problem with “Letterman’s rape joke”…

…is that he never made one. In all the outraged coverage of the incident, you’d think someone would have bothered to mention that little nugget of information.

I watched the clip everyone’s linking to. He made a joke that implied that a player for the Yankees (who is good-looking, quite successful, and has been caught fooling around in the past) left the field in the middle of the game and had sex with Sarah Palin’s daughter. The exact words were “her daughter was knocked up by Alex Rodriguez”. Not a single word about rape, and, in fact, one could easily reverse the outrage by pointing out that these people are insisting that sex between a light-skinned female and a dark-skinned male must be rape.

Letterman and his writers obviously thought they were talking about Palin’s adult daughter, an unwed mother who was knocked up by an athlete. They’re guilty of being too lazy to check which daughter attended the game, or perhaps of not even knowing that there was another daughter. But that’s all.

If you happened to know (as Letterman and his equally-clueless writers obviously did not) that the daughter at the game was 14 years old, you could interpret it as a statutory rape joke, but I haven’t seen anyone say that. Unless something’s been edited out of the clip (and I got it from a site that was feeling the rage), there’s just no rape in this “rape joke”.

Dear Apple,

When you mark a bug “closed as a duplicate of bug X”, it would be nice if I were able to actually see bug X. Apparently, if I want to know the status of a fix, I have to send email saying “I reported Y, can you tell me if there’s been any action on X?”.

In this case, X has ID 5647954 and Y is 6770720, suggesting that X has been gathering dust for quite a while, and is unlikely to be fixed in an upcoming release. Japanese keyboard support in general seems to be pretty dusty, and I doubt you’ll get them working with Boot Camp any time soon, either.

[yes, this is because XP and Vista are stupid about keyboard layouts, and it affects VMware, too, but so what? You wrote the drivers for Boot Camp, and tout it as a feature, and it doesn’t work with some of your keyboards.]

How to sell comic books…

…in Japan. Not precisely safe for work…


Abbyy FineReader Pro 9.0, quick tests

I’ve gotten pretty good at transcribing Japanese stories and articles, using my DS Lite and Kanji sonomama to figure out unfamiliar kanji words, but it’s still a slow, error-prone process that can send me on half-hour detours to figure out a name or obsolete character. So, after googling around for a while, I downloaded the free 15-day demo of FineReader Pro and took it for a spin. Sadly, this is Windows software, so I had to run it in a VMware session; the only product that claims to have a kanji-capable Mac product has terrible reviews and shows no sign of recent updates.

First test: I picked up a book (Nishimura’s murder mystery collection Ame no naka ni shinu), scanned a two-page spread at 600-dpi grayscale, and imported it into FineReader. I had to shut off the auto-analysis features, turn on page-splitting, and tell it the text was Japanese. It then correctly located the two vertically-written pages and the horizontally-written header, deskewed the columns (neither page was straight), recognized the text, and exported to Word. Then I found the option to have it mark suspect characters in the output, and exported to Word again. :-)

Results? Out of 901 total characters, there were 10 errors: 6 cases of っ as つ, one あ as ぁ, one 「 as ー, one 呟 as 眩, and one 駆 recognized as 蚯. There were also two extra “.” inserted due to marks on the page, and a few places where text was randomly interpreted as boldface. Both of the actual kanji errors were flagged as suspect, so they were easy to find, and the small-tsu error is so common that you might as well check all large-tsu in the text (in this case, the correct count should have been 28 っ and 4 つ). It also managed to locate and correctly recognize 3 of the 9 instances of furigana in the scan, ignoring the others.

I’d already typed in that particular section, so I diffed mine against theirs until I had found every error. In addition to FineReader’s ten errors, I found two of mine, where I’d accepted the wrong kanji conversion for words. They were valid kanji for those words, but not the correct ones, and multiple proofreadings hadn’t caught them.
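For anyone who wants to repeat that comparison, here is a minimal sketch of a character-level diff using Python's standard `difflib`. It is not the tool I actually used; the function name `char_diffs` and the sample strings are made up for illustration, and it assumes both versions have already been reduced to plain Unicode text with the same line breaks.

```python
import difflib


def char_diffs(mine: str, ocr: str):
    """Return (operation, my_text, ocr_text) for every span where the two differ."""
    # autojunk=False keeps difflib from treating very common characters
    # (such as frequent kana in a long text) as noise.
    matcher = difflib.SequenceMatcher(None, mine, ocr, autojunk=False)
    return [
        (tag, mine[i1:i2], ocr[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Hypothetical sample: a small-tsu error and a kanji misread.
    typed = "彼はそっと呟いた。"
    scanned = "彼はそつと眩いた。"
    for tag, a, b in char_diffs(typed, scanned):
        print(tag, repr(a), "->", repr(b))
```

A difference only says the two versions disagree, not which one is right, so each hit still has to be checked against the page.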

The second test was a PDF containing scanned pages from another book, whose title might be loosely translated as “My Youth with Ultraman”, by the actress who played the female team member in the original series. I’d started with 600-dpi scans, carefully tweaked the contrast until they printed cleanly, then used Mac OS X Preview to convert them to a PDF. It apparently downsampled them to something like 243 dpi, but FineReader was still able to successfully recognize the text, with similar accuracy. Once again, the most common error was small-tsu, the kanji errors were flagged as suspect, and the others were easy to find.

For amusement, I tried Adobe Acrobat Pro 9.1’s language-aware OCR on the same PDF. It claimed success and looked good on-screen, but every attempt to export the results produced complete garbage.

Both tests were nearly best-case scenarios, with clean scans, simple layouts, and modern fonts at a reasonable size. I intend to throw some more difficult material at it before the trial expires, but I’m pretty impressed. Overall, the accuracy was 98.9%, but when you exclude the small-tsu error, it rises to 99.6%, and approaches 99.9% when you just count actual kanji errors.
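As a worked check of that arithmetic, the sketch below recomputes the percentages from the first test's counts only (901 characters, 10 errors, 6 of them small-tsu, 2 of them kanji). The dictionary keys are just labels I made up, and since the figures above may also fold in the second test, the last number need not match exactly.

```python
TOTAL_CHARS = 901  # characters in the first test's two-page spread

# Error counts from the first test; the key names are illustrative labels.
errors = {
    "small_tsu": 6,       # っ read as つ
    "small_a": 1,         # あ read as ぁ
    "bracket_as_bar": 1,  # 「 read as ー
    "kanji": 2,           # 呟 as 眩, 駆 as 蚯
}


def accuracy(total: int, wrong: int) -> float:
    """Percentage of characters recognized correctly."""
    return 100.0 * (total - wrong) / total


all_errors = sum(errors.values())
without_tsu = all_errors - errors["small_tsu"]

print(f"all errors:        {accuracy(TOTAL_CHARS, all_errors):.1f}%")
print(f"ignoring small-tsu: {accuracy(TOTAL_CHARS, without_tsu):.1f}%")
print(f"kanji errors only:  {accuracy(TOTAL_CHARS, errors['kanji']):.1f}%")
```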

List price is $400, but there’s a competitive upgrade available for customers with a valid license for any OCR software for $180. Since basically every scanner sold comes with low-quality OCR software, there’s no reason for most people to spend the extra $220. They use an activation scheme to prevent multiple installs, but it works flawlessly in a VMware session, so even if I didn’t own a Mac, that’s how I’d install it.

This is just… sad.

Not so much because there’s a cartoonist who’d draw it, as because I’m seeing it linked approvingly by people who have access to wet matches, with which they can obviously no longer be trusted.

Samurai Tut

Need something to do in San Francisco?

  • King Tut, de Young Museum, June 27, 2009 through March 28, 2010
  • Lords of the Samurai, Asian Art Museum, June 12 through September 20, 2009

Microsoft Bluetooth Notebook Mouse 5000 review, Bad Haiku Edition

Button 2 broke fast;
replacement eats batteries.
Gosh it's pretty, though.

Using Abbyy FineReader Pro for Japanese OCR

[Update: if you save your work in the FineReader-specific format, then changes you make after that point will automatically be saved when you exit the application; this is not clear from the documentation, and while it’s usually what you want, it may lead to unpleasant surprises if you decide to abandon changes made during that session.]

After several days with the free demo, in which I tested it with sources of varying quality and tinkered with the options, I bought a license for FineReader Pro 9.0 (at the competitive upgrade price for anyone who owns any OCR product). I then spent a merry evening working through an album where the liner notes were printed at various angles on a colored, patterned background. Comments follow:

  • Turn off all the auto features when working with Japanese text.
  • In the advanced options, disable all target fonts for font-matching except MS Mincho and Times New Roman. Don't let it export as MS Gothic; you'll never find all of the ー/一 errors.
  • Get the cleanest 600-dpi scan you can. This is sufficient for furigana-sized text on a white background.
  • Set the target language to Japanese-only if your source is noisy or you're sure there's no random English in the text. Otherwise, it's safe to leave English turned on.
  • Manually split and deskew pages if the separation isn't clean in the scan.
  • Adjust the apparent resolution of scans to set the output font size, before you tell it to recognize the text.
  • Manually draw recognition areas if there's anything unusual about your layout.
  • Rearrange the windows to put the scan and the recognized text side-by-side.
  • Don't bother with the spell-checker; it offers plausible alternative characters based on shape, but if the correct choice isn't there, you have to correct it in the main window anyway. Just right-click as you work through the document to see the same data in context.
  • You can explicitly save in a FineReader-specific format that preserves the entire state of your work, but it creates a new bundle each time, and it won't overwrite an existing one with the same name. This makes it very annoying when you want to simply save your progress as you work through a long document; each new save includes a complete copy of the scans, which adds up fast.
  • If you figure out how to get it to stop deleting every full-width kanji whitespace character, let me know; it's damned annoying when you're trying to preserve the layout of a song.
  • Once you've told it to recognize the text, search the entire document for these common errors:
    • っ interpreted as つ and vice-versa
    • ー interpreted as 一 and vice-versa; check all other nearby katakana for "small-x as x" errors while you're at it
    • 日 interpreted as 曰
    • Any English-style punctuation other than "!", ":", "…", or "?"; most likely, they should be the katakana center-dot, but it might have torn a character apart into random fragments (rare, unless your background is noisy).
    • The digits 0-9; if your source is noisy, random kanji and kana can be interpreted as digits, even when English recognition is disabled.
  • Delete any furigana it happens to recognize, unless you're exporting to PDF; it just makes a mess in Word.
  • In general, export to Word as Formatted Text, with the "Keep line breaks" and "Highlight uncertain characters" options turned on.
  • If your text is on a halftoned background and you're getting a lot of errors, load up the scan in Photoshop, use the Strong Contrast setting in Curves, then try out the various settings under Black & White until you find one that gets rid of most of the remaining halftone dots (I had good luck with Neutral Density). After that, you can Despeckle to get rid of most of the remaining noise, and use Curves again to force the text to a solid black.
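
The search for common errors in the list above can be partly automated once the text is exported. This is only a rough sketch, assuming the output has been saved as plain Unicode text; `find_suspects`, `classify`, and the label strings are hypothetical names, not anything FineReader provides, and the katakana ッ/ツ pair is my own addition on the "small-x as x" theory. It flags candidates to look at, and cannot tell you which reading is correct.

```python
import string

# Characters the OCR tends to confuse, with a label for the report.
CONFUSABLE = {
    "っ": "small/large tsu", "つ": "small/large tsu",
    "ッ": "small/large tsu", "ツ": "small/large tsu",
    "ー": "long-vowel bar vs. kanji one", "一": "long-vowel bar vs. kanji one",
    "日": "day vs. say", "曰": "day vs. say",
}
# ASCII punctuation is suspect, except the marks listed as acceptable.
ALLOWED_PUNCT = set("!:?")
DIGITS = set("0123456789０１２３４５６７８９")


def classify(ch: str):
    """Return a label if the character deserves a second look, else None."""
    if ch in CONFUSABLE:
        return CONFUSABLE[ch]
    if ch in string.punctuation and ch not in ALLOWED_PUNCT:
        return "English-style punctuation"
    if ch in DIGITS:
        return "digit"
    return None


def find_suspects(text: str):
    """Return (line, column, character, label) for each suspect character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            label = classify(ch)
            if label:
                hits.append((lineno, col, ch, label))
    return hits


if __name__ == "__main__":
    sample = "そつと.\n日本1"  # made-up text, not real OCR output
    for lineno, col, ch, label in find_suspects(sample):
        print(f"line {lineno}, col {col}: {ch} ({label})")
```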

How to make an old cellphone sexy…

Hand it to Han Ga Eun (한가은)…


You’re doing it wrong…

I mean, come on:

“Dear Slashdot, how do I gain social skills?”

Words of Wisdom…

…sort of. So speaks Izumi Kojima, in the song “Ah ~ yokatta”:

There is nothing for us to lose.
Sure, I can say. I can say.
Nobody knows what it means,
"Hung in there!"
But I'll be right beside you from now on.
So on...

“Need a clue, take a clue,
 got a clue, leave a clue”