Tools

Goofy MeCab error


When I ran the third Louie book through my custom-reader scripts (being nearly halfway through book 2…), it warned me about a conjugation pattern it didn’t know how to handle. This happens occasionally, since my de-conjugator is based on a limited sample of MeCab output, but the word it was complaining about was a real surprise: the yodan verb 戦ふ (written “tatakafu”, but pronounced “tatakau”), conjugated into 戦はない.

The sentence was “人の死なない戦はない”, which should be read as “Hito no shinanai ikusa wa nai”. For some reason, the context matcher did not correctly determine that “人の死なない” was a clause modifying the noun “戦”, and instead fell back all the way to a pre-1946 classical conjugation of the modern verb 戦う, which would have translated into the nonsensical “person’s won’t die won’t fight”. One of the many reasons human translators still have jobs!

(the sentence actually means “this is not a battle in which no one dies”, or perhaps “there are no wars where no one dies”; I’ll have to look at the context when I get there)

Tokyo Surfing


[Note: this is one of those “braindump so I don’t miss a step when I tell someone how to do it” posts]

Let's say that you've come across a web site that refuses to serve up its content to people located outside of a certain geographical region. For instance, "Japan" (or UK for BBC streams, etc).

There are two basic ways to get around this: pointing your web browser at an HTTP/HTTPS proxy service that's located in Japan, or opening a VPN connection to a server in Japan. I chose the second method, in part because it isn't limited to web traffic (allowing you to do things like bypass your ISP's outgoing SMTP blocking), and in part because I already knew how.

My weapons of choice were Amazon EC2, OpenVPN (free Community Edition, easy-rsa, OpenVPN GUI for Windows, and Tunnelblick for Mac), and DynDNS plus ddclient.

Car update


So the Camry Hybrid crossed 6,000 miles yesterday, just in time for me to drop it off for its first service. Average mileage over that period settled down to a pleasant 38.2 miles/gallon on Regular. My only complaint at the moment is that when the service-me-now timer goes off, the convenient in-dash display of range, mileage, etc, is overridden; you can get it back for a few seconds, and scroll through the different displays, but it always reverts to MAINT REQUIRED. I could find no way to reset the timer; I could add a dozen categories of new timers, but not clear one that’s already gone off.

For amusement, while I was waiting at the dealership, I sat behind the wheel of a Prius 4-door hatchback. Well, the idea was amusing, anyway; the actual experience was distinctly uncomfortable. Nice storage space with the rear seats down, though.

Lego that book!


But what’s it for?

Appending metadata to a PDF file


The Kindle has generally excellent support for reading PDF files, but absolutely terrible support for displaying embedded metadata. If FOO5419.pdf contains properly-specified Title and Author fields, it will appear on your Kindle as, you guessed it, FOO5419. It might show the Author on the right-hand side of the screen, and it might show Title and Author on the detail screen, but likely not.

It will work if you generate PDF version 1.3 with a self-contained Info dictionary (that is, “/Title(My Book)”, but not “13 0 obj (My Book) … /Title 13 0 R”). It will work if you do an append-only update to a v1.3 file in Adobe Acrobat Pro. It will work if you do a rewrite of a v1.3 file with pdftk.

What should work, for all PDF files, is an append-only update that uses only v1.3-ish features to create a self-contained Info dictionary. I hadn’t hacked PDF by hand since 1993, but I dusted off my reference manuals and wrote a script that correctly implements the spec.

It doesn’t work on a Kindle. Acrobat sees my data, Mac OS X Preview sees it, pdftk sees it, and every other tool I’ve tried agrees that my script generates valid PDF files with updated metadata. However, if I use my tool and then ask pdftk to convert the append-only update into a rewrite, the Kindle can see it (but only if it started out as v1.3).

I therefore declare their parser busted. The actual PDF viewer works fine, but whatever cheesy hack they’re using to quickly scan for metadata, it ain’t the good cheese.
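For illustration, here is a minimal Python sketch of the kind of append-only update described above. It is not the script from this post; the function names `pdf_string` and `append_info` are made up for the example. It assumes an unencrypted file with a classic (non-stream) cross-reference table and a plain-text trailer, and Latin-1 metadata. It appends one new self-contained Info dictionary, a one-entry xref section, and a trailer that points back at the previous xref via /Prev.

```python
import re


def pdf_string(text: str) -> bytes:
    """Encode text as a PDF literal string, escaping backslashes and parentheses.

    Assumption: the text fits in Latin-1; anything else raises UnicodeEncodeError.
    """
    raw = text.encode("latin-1")
    raw = raw.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")
    return b"(" + raw + b")"


def append_info(pdf: bytes, title: str, author: str) -> bytes:
    """Return pdf plus an incremental update carrying a new Info dictionary."""
    # The last startxref in the file points at the most recent xref section.
    matches = list(re.finditer(rb"startxref\s+(\d+)\s+%%EOF", pdf))
    if not matches:
        raise ValueError("no startxref found")
    prev_xref = int(matches[-1].group(1))

    # Read /Size and /Root from the last classic trailer (sketch-level parsing).
    trailer_pos = pdf.rfind(b"trailer")
    if trailer_pos < 0:
        raise ValueError("no classic trailer; file may use xref streams")
    trailer = pdf[trailer_pos:]
    size = int(re.search(rb"/Size\s+(\d+)", trailer).group(1))
    root = re.search(rb"/Root\s+(\d+\s+\d+\s+R)", trailer).group(1)

    out = bytearray(pdf)
    if not out.endswith(b"\n"):
        out += b"\n"

    # New object gets the next free object number, which equals the old /Size.
    obj_offset = len(out)
    out += b"%d 0 obj\n<< /Title %s /Author %s >>\nendobj\n" % (
        size, pdf_string(title), pdf_string(author))

    # One-entry xref subsection; each entry must be exactly 20 bytes long.
    xref_offset = len(out)
    out += b"xref\n%d 1\n%010d 00000 n \n" % (size, obj_offset)
    out += b"trailer\n<< /Size %d /Root %s /Info %d 0 R /Prev %d >>\n" % (
        size + 1, root, size, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

The sketch does not carry over an existing /ID entry and will not cope with files that store their trailer in an xref stream. Going by the behavior described above, a Kindle may well ignore the result until the file is rewritten.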

Automagic JIS/ShiftJIS/EUC to UTF8


Finally got sick of constantly dealing with the variety of encoding schemes used for Japanese text files. I still convert everything to UTF-8 before any serious use, but for just looking at a random downloaded file, I wanted to eliminate a step.

less supports input filters with the LESSOPEN environment variable, but you need something to put into it. Turns out the Perl Encode::Guess module works nicely for this, and now I no longer care if a file is JIS, ShiftJIS, CP932, EUC-JP, or UTF-8. Code below the fold.

PDF metadata on Kindle


PDF version 1.5 doesn’t work for metadata (apparently because it compresses objects to reduce the output size); save as 1.3 for it to be parsed correctly. You’ll still need to set the filename to the title you want displayed in the main book listing, even though the device actually parses the title out of the file to display on the detail page. Blech.

You can insert the metadata with pdftk as per bloovis, or some other tools (the full version of Adobe Acrobat works great, but is not exactly free…). LaTeX users can use a sledgehammer to swat this fly with the hyperref package, but you’ll need to use dvipdfmx -V3 to downrev the PDF output to 1.3.

Sony got their PDF software from Adobe (for the DRM, mostly), so their Readers don’t have this problem. Sadly, this means that a file generated for the Kindle will display much more slowly on the Sony, since object compression is quite useful.

"Okay, is the light red or green?"


Many years ago, I was working setup at a trade show, and the network guy asked me to run down to the other end of the conference hall and check out a piece of equipment for him. When I got there, he called me on the radio and asked me what color the blinkenlights were.

“Oh, you didn’t know I’m partially color-blind.”

Today, Dan Kaminsky has released a new iPhone/Android app that does real-time color filtering to allow you to compensate for these problems. I don’t have any compatible devices at the moment, but they seem to have matured enough that it will be worth buying one soon, and this will be a must-buy app.

“Need a clue, take a clue,
 got a clue, leave a clue”