Thursday, October 3 2013

Text acquisition: complete!

[Update: it turns out the iPhone hotspot doesn’t route tethered traffic over the iPhone’s VPN; boo, hiss. I’ll need to rig up a wireless router behind an OpenVPN box to avoid the risk of tripping the region detection.]

The Kindle Paperwhite will not speak to a shared wireless connection hosted on my laptop. It will, however, speak to an iPhone’s built-in hotspot, and the iPhone has a VPN client compatible with HideMyAss VPN. That lets me pretend that my shiny new second-generation Paperwhite is located in Japan, making it eligible to purchase and download Kindle books without violating the publishers’ geographic restrictions on ebook licensing.

With that out of the way, the standard DeDRM and KindleUnpack scripts trivially converted my new books into clean XHTML, which required about 60 lines of Perl to massage into input for Yomitori. Diffing the text of Asobi ni Iku Yo book 1 confirmed the accuracy of my new source.
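The Perl script itself isn’t shown here, and neither is Yomitori’s input format, so what follows is only a rough Python sketch of the verification step, under stated assumptions: flatten the unpacked XHTML into plain lines, drop the `<rt>`/`<rp>` ruby readings so furigana doesn’t show up as spurious differences, and diff the result against an earlier copy of the text with the standard library. The tag lists and the names `TextExtractor`, `xhtml_to_lines`, and `diff_texts` are hypothetical, not taken from the author’s script.

```python
import difflib
from html.parser import HTMLParser

# Hypothetical choices: elements whose text is dropped, and block elements
# whose closing tag ends a line of output.
SKIP_TAGS = {"head", "rt", "rp", "script", "style"}
LINE_TAGS = {"p", "div", "h1", "h2", "h3"}


class TextExtractor(HTMLParser):
    """Collect visible text from XHTML, skipping ruby readings and <head>."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "br" and self.skip_depth == 0:
            # <br/> is reported as a start tag (plus an end tag we ignore)
            self.buf.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in LINE_TAGS and self.skip_depth == 0:
            self.buf.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buf.append(data)


def xhtml_to_lines(xhtml):
    """Return the non-blank, whitespace-trimmed text lines of an XHTML string."""
    parser = TextExtractor()
    parser.feed(xhtml)
    parser.close()
    text = "".join(parser.buf)
    return [line.strip() for line in text.splitlines() if line.strip()]


def diff_texts(old_lines, new_lines):
    """Unified diff of two line lists; an empty list means they match."""
    return list(difflib.unified_diff(old_lines, new_lines, "old", "new", lineterm=""))


if __name__ == "__main__":
    sample = ("<html><head><title>t</title></head><body>"
              "<p><ruby>漢<rt>かん</rt></ruby>字</p>"
              "<p>二行目<br/>三行目</p></body></html>")
    print(xhtml_to_lines(sample))
```

The standard-library `html.parser` is used rather than a strict XML parser because it tolerates slightly malformed markup; an empty result from `diff_texts` would mean the two sources agree line for line.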

Now if I can figure out the new inline footnote support, I can merge the vocabulary into the main book the way I do for the HTML output, and switch from dual PDFs to a single MOBI file. But that comes later; right now, the complete text of Miniskirt Space Pirates book 1 is waiting on my Kindles, without all the OCR errors I was wading through the last time.