“Remember in the Lion King when Scar cheated to win the title as king? And the pride land was overrun with the hyenas? And all of the lions lost everything they had built and maintained? Just asking. No reason.”

— Donald Trump, Jr

Competitive advantage


Merrill reflects on Ila’s qualifications….


Notes on finishing a novel


A novel in Japanese, that is, converted into a custom “student edition” at precisely my reading level, as described previously.

  1. Speed and comprehension are good; once I resolved the worst typos, parsing errors, and bugs in my scripts, I was able to read at a comfortable pace with only occasional confusion. Words that didn't get looked up correctly are generally isolated and easy to work out from context, and most of the cases where I had to stop and read a sentence several times turned out to be odd because the thing they were describing was odd (such as what the guard does before allowing the original Kino to enter the city in Natural Rights). Of course, it helps to have a general knowledge of the material.
  2. Coliseum was changed significantly for the animated version of Kino's Journey; the original story leaves most of the opponents shallow and one-dimensional, and spends way too much time on the mechanical details of Kino's surprise (both the preparation the night before, and the detailed description of the physical impact and aftermath). Mother's Love, on the other hand, is a pretty straight adaptation.
  3. Casual speech and dialect don't cause as many problems as you might expect. MeCab handles a lot of the common forms, and recovers well from the ones it has to punt on. They didn't confuse me too often, either. After a while. :-)
  4. One thing MeCab sometimes gets wrong is a writer using the pre-masu form instead of the te-form to list a series of actions. I don't have a good example at the moment, but I ran into several cases where it punted and looked for a noun.
  5. The groups that scan, OCR, and proofread novels tend to miss some simple errors where the software guessed the wrong kanji. A good example is writing 兵士 as 兵土 or 兵上. Light novels generally aren't that complicated, and if a word looks rare or out of place, it may well be an OCR error.
  6. The IPA dictionary (IPADIC) used by MeCab has some quirks that make it sub-optimal for modern fiction. Reading 空く as あく, 他 as た, 一寸 as いっすん, 間 as ま, 縁 as えん, and 身体 as しんたい is correct in some contexts, but not in some common ones where their IPADIC priority causes MeCab to guess wrong. Worse, it has a number of relatively high-priority entries that aren't in any dictionary I've found: 台詞 as だいし, 胡坐 as こざ, 面す and 脱す as verbs that it treats as more common than 面する and 脱する, etc. It also has no entries for みぞれ, 呆ける, 粘度, 街路樹, and a bunch of others. Oddest of all, there are occasions where it reads 達 as いたる. This is a valid name reading, but name+達 is far more likely to be たち than いたる; presumably it's some quirk of how MeCab calculates the left/right contexts when evaluating alternatives, an aspect of the dictionary files that I definitely don't understand.
  7. I need to make better use of the original furigana when evaluating MeCab output. I'm preserving it, but not using it to automatically detect errors like the ones above, mostly because I don't want the scripts to become too interactive. Perhaps just flagging the major differences is sufficient.
  8. On to book 2!
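Flagging the major differences between MeCab's readings and the book's own furigana might look something like the following minimal Python sketch. It is not the script described above: it assumes MeCab's output has already been turned into a list of (surface, reading) pairs, with readings in katakana as IPADIC emits them, and that the furigana has been collected into a plain dictionary keyed by surface form. The function names are hypothetical, and the sample tokens are invented from two of the misreadings mentioned in item 6. A real version would key furigana by position rather than by word, since the same kanji can carry different readings in different places.

```python
"""Flag tokens whose MeCab reading disagrees with the source's furigana.

Hypothetical sketch. Assumed inputs:
- tokens: list of (surface, reading) pairs, reading in katakana (IPADIC style)
- furigana: dict mapping a surface string to the reading printed in the book
"""


def kata_to_hira(text: str) -> str:
    # Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana counterparts.
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def flag_mismatches(tokens, furigana):
    """Return (index, surface, mecab_reading, furigana_reading) for each disagreement."""
    flagged = []
    for i, (surface, reading) in enumerate(tokens):
        expected = furigana.get(surface)
        if expected is None:
            continue  # no furigana for this word in the source, so nothing to check
        got = kata_to_hira(reading)
        if got != kata_to_hira(expected):
            flagged.append((i, surface, got, expected))
    return flagged


if __name__ == "__main__":
    tokens = [("台詞", "ダイシ"), ("を", "ヲ"), ("身体", "シンタイ")]
    furigana = {"台詞": "せりふ", "身体": "からだ"}
    for hit in flag_mismatches(tokens, furigana):
        print(hit)
```

Because it only reports mismatches rather than fixing them, it stays non-interactive: the output is a list to skim, not a prompt.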

Well, at least it's got a catchy title


Can’t go wrong with a title like “Regarding Ducks and Universes”, even when a quick inspection reveals that it’s a first novel published through Amazon’s vaguely-described Encore program.

I’m not recommending it, mind you, and I’m not even using my affiliate code in that link. I just found it interesting that Amazon is aggressively promoting an SF title by a complete unknown, as opposed to the usual “Kindle vanity press” or POD semi-publishing approaches.

How did I miss this?


Donna Barr is putting both Stinz and The Desert Peach online.

Stinz is still in issue 1, before the war, but the Peach is all the way up to issue 21.

Lots of good stuff, but watching The Desert Fox hang ten is still one of my favorite bits.

Dear Recaptcha,


This goes way beyond “not funny”, all the way to “incredibly stupid”. Does someone do even basic quality control on your source images? I’m thinking the answer is a rather firm No.

Recaptcha from Hell

[Update: Just saw one go by where one word was in Cyrillic and the other in Hebrew; sadly, I clicked refresh before I could stop to grab the screenshot.]

Into every giant robot's life...


…a little Eineus must fall.


Cover story


One of the oddest limitations of the Kindle is that you need to jailbreak it to change the screensaver images. There’s a small set of images supplied by Amazon, some nice, some hideous, and you’re stuck with them. Replacing them is probably the single most common reason for Kindle-hacking.

I could use images from my collection of Naughty Novel Cover Art, but people have a tendency to pick up your Kindle and turn it on, and even limiting the selection to safe-for-work images still leaves it a bit spicy.

So, I went digging through my shelves for Paperbacks That Have Known The Touch Of A Lover. That is, battered old books that someone, not necessarily me, made extensive use of. I quickly assembled a stack about three feet high, and whittled it down to some particularly interesting ones. Boosting the contrast and brightness by about 25% before downsampling to 16-color grayscale produces decent results, and I'm sure I'll expand the collection over time.
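The adjustment boils down to simple per-pixel arithmetic. This Python sketch is an assumption about the order and formulas (brightness as a straight 1.25x scale, then contrast stretched 1.25x around mid-gray 128, then rounding to the nearest of 16 evenly spaced gray levels); the post doesn't say which tool was used, and in practice an image library would apply this to whole images rather than lists of values.

```python
"""Per-pixel sketch: +25% brightness, +25% contrast, then 16 gray levels.

Assumptions: 8-bit grayscale values (0-255), contrast pivot at mid-gray 128.
"""

LEVELS = 16
STEP = 255 // (LEVELS - 1)  # 17, so the levels are 0, 17, 34, ..., 255


def enhance(value: int, brightness: float = 1.25, contrast: float = 1.25,
            pivot: int = 128) -> int:
    v = value * brightness              # brighten: scale toward white
    v = (v - pivot) * contrast + pivot  # contrast: push values away from mid-gray
    return max(0, min(255, round(v)))   # clamp to the 8-bit range


def quantize(value: int) -> int:
    # Snap to the nearest of the 16 evenly spaced gray levels.
    return round(value / STEP) * STEP


def prepare(pixels):
    return [quantize(enhance(p)) for p in pixels]


if __name__ == "__main__":
    print(prepare([0, 50, 100, 150, 200, 255]))
```

Doing the boost before quantizing matters: stretching first spreads the worn, low-contrast cover tones across more of the 16 levels instead of collapsing them into a few muddy grays.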

Small color versions of the current set below:


Appending metadata to a PDF file


The Kindle has generally excellent support for reading PDF files, but absolutely terrible support for displaying embedded metadata. If FOO5419.pdf contains properly-specified Title and Author fields, it will appear on your Kindle as, you guessed it, FOO5419. It might show the Author on the right-hand side of the screen, and it might show Title and Author on the detail screen, but likely not.

It will work if you generate PDF version 1.3 with a self-contained Info dictionary (that is, “/Title(My Book)”, but not “13 0 obj (My Book) … /Title 13 0 R”). It will work if you do an append-only update to a v1.3 file in Adobe Acrobat Pro. It will work if you do a rewrite of a v1.3 file with pdftk.
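For concreteness, here is a small Python sketch of the self-contained form: an Info dictionary whose values are direct strings rather than indirect references. The function names are hypothetical. ASCII values are written as literal strings with `\`, `(` and `)` escaped; anything else is written as a UTF-16BE hex string with a FE FF byte-order mark, which the PDF spec allows for text strings.

```python
def pdf_literal(text: str) -> str:
    """Wrap ASCII text as a PDF literal string, escaping backslash and parentheses."""
    # Backslash must be escaped first, or the parenthesis escapes get doubled.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def pdf_text_string(text: str) -> str:
    """Literal string for ASCII; otherwise UTF-16BE hex with a FEFF byte-order mark."""
    if text.isascii():
        return pdf_literal(text)
    return "<FEFF" + text.encode("utf-16-be").hex().upper() + ">"


def direct_info_dict(**fields: str) -> str:
    """Info dictionary with direct values, e.g. << /Title(My Book) /Author(Me) >>."""
    entries = " ".join(f"/{key}{pdf_text_string(value)}" for key, value in fields.items())
    return f"<< {entries} >>"


if __name__ == "__main__":
    # Direct form. The indirect form would be "/Title 13 0 R" with (My Book) in object 13.
    print(direct_info_dict(Title="My Book", Author="Me"))
```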

What should work, for all PDF files, is an append-only update that uses only v1.3-ish features to create a self-contained Info dictionary. I hadn’t hacked PDF by hand since 1993, but I dusted off my reference manuals and wrote a script that correctly implements the spec.
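A stripped-down append-only (incremental) update might look like the Python sketch below. It is a hypothetical illustration, not the script described here, and it assumes a file with a classic `xref` table and `trailer` keyword (not a PDF 1.5 cross-reference stream) and no encryption. It appends one new object holding the Info dictionary, a one-entry cross-reference subsection for that object, and a new trailer whose `/Prev` points at the old cross-reference section. Of the old trailer's entries it carries over only `/Root` and `/ID`, whereas the spec asks for all of them.

```python
import re


def append_info_update(pdf: bytes, info_dict: bytes) -> bytes:
    """Append an incremental update that points /Info at a new dictionary object."""
    # Offset of the previous cross-reference section: the number after the last startxref.
    m = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", pdf)
    if not m:
        raise ValueError("no trailing startxref/%%EOF found")
    prev_xref = int(m.group(1))

    # Read /Size, /Root and /ID from the last classic trailer dictionary.
    tpos = pdf.rfind(b"trailer")
    if tpos < 0:
        raise ValueError("no 'trailer' keyword (cross-reference stream?); not handled")
    trailer = pdf[tpos:]
    size = int(re.search(rb"/Size\s+(\d+)", trailer).group(1))
    root = re.search(rb"/Root\s+(\d+\s+\d+\s+R)", trailer).group(1)
    id_entry = re.search(rb"/ID\s*\[[^\]]*\]", trailer)

    out = bytearray(pdf)
    if not out.endswith(b"\n"):
        out += b"\n"

    # New Info object, using the next unused object number.
    obj_num = size
    obj_offset = len(out)
    out += b"%d 0 obj\n%s\nendobj\n" % (obj_num, info_dict)

    # One-entry xref subsection; each entry must be exactly 20 bytes.
    xref_offset = len(out)
    out += b"xref\n%d 1\n" % obj_num
    out += b"%010d 00000 n\r\n" % obj_offset

    # New trailer chained to the old one via /Prev.
    out += b"trailer\n<< /Size %d /Root %s /Info %d 0 R /Prev %d" % (
        size + 1, root, obj_num, prev_xref)
    if id_entry:
        out += b" " + id_entry.group(0)
    out += b" >>\nstartxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

To use it, read the original file in binary mode, call `append_info_update(data, b"<< /Title (My Book) /Author (A. Writer) >>")`, and write the result to a new file; the original bytes are left untouched at the front.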

It doesn’t work on a Kindle. Acrobat sees my data, Mac OS X Preview sees it, pdftk sees it, and every other tool I’ve tried agrees that my script generates valid PDF files with updated metadata. However, if I use my tool and then ask pdftk to convert the append-only update into a rewrite, the Kindle can see it (but only if it started out as v1.3).

I therefore declare their parser busted. The actual PDF viewer works fine, but whatever cheesy hack they’re using to quickly scan for metadata, it ain’t the good cheese.

“Need a clue, take a clue,
 got a clue, leave a clue”