“The only thing flat earthers have to fear… is sphere itself.”
— Truth in punning

…the black cable controls fan speed. I’ll need this information again soon.
I’ve made quite a few improvements since putting the code up on GitHub. Just having it out in public made me clean it up a lot, but trying to produce a decent sample made an even bigger difference. QAing the output of my scripts has smoked out a number of errors in the original texts, as well as some interesting errors and inconsistencies in UniDic and JMdict.
The sample I chose to include in the GitHub repo is a short story we went through in my group reading class, Lafcadio Hearn’s Houmuraretaru Himitsu (“A Dead Secret”). The PDF files (text and vocab) are designed to be read side-by-side (I use two Kindles, but they fit nicely on most laptop screens), while the HTML version uses jQuery-based tooltips to show the vocabulary on mouseover.
For use as a sample, I left in a lot of words that I know. If I were generating this story for myself, I’d use a much larger known-words list.
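For anyone wondering what a known-words list actually does to the vocabulary pages, here is a rough sketch in Python. This is not the code from the repo (those scripts are Perl); the function name `page_vocab` and the sample words are made up for illustration, and it assumes the page has already been parsed into dictionary-form tokens.

```python
def page_vocab(tokens, known_words):
    """Return the unknown words on a page, once each, in order of first appearance."""
    seen = set()
    vocab = []
    for word in tokens:
        # Skip words the reader already knows, and words already listed for this page.
        if word in known_words or word in seen:
            continue
        seen.add(word)
        vocab.append(word)
    return vocab


# Hypothetical data: a tiny known-words list and one page of dictionary-form tokens.
known = {"猫", "見る"}
page = ["猫", "秘密", "見る", "葬る", "秘密"]
print(page_vocab(page, known))
```

The bigger the known-words set, the shorter each page's list, which is why a sample meant for other people keeps far more entries than a personal copy would.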
"If you don't have the social skills to phrase a polite question, Slashdot is perhaps not the ideal place to go looking for advice..."
(via, where the person quoted is actually answering the wrong question…)
About two and a half years ago, I threw together a set of Perl scripts that converted Japanese novels into nicely formatted custom student editions with additional furigana and per-page vocabulary lists. I said I’d release it, but the code was pretty raw, the setup required hacking at various packages in ways I only half-remembered, and the output had some quirks. It was good enough for me to read nearly two dozen novels with decent comprehension, but not good enough to share.
When I ran out of AsoIku novels to read, I decided it was time to start over. I set fire to my toolchain, kept only snippets of the old code, and made it work without hacking on anyone else’s packages. Along the way, I switched to a much better parsing dictionary, significantly improved lookup of phrases and expressions, and made the process Unicode-clean from start to finish, with no odd detours through S-JIS.
Still some work to do (including that funny little thing called “documentation”…), but it makes much better books than the old one, and there are only a few old terrors left in the code. So now I’m sharing it.
https://github.com/jgreely/yomitori
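As a side note on what "Unicode-clean" tends to mean in practice: decode the input bytes exactly once, at the edge, and handle only Unicode strings after that. The sketch below is Python rather than the Perl used in the repo, and the helper name `read_text` and the encoding fallback order are assumptions for illustration, not a description of how yomitori does it.

```python
def read_text(raw: bytes) -> str:
    """Decode input bytes to a Unicode string once, at the boundary."""
    # Try UTF-8 first; fall back to Shift JIS for legacy Japanese text files.
    for encoding in ("utf-8", "shift_jis"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("input is neither valid UTF-8 nor Shift JIS")


# Either encoding of the same word ends up as the same Unicode string.
legacy = "秘密".encode("shift_jis")
modern = "秘密".encode("utf-8")
print(read_text(legacy) == read_text(modern))
```

Everything downstream of that one call sees ordinary strings, so no later step needs to know what encoding the file arrived in.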
One kilo of pure sucralose powder, for ~$200.
This is either a lifetime supply, or a lifetime supply, much like the kilo of pure caffeine, which is about a hundred lethal doses.
Don’t get so distracted by Anna Konno’s hotness that you forget about the laws of physics.

(via the very, very NSFW Gazou Navi)