“Some people, when confronted with an integer overflow, think ‘I know, I’ll use a double’. Now they have 2.000000000001 problems.”

— Magical Terrapin Andii rounds down

Gatchaman Crowds question...


Will more cosplayers choose LOAD-kun or Utsutsu-chan?

Gloomy and Trap

(setting aside the issue of being arrested for faithful Utsutsu-chan cosplay…)

Yeah, that'll show ’em


Lowell Observatory's Putnam wants to name asteroid for Trayvon Martin

The victims of media-instigated “justice for trayvon” attacks will be required to time-share a small rock tentatively named “Fuck You Whitey”.

Child 1, SWAT 0


Why was a SWAT officer showing off tactical gear at a children’s book fair in the first place? And why was a small child able to walk up to him, reach into his holster, and fire his sidearm?

I love the quote from the department: “The gun functioned how it was supposed to. When the trigger was pulled, the gun went off.”

Great, guys, now explain why the officer didn’t function as he was supposed to. He’s got a round in the chamber of his Glock at an event where he’s surrounded by small children, deliberately getting them interested in his gear. And he’s wearing it in a “tactical” thigh holster that’s so poorly fitted that the kid managed to get his finger squarely on the trigger without Officer Friendly noticing.

Name that warrior...


“If it weren’t for her plump breasts, most opponents would probably mistake her for a man. Therefore she dressed daringly, exposing most of her flesh. Her skin was swarthy, with a strange design inked on her left cheek. An informed observer would recognize the design as a warding spell of the Arido hill tribes.”

That settles the question of why competent women in fantasy show so much skin: they’re feminist pioneers, working twice as hard to prove they’re just as good as men!

[this message brought to you by my attempt to figure out the word 呪払い, which apparently should be read as “noroibarai”; it’s not explained anywhere, but is in common use in the online fantasy community to refer to warding magic. See also here.]

"Shut up and take my money"


Jeff Atwood and Weyman Kwong are making a sturdy programmer’s keyboard with silent mechanical keyswitches. While I enjoy the ear-shattering clatter of my current mechanical keyboard, I’m less fond of the shoddy physical construction and poor multi-keypress handling (and, of course, I’d swallow broken glass before dealing with the assholes at Matias ever again), so this is definitely on my must-buy list.

Thinking bad thoughts


Perhaps I’ve been away from teenage girls for too long, but my automatic reaction to this product suggests I should be kept away from them…

Duct tape? At their age?

Why did Bradley Manning suddenly become Chelsea?


Why make a big fuss about announcing what everyone knew well before the trial started? Because Leavenworth is an all-male prison. This makes it look less like a courageous stance by a transgender individual, and more like a cynical ploy to avoid spending the next 7-35 years in Leavenworth.

(cynical quotes around certain words in the previous paragraph have been omitted to avoid discussing the general issue of gender as a fluid concept disconnected from biology)

Yomitori expressions


Mecab/Unidic has fairly strict ideas about morphemes. For instance, in the older Ipadic morphological dictionary, 日本語 is one unit, “Nihongo = Japanese language”, while in Unidic, it’s 日本+語, “Nippon + go = Japan + language”. This has some advantages, but there are two significant disadvantages, both related to trying to look up the results in the JMdict J-E dictionary.

First, I frequently need to concatenate N adjacent morphemes before looking them up, because the resulting lexeme may have a different meaning. For instance, Unidic considers the common expression 久しぶりに to consist of the adjective 久しい, the suffix 振り, and the particle に. It also thinks that the noun 日曜日 is a compound of the two nouns 日曜 and 日, neglecting the consonant shift on the second one.

Second, there are a fair number of cases where Unidic considers something a morpheme that JMdict doesn’t even consider a distinct word. For instance, Unidic has 蹌踉き出る as a single morpheme, while JMdict and every other dictionary I’ve got considers it to be the verb 蹌踉めく (to stagger) plus the auxiliary verb 出る (to come out). The meaning is obvious if you know what 蹌踉めく means, but I can’t automatically break 蹌踉き出る into two morphemes and look them up separately, because Unidic thinks it’s only one.

To fix the second problem, I’m going to need to add a bit of code to the end of my lookup routines that says, “if I still haven’t found a meaning for this word, and it’s a verb, and it ends in a common auxiliary verb, then strip off the auxiliary, try to de-conjugate it, and run it through Mecab again”. I haven’t gotten to this yet, because it doesn’t happen too often in a single book.
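
A minimal sketch of the first half of that fallback, under loud assumptions: the auxiliary list is a short hypothetical sample, the de-conjugation only handles the continuative stem, and `base_verb_candidates` is an illustrative name, not existing code. It strips a trailing auxiliary and guesses dictionary forms for what is left; the guesses would still need to go back through Mecab and JMdict.

```python
from typing import List

# Hypothetical, deliberately short list of auxiliary verbs to strip.
AUXILIARIES = ("出る", "出す", "込む", "始める", "続ける")

# Continuative (i-row) ending -> dictionary (u-row) ending for godan verbs.
GODAN_ENDINGS = {
    "い": "う", "き": "く", "ぎ": "ぐ", "し": "す", "ち": "つ",
    "に": "ぬ", "び": "ぶ", "み": "む", "り": "る",
}

def base_verb_candidates(compound: str) -> List[str]:
    """Strip a trailing auxiliary and guess dictionary forms of the stem.

    Returns an empty list when no listed auxiliary is found.
    """
    for aux in AUXILIARIES:
        if compound.endswith(aux) and len(compound) > len(aux):
            stem = compound[: -len(aux)]
            # Ichidan guess: the continuative stem plus る.
            candidates = [stem + "る"]
            # Godan guess: shift the final kana from the i-row to the u-row.
            last = stem[-1]
            if last in GODAN_ENDINGS:
                candidates.append(stem[:-1] + GODAN_ENDINGS[last])
            return candidates
    return []
```

For 蹌踉き出る this produces 蹌踉く as the godan guess, which differs in okurigana from the 蹌踉めく spelling above, so a reading-based lookup (or another pass through Mecab) would probably still be needed to connect the two.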

To fix the first problem, I start by trying to look up an entire clause as a single word, then trim off the last morpheme and try again, repeating until I either get a match or run out of morphemes. I had built up an elaborate and mostly-successful set of heuristics to prevent false positives, but they had the side effect of also eliminating many perfectly good compounds and expressions. And JMdict actually has quite a few lengthy expressions and set phrases, so while half-assed heuristics were worthwhile, making them better would pay off quickly.
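
The trim-and-retry loop can be sketched roughly like this. The dictionary is a toy stand-in for JMdict, the morpheme list is an assumed segmentation in surface form, and `longest_match` is a made-up name; none of the false-positive heuristics are included.

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-in for JMdict: surface form -> gloss (hypothetical data).
TOY_DICT: Dict[str, str] = {
    "久しぶりに": "after a long time",
    "日曜日": "Sunday",
    "日曜": "Sunday (abbr.)",
    "日": "day",
}

def longest_match(
    morphemes: List[str], dictionary: Dict[str, str]
) -> Optional[Tuple[int, str, str]]:
    """Look up the whole clause, then drop the last morpheme and retry.

    Returns (morphemes consumed, matched string, gloss), or None if even
    the first morpheme alone has no entry.
    """
    for end in range(len(morphemes), 0, -1):
        candidate = "".join(morphemes[:end])
        if candidate in dictionary:
            return end, candidate, dictionary[candidate]
    return None

# Assumed segmentation of 久しぶりに会った.
CLAUSE = ["久し", "ぶり", "に", "会っ", "た"]
RESULT = longest_match(CLAUSE, TOY_DICT)
```

Here the first two attempts (all five morphemes, then four) miss, and the third finds 久しぶりに, consuming three morphemes; the caller would then continue from the fourth.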

Today, while working on something else entirely, I realized that there was a simple way (conceptually simple, that is; implementation took a few tries) to eliminate a lot of false positives and a lot of the heuristics: pass the search results back through Mecab and see if it produces the same number of morphemes with the same parts of speech.

So, given a string of morphemes like: (い, て, も, 立っ, て, も, い, られ, ない, ほど, に, なっ, て, いる), on the sixth pass I look up いても立ってもいられない and find a match (居ても立っても居られない, “unable to contain oneself”) that breaks down into the same number of morphemes of the same type. There’s still a chance it could have chosen a wrong word somewhere (and, in fact, it did; the initial い was parsed as 行く rather than 居る, so a stricter comparison would have failed), but as a heuristic, it works much better than everything else I’ve tried, and has found some pretty impressive matches:

  • 口が裂けても言えない (I) won't say anything no matter what
  • たまったものではない intolerable; unbearable
  • 痛くもかゆくもない of no concern at all; no skin off my nose
  • 火を見るより明らか perfectly obvious; plain as daylight
  • 言わんこっちゃない I told you so
  • どちらかと言えば if pushed I'd say
  • と言えなくもない (one) could even say that
  • 似ても似つかない not bearing the slightest resemblance
  • 痛い目に遭わせる to make (a person) pay for (something)
  • と言って聞かない insisting
  • 聞き捨てならない can't be allowed to pass (without comment)
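
The round-trip check behind those matches can be sketched as follows. This is an illustration under assumptions, not the actual implementation: the tokenizer is passed in as a function, the analyses in `CANNED` are hand-written and simplified stand-ins for real Mecab output, and `same_shape` is a hypothetical name.

```python
from typing import Callable, Dict, List, Tuple

Morpheme = Tuple[str, str]  # (surface form, coarse part of speech)
Tokenizer = Callable[[str], List[Morpheme]]

def same_shape(original: List[Morpheme], headword: str,
               tokenize: Tokenizer) -> bool:
    """Accept a dictionary hit only if its headword re-parses into the
    same number of morphemes with the same parts of speech, in order."""
    reparsed = tokenize(headword)
    if len(reparsed) != len(original):
        return False
    return all(a[1] == b[1] for a, b in zip(original, reparsed))

# Hand-written analyses standing in for a real tokenizer (assumption).
CANNED: Dict[str, List[Morpheme]] = {
    "居ても立っても居られない": [
        ("居", "verb"), ("て", "particle"), ("も", "particle"),
        ("立っ", "verb"), ("て", "particle"), ("も", "particle"),
        ("居", "verb"), ("られ", "auxiliary"), ("ない", "auxiliary"),
    ],
    "射手": [("射手", "noun")],  # "archer", also read いて
}

def fake_tokenize(text: str) -> List[Morpheme]:
    return CANNED[text]

# The first nine morphemes of the clause from the text above.
TEXT: List[Morpheme] = [
    ("い", "verb"), ("て", "particle"), ("も", "particle"),
    ("立っ", "verb"), ("て", "particle"), ("も", "particle"),
    ("い", "verb"), ("られ", "auxiliary"), ("ない", "auxiliary"),
]
```

With this data, `same_shape(TEXT, "居ても立っても居られない", fake_tokenize)` accepts the expression, while a kana-only hit on いて that returns 射手 is rejected because one noun is not a verb plus a particle. Comparing only parts of speech, not lemmas, is what lets the 行く/居る mismatch slip through.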

I’ve updated the samples I created a few weeks ago to use the new parsing. Even on that short story, it makes a few sections clearer.

[Update: one of the rare false positives from this approach: 仲には and 中には break down the same, and since the second one is in JMdict as an expression, doing a kana-only lookup on なかには will falsely apply the meaning “among (them)” to 仲には. Because of variations in orthography, I have to allow for kana-only lookups, especially for expressions, but fortunately this sort of false match is rare and easy to catch while reading.]

“Need a clue, take a clue,
 got a clue, leave a clue”