“The arts community is generally dominated by liberals because if you are concerned mainly with painting or sculpture, you don’t have time to study how the world works. And if you have no understanding of economics, strategy, history and politics, then naturally you would be a liberal.”

—  Mark Helprin, 8/10/2005

A clean sweep?


Over the past few weeks, a number of Doctor Who fan sites have claimed that nobody from the current cast and crew will be continuing into the next series. It’s not just Moffat and Capaldi; Missy’s and Bill’s actors are apparently departing as well. Nothing specific about Nardole that I can find, but it’s generally assumed that he’s also out.

If true, I won’t lose any sleep over Missy, but Bill has been an interesting character despite not being given much to do, and I’d be sorry to see her go before Pearl Mackie gets a chance to develop her. Not my favorite companion (that would be Wilfred Mott), but then she hasn’t really been given the chance.

I’m quite optimistic about incoming showrunner Chris Chibnall, though. The man responsible for the first two seasons of Torchwood and half a dozen Tennant/Smith episodes has a good grasp of the universe.

"Someday, you could just walk past a fez."


— The Doctor

Nana Asakawa: “Never gonna happen.”


The Cutest Little Sister In The World


No, wait, that’s the title of the book Our Hero is writing in Eromanga-Sensei. The debut porn series these photos come from is actually called “My Little Sister’s Lovely Boobs Keep Popping Out”.

Which, come to think of it, sounds exactly like a late-night anime title. Maybe it comes out next season.

Anyway, Miharu Usa demonstrates her qualifications for the lead role after the jump.


Who could resist?


(via)

Dear Amazon,


Well, some of these might be more stimulating than the Godzilla movies you recommended for a previous “night in”.


Level 6, Challenge, Doctor Ex Machina


DanSora 6

Newsflash: Aiz displayed a new emotion. Okay, it’s a pout, but that’s more than she’d shown in the previous series. Surprisingly, leveling up only took 1/3 of the episode, leaving time for her to pet Bell, pout over his fight-or-flight instincts, then rescue him from prum treachery. And have Loki and her senior staff discuss The Plot.

Eromanga-sensei 7

In which Fierce Rival Muramasa displays all of the emotion that she lacks as Aiz Wallenstein. Seriously, Saori Ōnishi must feel like she’s coming back to life when she switches from recording the wooden princess to this role. On a related note, am I the only one wishing for more editor-san?

Advice for Our Hero: dude, accept the confessions and go for the Type One Tenchi solution.

Doctor Who 10.6

This episode felt a lot longer than the others. Maybe it’s because it wasn’t written by interns, and actually goes somewhere. Maybe because Moffat realized we all knew who was inside the box. Maybe because Nardole got an opportunity to do more than nag. The villain still feels derivative, but at least there was some variety in the sets.

Jun Amaki


Jun Amaki falls somewhere in the middle of the current pack of gravure models: age 21, nominal singing career, pretty enough that stylists and photographers don’t forget to include her face, but only 1 photobook and 3 DVDs so far in her four-year career. That last is a bit surprising, since she’s also 4'10" with a 95-I bustline, and falls firmly into the “loli-cute” category (referring to the face only, of course…).

And she can be quite expressive on camera:

She is featured in a group photobook coming out in a few weeks, titled If My Cat Turned Into A Cute Girl. I wonder what breed she’ll be…


Corpus Fun


I’m pretty sure “futanari” is not Dutch. Also “gmail”, “iphone”, “http”, “cialis”, and “jackalope”. “bewerkstelligen”, on the other hand, fits right in.

For my new random word generator, I’ve been supplementing and replacing the small language samples from Chris Pound’s site. The old ones do a pretty good job, but the new generator has built-in caching of the parsed source files, so it’s possible to use much larger samples, which gives a broader range of language-flavored words. 5,000 distinct words seems to be the perfect size for most languages.
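The generator itself isn’t shown here, but the general technique it describes is a weighted letter-chain (Markov) model built from a sample of words. A minimal sketch, with my own function names and an assumed order-2 chain:

```python
import random
from collections import defaultdict, Counter

def build_chain(words, order=2):
    """Count letter transitions, using ^ and $ as start/end markers."""
    chain = defaultdict(Counter)
    for w in words:
        padded = "^" * order + w + "$"
        for i in range(len(padded) - order):
            chain[padded[i:i + order]][padded[i + order]] += 1
    return chain

def generate(chain, order=2, maxlen=12):
    """Walk the chain with weighted-random selection at each step."""
    state = "^" * order
    out = []
    while len(out) < maxlen:
        counts = chain[state]
        letter = random.choices(list(counts), weights=list(counts.values()))[0]
        if letter == "$":
            break
        out.append(letter)
        state = state[1:] + letter
    return "".join(out)

# Feed it a word sample and it emits language-flavored nonsense words.
sample = ["blaas", "blaast", "blaasmuziek", "bioscoop", "biologisch"]
chain = build_chain(sample)
print(generate(chain))
```

Caching the parsed chain (e.g. pickling the counts) is what makes the larger 5,000-word samples practical, since the counting pass only happens once per source file.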

Project Gutenberg has material in a few non-English languages, and it’s easy to grab an entire chapter of a book. Early Indo-European Online has some terrific samples, most of them easily extracted. But what looked like a gold mine was Deltacorpus: 107 different languages, all extracted with the same software and tagged for part-of-speech. And the range of languages is terrific: Korean, Yiddish, Serbian, Afrikaans, Frisian, Low Saxon, Swedish, Catalan, Haitian Creole, Irish, Kurdish, Nepali, Uzbek, Mongolian, etc., each with around 900,000 terms. The PoS-tagging even made it easy to strip out things that were not native words, and generate a decent-sized random subset.
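I don’t have the exact extraction script, but given the index/token/tag layout shown in the excerpts later in the post, a rough sketch of that tag-based filtering and random subsetting might look like this (the tag whitelist is my own guess at “native vocabulary”):

```python
import random

# Assumed whitelist: tags likely to be real native words.
KEEP = {"NOUN", "VERB", "ADJ", "ADV"}

def sample_words(lines, n=5000, seed=0):
    """Pull a random subset of plausible native words from a tagged corpus."""
    words = set()
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        _, token, tag = parts[:3]
        if tag in KEEP and token.isalpha():
            words.add(token.lower())
    rng = random.Random(seed)
    return rng.sample(sorted(words), min(n, len(words)))
```

Note that this still lets through cruft like “http VERB”, which is exactly why the manual skim described below remains necessary.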

Then I tried them out in the generator, and started to see anomalies: “jpg” is not generally found in a natural language, getting a plausible Japanese name out of a Finnish data set is highly unlikely, etc. There were a number of oddballs like this, even in languages that I had to run through a romanizer, like Korean and Yiddish.

So I opened up the corpus files and started searching through them, and found a lot of things like this:

437 바로가기    PROPN   
438 =   PUNCT   
439 http    VERB    
440 :   PUNCT   
441 /   PUNCT   
442 /   PUNCT   
443 www NOUN    
444 .   PUNCT   
445 shoop   NOUN    
446 .   PUNCT   
447 co  NOUN    
448 .   PUNCT   
449 kr  INTJ    
450 /   PUNCT   
451 shop    PROPN   
452 /   PUNCT   
453 goods   NOUN    
454 /   PUNCT   
455 goods_list  NOUN    
456 .   PUNCT   
457 php NOUN    
458 ?   DET 
459 category    NOUN    
460 =   PUNCT   
461 001014  NUM 

1   우리의  ADP 
2   예제에서    NOUN    
3   content X   
4   div에   NOUN    
5   float   VERB    
6   :   PUNCT   
7   left    VERB    
8   ;   PUNCT   

Their corpus-extraction script was treating HTML as plain text, and the pages they chose to scan included gaming forums and technology review sites. Eventually I might knock together a script to decruft the original sources, but for now I’m just excluding the obvious ones and skimming through the output looking for words that don’t belong. This is generally pretty easy, because most of them are obvious in a sorted list:

binnentreedt
biologisch
biologische
bioscoop
bir
bis
bit
bitmap
bitset
blaas
blaasmuziek
blaast

Missing some of them isn’t a big problem, because the generator uses weighted-random selection for each token: a start token that appears only once will rarely be selected, and offers few possible transitions. Still worth cleaning up, though, since they become more likely when you mix multiple language sources together.

“Need a clue, take a clue,
 got a clue, leave a clue”