I’m pretty sure “futanari” is not Dutch. Also “gmail”, “iphone”, “http”, “cialis”, and “jackalope”. “bewerkstelligen”, on the other hand, fits right in.
For my new random word generator, I’ve been supplementing and replacing the small language samples from Chris Pound’s site. The old ones do a pretty good job, but the new generator has built-in caching of the parsed source files, so it’s possible to use much larger samples, which gives a broader range of language-flavored words. 5,000 distinct words seems to be the perfect size for most languages.
Project Gutenberg has material in a few non-English languages, and it’s easy to grab an entire chapter of a book. Early Indo-European Online has some terrific samples, most of them easily extracted. But what looked like a gold mine was Deltacorpus: 107 different languages, all extracted with the same software and tagged for part-of-speech. And the range of languages is terrific: Korean, Yiddish, Serbian, Afrikaans, Frisian, Low Saxon, Swedish, Catalan, Haitian Creole, Irish, Kurdish, Nepali, Uzbek, Mongol, etc., each with around 900,000 terms. The PoS-tagging even made it easy to strip out things that were not native words, and generate a decent-sized random subset.
Then I tried them out in the generator, and started to see anomalies: “jpg” is not generally found in a natural language, getting a plausible Japanese name out of a Finnish data set is highly unlikely, etc. There were a number of oddballs like this, even in languages that I had to run through a romanizer, like Korean and Yiddish.
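The strip-and-sample step might look something like this. It's a minimal sketch, not the actual script: it assumes each corpus line is "index token TAG" separated by whitespace (the layout of the excerpts quoted in this post), and the `KEEP_TAGS` set and both function names are hypothetical choices made for illustration.

```python
import random

# Assumption: each corpus line is "index token TAG", whitespace-separated.
# Hypothetical choice of which tags count as ordinary words.
KEEP_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}

def distinct_words(lines, keep_tags=KEEP_TAGS):
    """Return the set of lowercase alphabetic tokens whose tag is in keep_tags."""
    words = set()
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _, token, tag = parts
        if tag in keep_tags and token.isalpha():
            words.add(token.lower())
    return words

def random_subset(words, size=5000, seed=None):
    """Pick up to `size` distinct words at random, returned sorted."""
    rng = random.Random(seed)
    pool = sorted(words)  # sort first so a given seed is reproducible
    return sorted(rng.sample(pool, min(size, len(pool))))
```

Note that a filter like this trusts the tags completely: a token such as “http” tagged VERB passes straight through.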
So I opened up the corpus files and started searching through them, and found a lot of things like this:
437 바로가기 PROPN
438 = PUNCT
439 http VERB
440 : PUNCT
441 / PUNCT
442 / PUNCT
443 www NOUN
444 . PUNCT
445 shoop NOUN
446 . PUNCT
447 co NOUN
448 . PUNCT
449 kr INTJ
450 / PUNCT
451 shop PROPN
452 / PUNCT
453 goods NOUN
454 / PUNCT
455 goods_list NOUN
456 . PUNCT
457 php NOUN
458 ? DET
459 category NOUN
460 = PUNCT
461 001014 NUM

1 우리의 ADP
2 예제에서 NOUN
3 content X
4 div에 NOUN
5 float VERB
6 : PUNCT
7 left VERB
8 ; PUNCT
Their corpus-extraction script was treating HTML as plain text, and the pages they chose to scan included gaming forums and technology review sites. Eventually I might knock together a script to decruft the original sources, but for now I’m just excluding the obvious ones and skimming through the output looking for words that don’t belong. This is generally pretty easy, because most of them are obvious in a sorted list:
binnentreedt
biologisch
biologische
bioscoop
bir
bis
bit
bitmap
bitset
blaas
blaasmuziek
blaast
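A first pass at excluding the obvious ones can be automated before the skim. This is only a sketch under assumptions: the `CRUFT` blocklist is a hypothetical starter set, and the digits-and-underscores rule is a guess at what never belongs in a word list.

```python
import re

# Hypothetical blocklist of web/tech debris; extend it as new oddballs turn up.
CRUFT = {"http", "https", "www", "php", "jpg", "gmail", "iphone", "bitmap", "bitset"}

# Tokens containing digits or underscores are almost never natural-language words.
NON_WORD = re.compile(r"[\d_]")

def split_cruft(words, blocklist=CRUFT):
    """Return (kept, dropped) as sorted lists of distinct lowercase words."""
    kept, dropped = set(), set()
    for word in words:
        w = word.lower()
        if w in blocklist or NON_WORD.search(w):
            dropped.add(w)
        else:
            kept.add(w)
    return sorted(kept), sorted(dropped)
```

Returning the dropped words as well makes it easy to check that the blocklist isn't eating real vocabulary. Short strays like “bit” still need a human eye.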
Missing some of them isn’t a big problem, because the generator uses weighted-random selection for each token: a start token that appears only once in the sample is rarely selected, and it has few possible transitions out. It’s still worth cleaning up, since the strays become more likely when you mix multiple language sources together.
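To see why a one-off stray rarely surfaces, here is a minimal sketch of weighted-random selection over a letter-level chain. It is an illustration, not the generator's real code: treating single letters as tokens, the two-letter context, and the names `build_model` and `generate` are all assumptions.

```python
import random
from collections import Counter, defaultdict

# Sentinel tokens marking word boundaries (assumed not to occur in the sample).
START, END = "^", "$"

def build_model(words, order=2):
    """Count, for each `order`-letter context, how often each next letter follows it."""
    model = defaultdict(Counter)
    for word in words:
        padded = START * order + word + END
        for i in range(len(padded) - order):
            model[padded[i:i + order]][padded[i + order]] += 1
    return model

def generate(model, order=2, max_len=12, rng=random):
    """Walk the chain, choosing each next letter in proportion to its count."""
    context, out = START * order, []
    while len(out) < max_len:
        counts = model[context]
        letters = list(counts)
        nxt = rng.choices(letters, weights=[counts[c] for c in letters])[0]
        if nxt == END:
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out)
```

A context seen once carries a weight of 1 against the hundreds or thousands attached to common ones, so it is seldom chosen. Merging several languages into one model adds new ways to reach those rare contexts, which is why the strays show up more often in mixed output.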
How to tell that your new random-word generator works: feed it the text of the Jargon File and get back things that you could easily come up with real jargon definitions for…
I decided to see how much detail I could get with the 20-degree v-bit. This was carved out of the end grain of a generic hardwood 1-inch square dowel.
Update: I shined a flashlight onto the seal to get a better look at it.
This is a 2-inch square seal created in Illustrator with my script, imported to VCarve Desktop, and cut from a mounted linoleum block on my Nomad with a 20-degree v-bit. The hardest part about getting a good firm impression on paper is that my stamp pad is too darn small; I’ll have to buy an uninked pad at an office supply shop and load it up with the good pigment.
Also making a good impression is glamour model Meru Tsujimura, who cannot be found at office supply shops…
I walked into TAP Plastics today. The sign on the door said:
"Scrap sale, 75% off"
I gained 15 pounds of HDPE, for $8. I think I’ll go back tomorrow for a load of acrylic.
[Update: another $20, another 38 pounds of plastic (~21 HDPE, ~17 cast acrylic); I may go back again if I can find someone working there who knows what some of the mystery plastics in the scrap bin are…]
So, someone got me one of these for my birthday, and I finally got around to building a stand for the outdoor sensor (basically an oversized tameshigiri stand made with pressure-treated lumber).
A promised feature was remote access to your data through an iOS/Android app, so I downloaded the app …and it didn’t work. Surprise! You have to:
I have two words for that, and the first one starts with “F”.
So, delete the Mac app, delete the iPhone app, unplug the cable, and now I have a standalone weather station. Oh, well, at least it wasn’t an Internet of Things Thing that sat on my wireless and took orders from The Cloud.
Update: now that the sun is down, I can report that the backlighting on the screen is bright. I won’t be needing a nightlight downstairs any more.
Not a euphemism!
There are web sites where you can figure out the origin of your pipes, such as Logos and Markings and Pipedia, and both link to other sites where aficionados of a particular brand have gone into obsessive detail. Even that’s not enough, though, and sometimes you don’t even have a name to google. In the case of one of Dad’s pipes, all I had was a stem mark that looked sort of like a fish-hook.
A sharp-eyed youngster (under 40) at the local pipe shop thought it might have originally been a nearly-vertical “♂” symbol where most of the white paint filling the stamping had worn off, and careful inspection confirmed it. But searching for “male symbol pipe” is a bad idea, even with SafeSearch turned on, and I gave up after a while.
Until a few days ago, when I happened to be looking at the history of Laxey Pipe Ltd, which specialized in making African Meerschaum pipes for other companies. If you’ve seen a Peterson or Barling meer, it’s probably a Laxey, and they were located on the Isle of Man.
Sure enough, one of the sample contract pipes at the bottom of the page is Comoy’s Man Pipe, with the same distinctive bowl carving and ♂ logo as Dad’s:
Amazon Japan’s Halloween Store has a full range of theme costumes, including the always-popular Sexy Nun.
Miko costumes come in Straight, Sexy, Sexy Anime, Sexy Maid, and Touhou (Sexy). Lots of zombies and schoolgirls, too, of course. Not that these are strictly Halloween costumes.