“A table alphabeticall, conteyning and teaching the true writing, and vnderſtanding of hard uſuall Engliſh words, borrowed from the Hebrew, Greeke, Latine, or French, &c. With the interpretation thereof by plaine Engliſh words, gathered for the benefit and helpe of ladies, gentlewomen, or any other vnskilfull persons. Whereby they may the more eaſily and better vnderſtand many hard Engliſh words, vvhich they ſhall heare or read in Scriptures, Sermons, or elſe vvhere, and alſo be made able to vſe the same aptly themſelues.”

— Full title of the first English dictionary

DanMachi 2.4


In which the first arc is quickly wrapped up, thankfully getting rid of the mustache-twirling ugly people who were spitting into the camera. Nice cameo by Mord, who’s clearly learned his lesson (he’s practically tsundere in the DanMemo game…). Pity that Lili’s self-esteem is so firmly tied up in Bell, but maybe Finn can help out with that (misleading non-spoiler).

Next up: Amazons, ho! No, wait, I meant Amazon hoes!

3D Cheesecake: De Dupes


Inevitably, there are duplicate images in my cheesecake archives. Sometimes it’s the exact same file with a different name, which I can detect with a simple MD5 checksum, but often they’re different sizes, or some site has added a watermark, or a magazine overlaid it with text, or someone cropped off the text that someone else added, etc., etc.
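For the exact-copy case, a checksum pass is enough. A minimal sketch, assuming GNU coreutils (`md5sum`, plus `uniq` with `-w` and `--all-repeated`); macOS ships `md5` instead, so it would need adapting there. The function name `exact_dupes` is made up for illustration:

```shell
# exact_dupes DIR (hypothetical helper)
# Prints "checksum  path" for every file under DIR that is byte-for-byte
# identical to at least one other file, with a blank line between groups.
exact_dupes() {
    # sort brings equal checksums together; uniq -w32 compares only the
    # 32-character MD5 prefix and prints every member of each repeated group
    find "$1" -type f -exec md5sum {} + |
        sort |
        uniq -w32 --all-repeated=separate
}
```

Run it as `exact_dupes .`; only one file from each printed group needs to survive.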

Enter PDQ, an image-similarity hashing system that works pretty darn well. Despite coming from the evil Facebook empire (usable for detecting kiddie-pr0n and wrongthink memes), the code is pretty decent, compiles cleanly, and only blows up if you feed it a file that doesn’t contain a single image convertible with ImageMagick (pro tip: do not run it on a directory that contains a video file; your swapfile will thank me). A quick review of the images it clustered together confirmed that fully 11% of my images were duplicates.
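Under the hood, PDQ reduces each image to a 256-bit hash (64 hex digits), and two images count as similar when their hashes differ in only a few bit positions, i.e. a small Hamming distance. To illustrate that comparison (this is not PDQ’s own code, and it says nothing about the threshold `clusterize256` actually applies), here’s a hypothetical POSIX-sh helper, `pdq_hamming`, that counts the differing bits between two equal-length hex hashes:

```shell
# pdq_hamming HEX1 HEX2 (hypothetical helper)
# Prints the number of differing bits between two equal-length hex strings.
# Plain POSIX sh has no "local", so it uses underscore-prefixed globals.
pdq_hamming() {
    _a=$1 _b=$2 _dist=0
    while [ -n "$_a" ]; do
        # split off the first hex digit of each string
        _ra=${_a#?} _rb=${_b#?}
        _x=${_a%"$_ra"} _y=${_b%"$_rb"}
        # XOR the two nibbles, then add up the 4 bits of the result
        _v=$(( 0x$_x ^ 0x$_y ))
        _dist=$(( _dist + (_v & 1) + ((_v >> 1) & 1) + ((_v >> 2) & 1) + ((_v >> 3) & 1) ))
        _a=$_ra _b=$_rb
    done
    echo "$_dist"
}
```

`pdq_hamming "$hash1" "$hash2"` prints a number from 0 (identical hashes) to 256 (every bit flipped).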

So what better for a cheesecake theme than images I liked so much I managed to download them at least four times? (Not counting any copies I’ve already posted and deleted from the archive, of course; I’ll have to go through my S3 backups sometime to find those.)

The following de-duplication recipe uses Miller to process the output; I’d somehow overlooked this tool for years, and I can think of at least one project at work that I wouldn’t be stuck maintaining any more if it were a directory full of mlr recipes instead of Perl modules.

# gather up all your image files
#
find . -type f -name '[0-9a-zA-Z]*.[pjPJ]*' | sort > /tmp/images

# edit the list to remove anything that's not an image (text, video,
# etc); also sanity-check for annoying file names (containing things
# like commas(!), whitespace, quotes, parentheses, etc)

# generate the hashes; this is the tedious part
# (~13/sec on my 12-inch MacBook with images stored on an external SSD)
#
pdq-photo-hasher -d -i < /tmp/images > /tmp/hashes

# cluster similar images, then strip out all images with
# cluster-size=1 (unique)
#
clusterize256 /tmp/hashes | mlr filter '$clusz > 1' > /tmp/alldupes

# extract their filenames
#
mlr --onidx cut -f filename /tmp/alldupes > /tmp/files

# create file containing (filename, height, size) for all images
#
xargs identify -format 'filename=%i,height=%h,size=%B\n' \
    < /tmp/files > /tmp/meta

# join it to the original, for consolidated output
#
mlr join -j filename -f /tmp/meta /tmp/alldupes > /tmp/alldupes2

# for each cluster, keep the file with the largest (height, size)
#
mlr sort -nr height,size then \
    head -n 1 -g clidx then \
    sort -n clidx then \
    cut -f filename /tmp/alldupes2 > /tmp/keep

# create the complementary set of images to delete
#
grep -F -v -f /tmp/keep /tmp/alldupes2 |
    mlr --onidx cut -f filename > /tmp/nuke

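# move the dupes to another directory, keeping their relative
# paths so same-named files from different folders don't collide
# (rather than deleting them immediately...)
#
mkdir -p DUPES
while IFS= read -r f; do
    mkdir -p "DUPES/$(dirname "$f")" && mv -n "$f" "DUPES/$f"
done < /tmp/nuke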
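The list-editing step near the top of the recipe is left as a manual chore. Two hypothetical helpers (names made up for this sketch) can at least point you at the lines that need attention: `risky_names` flags paths containing whitespace, commas, quotes, or parentheses, and `non_images` flags anything that `file(1)` doesn’t report as an `image/*` MIME type. Both only print; the actual editing is still up to you.

```shell
# risky_names LIST (hypothetical helper)
# Prints, with line numbers, every path in LIST containing whitespace,
# a comma, a single or double quote, or a parenthesis.
risky_names() {
    grep -n "[[:space:],'\"()]" "$1"
}

# non_images LIST (hypothetical helper)
# Prints every path in LIST whose MIME type, as reported by file(1),
# does not start with "image/" (e.g. text files or stray videos).
non_images() {
    while IFS= read -r f; do
        case $(file -b --mime-type "$f") in
            image/*) ;;
            *) printf '%s\n' "$f" ;;
        esac
    done < "$1"
}
```

For example, run `risky_names /tmp/images` and `non_images /tmp/images` right after the `find` step, then fix or delete the lines they report.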

When you add new images to your collection, you can generate their hashes and compare them to the existing data (amusingly, you have to use the tool backwards…):

# hash the new images
#
pdq-photo-hasher [0-9a-zA-Z]*.[pjPJ]* > /tmp/newstuff

# print the filenames of new dupes
# (note that mih-query is a bit twitchy about formatting; the
# hash field must be first, and non-pdq fields need to be at
# the end)
#
mih-query /tmp/hashes /tmp/newstuff | grep match= | mlr --onidx cut -f 4

# add remaining hashes to your DB of unique images
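That last step is left as a comment. A minimal sketch, assuming you first redirect the `mih-query` pipeline’s output to a file (the name `/tmp/newdupes` below is made up), and assuming each line of `/tmp/newstuff` carries its filename as one comma-separated field, either bare or as `filename=NAME`. The helper `append_unique_hashes` is hypothetical; it matches whole fields rather than substrings, so a dupe named `a.jpg` doesn’t knock out `banana.jpg`. It only checks against the existing DB, not for duplicates within the new batch itself.

```shell
# append_unique_hashes NEW_HASHES DUPE_NAMES HASH_DB (hypothetical helper)
# Appends each line of NEW_HASHES to HASH_DB unless one of its
# comma-separated fields, bare or with a "key=" prefix stripped,
# exactly equals a filename listed (one per line) in DUPE_NAMES.
append_unique_hashes() {
    awk -F, '
        # first file argument: load the dupe filenames into a set
        FILENAME == ARGV[1] { dupe[$0] = 1; next }
        # second file argument: skip lines that name a known dupe
        {
            for (i = 1; i <= NF; i++) {
                v = $i
                sub(/^[^=]*=/, "", v)
                if (($i in dupe) || (v in dupe)) next
            }
            print
        }' "$2" "$1" >> "$3"
}

# Example wiring (the mih-query pipeline from the recipe, saved to a file):
#   mih-query /tmp/hashes /tmp/newstuff | grep match= |
#       mlr --onidx cut -f 4 > /tmp/newdupes
#   append_unique_hashes /tmp/newstuff /tmp/newdupes /tmp/hashes
```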

Bonus for correctly guessing which image I had eight copies of. 😁


Super Miniskirt Space Pirates 1: Pirate Officer Cadets


So, the original light novels that Bodacious Space Pirates was based on moved to a new publisher, who redid the covers in a more anime style and has now spun off a new series, apparently the adventures of Marika and Chiaki as Imperial space cadets:

Dear Engadget headline writer...


The incorrect interpretation is surprisingly plausible:

Facebook releases tools to flag harmful content on GitHub

(screenshot after the jump, so you don’t have to look at Zuck)


Thanko: For all your Urban Ninja needs...


Walking the fine line between practical and batshit crazy, it’s the USB-Powered Fleece Helmet. I mean, who doesn’t want to keep their heads warm during the bitter Japanese winters? Except maybe someone who’d rather not have cheap Chinese lithium batteries right under their ears, exposed to the elements and about an inch away from a heating coil.

Um, Hestia?


I don’t think he’s ready for that discussion just yet…

(via the DanMemo mobile game)

Dear Amazon,


No idea how this ties into “Cycling”. Unless she’s a personal trainer.

Also, what have I purchased that made you think a “Cycling” category would be a good set of recommendations?

Then again…

Dear Funimation,


Abandoning the streaming rights to the Japanese version of a recent show while keeping the dub version online is not interesting to me. I just removed half a dozen recent shows from my watchlist because of that. I can still watch them on Crunchyroll, but your FireTV client actually works. If only you two were partners or something; oh, wait.

“Need a clue, take a clue,
 got a clue, leave a clue”