I liked the L’il Esils so much that I decided to make more, adding the challenge of putting them to bed without the model turning it into loli porn. Careful use of negative prompts usually managed to produce innocent pictures of L’il Esil with innocent sleepwear, stuffed animals, and non-sexy poses, but the attempt exposed another common problem with generative AI: it doesn’t understand how rooms work or how people fit inside them.
The basic prompt was: “<lora-trigger> wearing black and red frilly pajamas and fuzzy slippers. sfw, happy, flat chest, child height, child body, child face. indoors, girl’s bedroom at night, evil stuffed animals in background” (and, yes, despite the lack of nudity and sex keywords, I had to add “cum” to the negative prompt; in one pic, the teddy bear had a puddle of white stuff leaking from its crotch, sigh).
A human being would interpret “child height” in relation to the typical sizes of common bedroom objects, like doors, chairs, dressers, and beds. Generative AI sometimes seems to get it right, but it’s really the user deciding between equally-likely alternatives.
So for fun, here’s a whole bunch of L’il Esils, and the challenge is to notice how many things are not-quite-right about the images.
Just for fun, I decided to make some L’il Esils. This is mostly a matter of adding “child height, child face, child body, flat chest” to the prompt, but the way people train models, you’re likely to find yourself rapidly filling the negative prompt with things like “nsfw, nude, naked, topless, nipples, crotch, upskirt, sideboob”, etc; if they can go NSFW, they will, often in ways that you don’t want stored on your computer. Even mentioning skin color can be enough to trigger nudity.
A clear plastic bag is not a shipping container for a breakable item. TL/DR: it broke.
I asked ChatGPT to write a Bash script that toggles the blocking status of a Pihole, using the REST API. It wrote (broken) code against the API that was valid from version 3.x through version 5.x. However, a completely different API was introduced for version 6.x, with no backwards compatibility. I didn’t know this. ChatGPT didn’t know this. Even pointing ChatGPT at the updated API page, which it dutifully pretended to retrieve and analyze, did not change its answer.
So I asked again, this time specifying “v6 api”. It immediately rewrote the script to use the correct authentication method and API calls, but it didn’t work. After editing the script to dump debug data, I informed it that the call to get the current blocking status returned “enabled” or “disabled”, but the call to set the status required true or false, while still returning enabled/disabled. This is kinda dumb, but pretty typical of cheesy REST APIs.
It took several more passes to produce working code, but ChatGPT was absolutely confident that it was Right every time, insisting that the latest-but-still-broken version was “corrected and fully tested”. Despite having no ability to, y’know, test.
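For the record, the working version boils down to something like this. A minimal sketch, not my actual script: it assumes the v6 /api/auth and /api/dns/blocking endpoints behave as described above (GET reports enabled/disabled, POST wants true/false), and that passing the session ID in a sid header works on your release; adjust the URL and password for your own setup.

#!/usr/bin/env bash
# Toggle Pihole blocking via the v6 REST API. Sketch only; assumes curl
# and jq are installed, and that the "sid" header is the right way to
# pass the session ID on your release.
set -euo pipefail

PIHOLE="http://pi.hole"        # base URL of your Pihole
PASSWORD="your-app-password"   # web/app password

# Log in and pull the session ID out of the JSON response.
SID=$(curl -sS -X POST "$PIHOLE/api/auth" \
      -H "Content-Type: application/json" \
      -d "{\"password\":\"$PASSWORD\"}" | jq -r '.session.sid')

# GET reports the current status as "enabled" or "disabled"...
STATUS=$(curl -sS -H "sid: $SID" "$PIHOLE/api/dns/blocking" | jq -r '.blocking')

# ...but POST insists on true/false. Sigh.
if [ "$STATUS" = "enabled" ]; then NEW=false; else NEW=true; fi

# Flip it, and print the new state (reported as enabled/disabled, naturally).
curl -sS -X POST "$PIHOLE/api/dns/blocking" \
     -H "sid: $SID" -H "Content-Type: application/json" \
     -d "{\"blocking\":$NEW}" | jq -r '.blocking'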
All this for a single page of code that does something that’s completely fucking trivial. Tell me again how this horseshit is replacing 90% of Real Software Engineers at Big Tech companies. And then pull the other one, it’s got bells on.
On the other hand, there’s a special level of Hell reserved for devs who write update code that fails to check if the new version will still be supported on the current OS before breaking the existing install. Spoiler: it wasn’t.
So now I have a new Pihole running as a Docker container on my Synology NAS, and a Raspberry Pi to reinstall from scratch, with whatever the current release of the GPS-hat NTP stratum 0 package is.
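(for reference, the “package” is basically gpsd feeding chrony; the chrony side is just a couple of refclock lines. A sketch, assuming gpsd publishes NMEA time to shared-memory segment 0 and the hat’s pulse-per-second signal shows up as /dev/pps0; device names and offsets vary by hat)

# additions to /etc/chrony/chrony.conf (sketch; tune offset for your hat)
refclock SHM 0 refid NMEA offset 0.2 noselect
refclock PPS /dev/pps0 refid PPS lock NMEA prefer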
I was dinking around with dynamic prompts for Stable Diffusion again, and at one point ended up getting a robotic catgirl. This seemed like a fine idea, so I stripped it down to the basics and tried to build back up into a specific look. I was getting reasonably consistent results with one model, but it was weak on backgrounds. TL/DR, I ended up running the same prompt and seed through 64 models because I wasn’t able to get all the elements at the same time: clearly robotic body, metal skin and face, metal cat ears and tail, pretty face, sexy figure, naughty bits covered, interesting background, requested pose.
So I ran the exact same prompt and seed through 64 models:
4k, crisp, high budget, highly detailed, intricate, ultra textured. photorealistic, hyper realistic, ((robot girl)) at ((Towering spires of glass piercing a sea of swirling mist)). ((robot face)), ((silver metal skin)), ((silver metal face)), ((metal hair)), tiny breasts, silver metal cat tail, black metal cat ears, full body, teen height, teen face, teen body, wearing ((black metal corset bodysuit)), ((black metal stockings)), ((black metal long gloves)), ((black metal collar)) with ((glowing crystal bell)). Cross-legged on floor, hands on knees, upright posture, calm and collected with a cheerful expression.
I tried to bias it toward a slender look with “teen face/body/height” and “tiny breasts” (which generally produces B-C instead of D-H). I moved the location to the beginning because I kept getting just floors and solid-color backgrounds with the occasional generic room interior. Repeating the colors for each item helps to keep them from bleeding into other elements, but, as usual, doesn’t always work.
Definitely a catgirl, mostly-correct costuming, matte silver skin except for neck and eyes, plausibly robotic arms and legs, terrible tail (fur, broken), robot cat-ears, cute face, sleek little body, appealing pose but doesn’t match the prompt, nice background that matches prompt.
How does it feel to have a President again after four years of nameless unaccountable staffers taking turns shoving their hands up Joe Biden’s puppet-hole?
And this.
Perhaps some of this.
Definitely some of this.
And of course there’s plenty of this.
“…cross country data and six additional studies find that people with lower AI literacy are typically more receptive to AI.” (cite)
…and they vote, too!
Much excitement is being generated by the MIT-licensed release of the Chinese-made Deepseek models. Let’s see how they do…
TL/DR: the results are terrible, but the detailed “reasoning process” is fucking hilarious. Reminder, this is supposed to be the good stuff, the first time pro-grade AI models have been released for offline use.
Up to this point, I’ve been more-or-less taking the advice of model creators and uploaded pictures on CivitAI when it comes to choosing the sampler and scheduler settings for Stable Diffusion models. This produced problems when I tried to compare the same prompt and parameters across a large group of models, to see how they handled details like faces, finger counts, lighting, depth of field, and of course, “paying attention to the prompt”.
I was going to do a detailed comparison of the 13x31 grid of pictures I got from testing identical settings with all of the available schedulers and samplers, but as I worked my way through the results, I learned an important lesson: don’t choose a reference pic where the gal’s legs are crossed and her fingers are interlaced. This is pretty much the worst-case scenario for evaluating SD images of human beings…
TL/DR: over a third of the combinations produced garbage, and about half of the rest looked very similar in the foreground with some minor out-of-focus differences in the background, but there were quite a few small differences in her clothing’s shape, color, coverage, and material. Face and hair were pretty similar, with only a few looking like a completely different girl, and maybe a quarter having the hair parted on the other side. A fair number changed the pose in some way, although there were maybe six different poses total out of 403 images.
Next time, I’ll set the test up more carefully, so I can actually draw some conclusions beyond, “yeah, just don’t bother with most of the samplers and schedulers”. 😁
Amazon’s “AI” comment-summarizer says this:
Customers find the story engaging and action-packed. They describe the book as a fun, intense read that is worth reading. The series is considered good to great by customers. Readers appreciate the complex characters and the author’s writing style. The pacing is described as fast and consistent. Overall, customers praise the author’s writing quality and consider it an excellent military adventure.
Human summarizer says, “OH JOHN RINGO NO!”. 😁
How to get Flux.1-Dev to stab an orc: “…bleeding from a large chest wound. A sword grows vertically from the wound.” The official release seems a bit vague on what an “orc” looks like, but with some extra prompting will do the right thing:
side view, at night. photograph of a male ((orc)) warrior with green skin, pointed ears, and tusks, wearing armor, ((lying on back)) on a battlefield with his eyes closed, bleeding from a large chest wound. A sword grows vertically from the wound.
(new anime? not until Saturday!)
The relative ease of customizing Stable Diffusion models means that thousands of people are stirring the pot and training their own. This is good, since the official models are biased and censored, but it’s also bad, because the derivative models are biased in different directions, and often over-trained to the point that they simply snap when you find their edges.
Most people don’t do their custom training against the base SD models; they layer their collection of picture/keyword pairs on top of one that’s already been “uncensored” or augmented in some way, with the two major anime branches being Illustrious and Pony. What this means in practice is that feeding the same settings to related models will often produce very similar results.
So, just how similar do they get?
I’ve been using SwarmUI’s grid feature to evaluate different models by passing them all the same prompt, seed, and settings.
For each set, I used a character LoRA (small patch model that can be used to add character/style/location data onto other models with varying success depending on heredity), and generated multiple pictures in my go-to model for cute-and-occasionally-naughty material, CAT - Citron Anime Treasures (Illustrious-based), until I found something that looked like a decent starting point:
Setting aside the boilerplate and the character trigger words, the prompt was:
laughing, standing with arms spread, head back, grounded stance, freedom in motion, outdoors, at Santorini, Greece
Once you’ve asked chatbots for information a few times, you start to spot patterns. Here’s a perfect example: I asked Google about putting vanilla extract in tiramisu (something traditional recipes don’t do). High on the list of results was Spoonable Recipes, and every line screams generative AI:
Mascarpone cheese is the most popular ingredient in tiramisu dishes. In fact, over 80% of tiramisu recipes contain mascarpone cheese.
Softened is a frequent preparation for mascarpone cheese in tiramisu dishes.
Mascarpone cheese is often included in tiramisu dishes in amounts of 8 ounces, 1 pound or 1 cup.
Another popular ingredient in tiramisu is white sugar. From the recipes we’ve sampled for tiramisu, over 70% have white sugar.
Tiramisu dishes often call for white sugar to be granulated.
White sugar is often included in tiramisu dishes in amounts of 1 tablespoon, three quarters of a cup or a quarter cup.
…
Another popular ingredient in tiramisu is vanilla extract. From the recipes we’ve sampled for tiramisu, over 40% have vanilla extract.
Vanilla extract is often included in tiramisu dishes in amounts of 1 teaspoon or half a teaspoon.
In tiramisu recipes that contain vanilla extract, it is on average, 0.7% by weight.
In recipes for tiramisu, vanilla extract is often used with mascarpone cheese, ladyfinger cookies, confectioners sugar, chocolate and white sugar.
Potential substitutions for vanilla extract in tiramisu:
pumpkin pie spice
Also, vanilla extract is not often used with flour, white chocolate, pumpkin and lemon curd.
And remember, that cream won’t whip itself!
(as investor-hungry “AI” companies frantically scramble for fresh content to build their next-generation engines with, they’re hoovering up previous-generation output like this “recipe analysis” and spreading the contamination. A lot of people doing text-to-image generation rave about Flux over Stable Diffusion XL, but the first time I tried it, I got even more fingers per hand; one poor gal must have had a dozen, and that’s enough ladyfingers for three full servings of tiramisu!)
… and a happy Kiwi!
Or, in the words of Henry The Red, “Thank you, generous hosts!”
[boilerplate]. A pin-up photo of a pretty teen girl, [female-hairstyle], [sexy-pose], with a [positive-mood] expression, wearing from 2-5 [lingerie], at [famous-place]. [framing-light], [camera]
Negative: pregnant, frame, cropped, [negatives]
4k, breathtaking, crisp, gorgeous, high budget, highly detailed, intricate, professional, ultra textured. A pin-up photo of a pretty teen girl, Soft chignon, neatly tucked at the nape of the neck, Lying on side, elbow bent, hand supporting head, elegant silhouette formed, with a grateful expression, wearing compression shorts, bow-front bikini and camisole set, lace-up bodysuit, striped panties, at Scottish Highlands, Scotland. Harsh midday lighting, wide angle, scattered elements, vibrant contrast, dynamic shadows., Eye-level angle (from behind) view medium Close shot (focus on feet).
Negative: pregnant, frame, cropped. bad anatomy, bad proportions, banner, censored, collage, cropped, deformed, disconnected limbs, disfigured, duplicate, error, extra arms, extra digits, extra hands, extra limbs, fused fingers, grainy, gross proportions, logo, long neck, low contrast, low quality, low resolution, malformed limbs, missing arms, missing fingers, multiple panel, mutated, mutated hands, mutated limbs, out of focus, oversaturated, poorly drawn eyes, poorly drawn face, poorly drawn hands, signature, split frame, split screen, text, ugly, ugly, unreal, username, watermark, worst quality.
“This doesn’t look like the Lincoln Tunnel, Sam.”
Also not a close-up foot shot in harsh midday lighting in the Scottish Highlands. And I didn’t even ask for an army; they must have been assembled from “silhouette”, “scattered elements” and “dynamic shadows”.
You are the pilot of a ship capable of traveling the multiverse. The cockpit contains thousands of unlabeled buttons, switches, dials, and sliders. Think Tardis, but taken to 11.
You may adjust any number of controls before hitting The Big Red Button, and then you will be transported to a completely different universe, where anything can happen.
With me so far? Good.
Order matters. As you adjust each control, your destination in the multiverse shifts, so that each additional control you adjust applies its effect from a different starting point. Turn a dial too far in one direction, and your destination could be so far from home that slugs are the dominant species on Earth, and you can’t get back by pressing three green buttons and toggling a switch.
Still here? Awesome.
Here comes the real fun: each control on your board is actually wired up to ten thousand completely independent engines, all of which impact your destination in some way, big or small. In the first engine, that dial setting puts slugs in charge, but in engine #751, it puts giant breasts on cats. Not catgirls, cats.
By your side is a quirky robot with tentacles. Its job is to convert your spoken orders into control adjustments, but it doesn’t fully understand human language, has a dim grasp of each control’s effect, and guesses to resolve ambiguity. Because it has received contradictory orders from its makers, at random intervals it goes insane and assaults you with the tentacles; it never remembers these episodes.
(I let DALL-E handle this one, because I’m busy cleaning, baking, and wrapping presents for Christmas dinner tomorrow)
After the jump, a not-so-Christmas NSFW Miracle!