I was dinking around with dynamic prompts for Stable Diffusion again, and at one point ended up getting a robotic catgirl. This seemed like a fine idea, so I stripped it down to the basics and tried to build it back up into a specific look. I was getting reasonably consistent results with one model, but it was weak on backgrounds. TL/DR, I ended up running the same prompt and seed through 64 models because I wasn’t able to get all the elements at the same time: clearly robotic body, metal skin and face, metal cat ears and tail, pretty face, sexy figure, naughty bits covered, interesting background, requested pose.
So I ran the exact same prompt and seed through 64 models:
4k, crisp, high budget, highly detailed, intricate, ultra textured. photorealistic, hyper realistic, ((robot girl)) at ((Towering spires of glass piercing a sea of swirling mist)). ((robot face)), ((silver metal skin)), ((silver metal face)), ((metal hair)), tiny breasts, silver metal cat tail, black metal cat ears, full body, teen height, teen face, teen body, wearing ((black metal corset bodysuit)), ((black metal stockings)), ((black metal long gloves)), ((black metal collar)) with ((glowing crystal bell)). Cross-legged on floor, hands on knees, upright posture, calm and collected with a cheerful expression.
I tried to bias it toward a slender look with “teen face/body/height” and “tiny breasts” (which generally produces B-C instead of D-H). I moved the location to the beginning because I kept getting just floors and solid-color backgrounds with the occasional generic room interior. Repeating the colors for each item helps to keep them from bleeding into other elements, but, as usual, doesn’t always work.
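If you’d rather script this sort of bake-off than click through a UI 64 times, here’s a minimal sketch with the diffusers library. The checkpoint directory, seed, and step count are placeholders, and the ((emphasis)) syntax in my prompt is an A1111/SwarmUI convention that plain diffusers ignores:

```python
# Minimal sketch: run one prompt/seed across every checkpoint in a directory.
# Paths, seed, and step count are hypothetical; adjust for your own setup.
from pathlib import Path

import torch
from diffusers import StableDiffusionXLPipeline

PROMPT = "photorealistic, hyper realistic, robot girl ..."  # full prompt above
SEED = 314159  # fixed seed so the model is the only variable

Path("out").mkdir(exist_ok=True)
for ckpt in sorted(Path("models/checkpoints").glob("*.safetensors")):
    pipe = StableDiffusionXLPipeline.from_single_file(
        str(ckpt), torch_dtype=torch.float16
    ).to("cuda")
    generator = torch.Generator("cuda").manual_seed(SEED)
    image = pipe(PROMPT, generator=generator, num_inference_steps=30).images[0]
    image.save(f"out/{ckpt.stem}.png")
    del pipe
    torch.cuda.empty_cache()  # free VRAM before loading the next checkpoint
```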
Definitely a catgirl, mostly-correct costuming, matte silver skin except for neck and eyes, plausibly robotic arms and legs, terrible tail (fur, broken), robot cat-ears, cute face, sleek little body, appealing pose but doesn’t match the prompt, nice background that matches prompt.
How does it feel to have a President again after four years of nameless unaccountable staffers taking turns shoving their hands up Joe Biden’s puppet-hole?
And this.
Perhaps some of this.
Definitely some of this.
And of course there’s plenty of this.
“…cross country data and six additional studies find that people with lower AI literacy are typically more receptive to AI.” (cite)
…and they vote, too!
Much excitement is being generated by the MIT-licensed release of the Chinese-made Deepseek models. Let’s see how they do…
TL/DR: the results are terrible, but the detailed “reasoning process” is fucking hilarious. Reminder, this is supposed to be the good stuff, the first time pro-grade AI models have been released for offline use.
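If you want to follow along at home, the easiest offline route I know of is ollama; here’s a minimal sketch with its Python client (model tag and prompt are just examples, and you’ll need to have pulled a model your hardware can hold first):

```python
# Sketch: query a locally-hosted Deepseek model via the ollama Python client.
# Assumes `ollama pull deepseek-r1` has already fetched the model.
import ollama

response = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "How many r's are in strawberry?"}],
)
# R1-style models wrap their "reasoning process" in <think>...</think> tags,
# which is where the comedy lives.
print(response["message"]["content"])
```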
Up to this point, I’ve been more-or-less taking the advice of model creators and uploaded pictures on CivitAI when choosing sampler and scheduler settings for Stable Diffusion models. That fell apart when I tried to compare the same prompt and parameters across a large group of models, to see how they handled details like faces, finger counts, lighting, depth of field, and, of course, “paying attention to the prompt”.
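To make the comparison fair, everything has to stay pinned except the sampler/scheduler pair. The diffusers equivalent of that kind of grid run looks roughly like this (model path, prompt, and the scheduler list are abbreviated placeholders):

```python
# Sketch: hold prompt/seed/steps constant and vary only the scheduler.
import torch
from diffusers import (
    DDIMScheduler,
    DPMSolverMultistepScheduler,
    EulerDiscreteScheduler,
    HeunDiscreteScheduler,
    StableDiffusionXLPipeline,
)

pipe = StableDiffusionXLPipeline.from_single_file(
    "model.safetensors", torch_dtype=torch.float16
).to("cuda")

SCHEDULERS = {
    "ddim": DDIMScheduler,
    "dpmpp_2m": DPMSolverMultistepScheduler,
    "euler": EulerDiscreteScheduler,
    "heun": HeunDiscreteScheduler,
}

for name, cls in SCHEDULERS.items():
    pipe.scheduler = cls.from_config(pipe.scheduler.config)
    generator = torch.Generator("cuda").manual_seed(42)  # same seed every time
    image = pipe("test prompt here", generator=generator).images[0]
    image.save(f"grid_{name}.png")
```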
I was going to do a detailed comparison of the 13x31 grid of pictures I got from testing identical settings with all of the available schedulers and samplers, but as I worked my way through the results, I learned an important lesson: don’t choose a reference pic where the gal’s legs are crossed and her fingers are interlaced. This is pretty much the worst-case scenario for evaluating SD images of human beings…
TL/DR: over a third of the combinations produced garbage, and about half of the rest looked very similar in the foreground with some minor out-of-focus differences in the background, but there were quite a few small differences in her clothing’s shape, color, coverage, and material. Face and hair were pretty similar, with only a few looking like a completely different girl, and maybe a quarter having the hair parted on the other side. A fair number changed the pose in some way, although there were maybe six different poses total out of 403 images.
Next time, I’ll set the test up more carefully, so I can actually draw some conclusions beyond, “yeah, just don’t bother with most of the samplers and schedulers”. 😁
Amazon’s “AI” comment-summarizer says this:
Customers find the story engaging and action-packed. They describe the book as a fun, intense read that is worth reading. The series is considered good to great by customers. Readers appreciate the complex characters and the author’s writing style. The pacing is described as fast and consistent. Overall, customers praise the author’s writing quality and consider it an excellent military adventure.
Human summarizer says, “OH JOHN RINGO NO!”. 😁
How to get Flux.1-Dev to stab an orc: “…bleeding from a large chest wound. A sword grows vertically from the wound.” The official release seems a bit vague on what an “orc” looks like, but with some extra prompting it will do the right thing:
side view, at night. photograph of a male ((orc)) warrior with green skin, pointed ears, and tusks, wearing armor, ((lying on back)) on a battlefield with his eyes closed, bleeding from a large chest wound. A sword grows vertically from the wound.
(new anime? not until Saturday!)
The relative ease of customizing Stable Diffusion models means that thousands of people are stirring the pot and training their own. This is good, since the official models are biased and censored, but it’s also bad, because the derivative models are biased in different directions, and often over-trained to the point that they simply snap when you find their edges.
Most people don’t do their custom training against the base SD models; they layer their collection of picture/keyword pairs on top of one that’s already been “uncensored” or augmented in some way, with the two major anime branches being Illustrious and Pony. What this means in practice is that feeding the same settings to related models will often produce very similar results.
So, just how similar do they get?
I’ve been using SwarmUI’s grid feature to evaluate different models by passing them all the same prompt, seed, and settings.
For each set, I used a character LoRA (a small patch model that adds character/style/location data onto other models, with varying success depending on heredity), and generated multiple pictures in my go-to model for cute-and-occasionally-naughty material, CAT - Citron Anime Treasures (Illustrious-based), until I found something that looked like a decent starting point:
Setting aside the boilerplate and the character trigger words, the prompt was:
laughing, standing with arms spread, head back, grounded stance, freedom in motion, outdoors, at Santorini, Greece
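For the curious, the diffusers version of “LoRA on top of a checkpoint” is only a few lines; the file names, trigger word, and weight below are placeholders (I do the actual grid runs in SwarmUI):

```python
# Sketch: apply a character LoRA on top of an Illustrious-based (SDXL) checkpoint.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "CAT-CitronAnimeTreasures.safetensors", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("character_lora.safetensors")
pipe.fuse_lora(lora_scale=0.8)  # how well it "takes" depends on model heredity

generator = torch.Generator("cuda").manual_seed(1234)
image = pipe(
    "trigger_word, laughing, standing with arms spread, head back, "
    "grounded stance, freedom in motion, outdoors, at Santorini, Greece",
    generator=generator,
).images[0]
image.save("santorini.png")
```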
Once you’ve asked chatbots for information a few times, you start to spot patterns. Here’s a perfect example: I asked Google about putting vanilla extract in tiramisu (something traditional recipes don’t do). High on the list of results was Spoonable Recipes, and every line screams generative AI:
Mascarpone cheese is the most popular ingredient in tiramisu dishes. In fact, over 80% of tiramisu recipes contain mascarpone cheese.
Softened is a frequent preparation for mascarpone cheese in tiramisu dishes.
Mascarpone cheese is often included in tiramisu dishes in amounts of 8 ounces, 1 pound or 1 cup.
Another popular ingredient in tiramisu is white sugar. From the recipes we’ve sampled for tiramisu, over 70% have white sugar.
Tiramisu dishes often call for white sugar to be granulated.
White sugar is often included in tiramisu dishes in amounts of 1 tablespoon, three quarters of a cup or a quarter cup.
…
Another popular ingredient in tiramisu is vanilla extract. From the recipes we’ve sampled for tiramisu, over 40% have vanilla extract.
Vanilla extract is often included in tiramisu dishes in amounts of 1 teaspoon or half a teaspoon.
In tiramisu recipes that contain vanilla extract, it is on average, 0.7% by weight.
In recipes for tiramisu, vanilla extract is often used with mascarpone cheese, ladyfinger cookies, confectioners sugar, chocolate and white sugar.
Potential substitutions for vanilla extract in tiramisu:
pumpkin pie spice
Also, vanilla extract is not often used with flour, white chocolate, pumpkin and lemon curd.
And remember, that cream won’t whip itself!
(as investor-hungry “AI” companies frantically scramble for fresh content to build their next-generation engines with, they’re hoovering up previous-generation output like this “recipe analysis” and spreading the contamination. A lot of people doing text-to-image generation rave about Flux over Stable Diffusion XL, but the first time I tried it, I got even more fingers per hand; one poor gal must have had a dozen, and that’s enough ladyfingers for three full servings of tiramisu!)
… and a happy Kiwi!
Or, in the words of Henry The Red, “Thank you, generous hosts!”
[boilerplate]. A pin-up photo of a pretty teen girl, [female-hairstyle], [sexy-pose], with a [positive-mood] expression, wearing from 2-5 [lingerie], at [famous-place]. [framing-light], [camera]
Negative: pregnant, frame, cropped, [negatives]
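The bracketed terms are wildcards, each drawn at random from a file of alternatives. A toy version of the expansion logic looks like this (the file layout is hypothetical, and real tools like SwarmUI and the Dynamic Prompts extension also handle count ranges like “2-5”, nesting, and weights, which this sketch doesn’t):

```python
# Toy wildcard expander: replace each [category] with a random line from
# wildcards/category.txt.
import random
import re
from pathlib import Path

def expand(template: str, wildcard_dir: str = "wildcards") -> str:
    def pick(match: re.Match) -> str:
        choices = Path(wildcard_dir, match.group(1) + ".txt").read_text().splitlines()
        return random.choice(choices)
    return re.sub(r"\[([\w-]+)\]", pick, template)

template = (
    "[boilerplate]. A pin-up photo of a pretty teen girl, [female-hairstyle], "
    "[sexy-pose], with a [positive-mood] expression, wearing [lingerie], "
    "at [famous-place]. [framing-light], [camera]"
)
print(expand(template))
```

One particular roll of the dice came out as: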
4k, breathtaking, crisp, gorgeous, high budget, highly detailed, intricate, professional, ultra textured. A pin-up photo of a pretty teen girl, Soft chignon, neatly tucked at the nape of the neck, Lying on side, elbow bent, hand supporting head, elegant silhouette formed, with a grateful expression, wearing compression shorts, bow-front bikini and camisole set, lace-up bodysuit, striped panties, at Scottish Highlands, Scotland. Harsh midday lighting, wide angle, scattered elements, vibrant contrast, dynamic shadows., Eye-level angle (from behind) view medium Close shot (focus on feet).
Negative: pregnant, frame, cropped. bad anatomy, bad proportions, banner, censored, collage, cropped, deformed, disconnected limbs, disfigured, duplicate, error, extra arms, extra digits, extra hands, extra limbs, fused fingers, grainy, gross proportions, logo, long neck, low contrast, low quality, low resolution, malformed limbs, missing arms, missing fingers, multiple panel, mutated, mutated hands, mutated limbs, out of focus, oversaturated, poorly drawn eyes, poorly drawn face, poorly drawn hands, signature, split frame, split screen, text, ugly, ugly, unreal, username, watermark, worst quality.
“This doesn’t look like the Lincoln Tunnel, Sam.”
Also not a close-up foot shot in harsh midday lighting in the Scottish Highlands. And I didn’t even ask for an army; they must have been assembled from “silhouette”, “scattered elements” and “dynamic shadows”.
You are the pilot of a ship capable of traveling the multiverse. The cockpit contains thousands of unlabeled buttons, switches, dials, and sliders. Think TARDIS, but turned up to 11.
You may adjust any number of controls before hitting The Big Red Button, and then you will be transported to a completely different universe, where anything can happen.
With me so far? Good.
Order matters. As you adjust each control, your destination in the multiverse shifts, so that each additional control you adjust applies its effect from a different starting point. Turn a dial too far in one direction, and your destination could be so far from home that slugs are the dominant species on Earth, and you can’t get back by pressing three green buttons and toggling a switch.
Still here? Awesome.
Here comes the real fun: each control on your board is actually wired up to ten thousand completely independent engines, all of which impact your destination in some way, big or small. In the first engine, that dial setting puts slugs in charge, but in engine #751, it puts giant breasts on cats. Not catgirls, cats.
By your side is a quirky robot with tentacles. Its job is to convert your spoken orders into control adjustments, but it doesn’t fully understand human language, has a dim grasp of each control’s effect, and guesses to resolve ambiguity. Because it has received contradictory orders from its makers, at random intervals it goes insane and assaults you with the tentacles; it never remembers these episodes.
(I let DALL-E handle this one, because I’m busy cleaning, baking, and wrapping presents for Christmas dinner tomorrow)
After the jump, a not-so-Christmas NSFW Miracle!
Thanksgiving will be at my house… next week?!? Time for some serious housework, at least in the public areas. I could call in a maid service, but I don’t think they have any Mysterious ones.
FYI, Anne of Green Gables has long been quite popular in Japan, one of the few things my friend Dan knew about the country (having spent time dealing with tour groups to Prince Edward Island in his misspent youth). So it shouldn’t be a surprise that there will be a new anime series next spring. From the brief description, it sounds like it will run for multiple cours.
(not planning to watch it; I’d rather have more Frieren)
The most useful announcement in this streaming roundup is that Amazon is folding Freevee into Prime, which means that I’ll finally be able to watch the newer Bosch seasons without ads. It never made sense to me that there was no way to watch Freevee shows without ads, no matter how much you paid the mothership each month.
Well below the fold is the announcement that Apple is losing a fortune on AppleTV+ and is trying to license their $20 billion in original productions to other services. Which means I might finally watch some of them. Now if only Disney and HBO would do the same…
(the only time I ever see television ads is when I’m in the barber chair, and it feels like Alex’s “treatment” scene in A Clockwork Orange; Amazon often tries to sneak one in when you start watching a show on Prime even if you pay the ad-free premium, but you can skip it)
I mute every xTwitter account that displays an ad on my timeline, except for the truly obnoxious ones, which I block. They don’t provide any real management tools for these lists, so I can’t say how many accounts I’ve muted, but with an average of one ad for every 3 tweets viewed, it’s easily several thousand. Most are for products I’d never buy, sites I’d never visit, and people who wrongly think they have something to say that’s worth paying for, but many just seem like desperate cries for help.
Honestly, the one I came closest to engaging with was for an Etsy dealer advertising handmade bedside emergency condom boxes, and my only interest was in finding out if they were deliberately using the word “discrete” instead of “discreet” to name their product (and it was not a one-off typo).
(coincidentally, around midnight the clouds burst open and my neighborhood was inundated with liberal tears…)
I’ve never wanted turkey on a pizza, but it at least sounds plausible. Green beans and cranberries, however, are way over into hell-no territory. Might as well eat the bugs at that point.