If I run Flux.2-Klein-9b at the recommended settings (CFG 1, 8 steps, 1024x1024-ish resolutions), it takes about 6 seconds to generate an image on my RTX 4090. This is fast enough to tinker with a dynamic prompt, run off a few hundred results, quickly reject the (9% at 8 steps) anatomy fails, and then pick out some that look pretty good. It’s a better use of my gaming PC right now than killing time grinding in Diablo IV or hunting for something new to play.
But since I already have hundreds of GenAI SF cover gals lying around waiting to be deathmatched, today we’re going to look at what happens when I really lean into letting LLMs enhance prompts.
I changed my LLM prompt-enhancing script to run multiple system prompts across the same string in sequence within a single run, rather than invoking the script multiple times in a pipeline. That improved stability, but it looks like the occasional crash is actually caused by a recent update to the engine under the hood (llama.cpp), so I still have to restart the script now and then, whether it’s talking to the PC or the Mac Mini. Even on the gaming PC, a complex prompt enhancement takes about as long as generating the resulting image, so I just let them both run while I did other things, and occasionally kicked off a new batch.
Perhaps I gave it a bit too much freedom…
(more after the jump)
For a change of pace, I abandoned my wildcard sets and just fed the LLM brief descriptions. The base prompt was simple enough:
A mid-century catalog illustration featuring a @<makeover:pretty young woman>@ wearing @<fashion: sexy lingerie from the 1950s>@, serving cocktails outdoors in the back yard of a 1950s suburban home. The image is composed to emphasize the setting as much as the woman.
There are four LLM invocations in total: the two targeted ones marked in the prompt above (makeover and fashion), the standard enhancement prompt recommended by Z-Image Turbo, and a cleanup pass I’ve named “legal review” that adjusts ages to cut down on random lolis.
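For the curious, here is a rough Python sketch of how that chain could be wired up. It is not my actual script: the function names, the `llm` callable, and the system-prompt strings are all made up for illustration, and the only thing taken from the real setup is the `@<name:text>@` marker syntax and the order of the passes.

```python
import re
from typing import Callable, Dict, List

# Matches inline markers such as "@<makeover:pretty young woman>@".
MARKER = re.compile(r"@<\s*(\w+)\s*:\s*(.*?)\s*>@")

# Hypothetical signature: (system_prompt, text) -> rewritten text.
LLM = Callable[[str, str], str]


def expand_markers(prompt: str, targeted: Dict[str, str], llm: LLM) -> str:
    """Replace each marker with the LLM's rewrite of the text inside it."""
    def rewrite(match: re.Match) -> str:
        name, text = match.group(1), match.group(2)
        system_prompt = targeted.get(name)
        if system_prompt is None:
            # Unknown marker name: keep the plain text, drop the markup.
            return text
        return llm(system_prompt, text)
    return MARKER.sub(rewrite, prompt)


def enhance(prompt: str, targeted: Dict[str, str],
            full_passes: List[str], llm: LLM) -> str:
    """Run the targeted rewrites first, then each whole-prompt pass in order."""
    text = expand_markers(prompt, targeted, llm)
    for system_prompt in full_passes:
        text = llm(system_prompt, text)
    return text


def echo_llm(system_prompt: str, text: str) -> str:
    """Stand-in for a real model call; it just returns the text unchanged."""
    return text
```

Calling `enhance(base_prompt, {"makeover": "...", "fashion": "..."}, ["general enhancement", "legal review"], echo_llm)` makes four calls to `llm` for the base prompt above: one per marker, then one per whole-prompt pass. Swapping `echo_llm` for a function that talks to a local llama.cpp server is left out, since the details depend on how the server is set up.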
Bumping the resolution 25% and adding 4 refining steps increased the generation time to a whopping 9.5 seconds, so after I’d made a bunch of those, I made a slight change to the theme.
A mid-century Japanese catalog illustration featuring a @<makeover:pretty young Japanese woman>@ wearing @<fashion: sexy lingerie from the 1950s>@, serving cocktails outdoors under a blossoming Japanese cherry tree in the Spring. The image is composed to emphasize the setting as much as the woman.
Primary flaw in this set: they’re all at least 7 feet tall. Either that or they live in Munchkinville.