GenAI Girl Trouble


This week’s SwarmUI Discord theme is food. Z Image Turbo nailed the concept on the first try, but I made a few more until I had one where she was carrying the pepperoni in both hands.

(The first time I tried to upscale this, I got a bunch of extra, extra-tiny people scattered around the image. It turned out that the “refiner do tiling” option was still on (a leftover from a more memory-intensive model), and the refiner was generating additional people on each tile, at a scale appropriate to that tile. It’s the first time I’ve seen it do something like that.)

Captions, on the other hand…

The “New Yorker” look was easy, given the artist name and a style description, but getting every word right remains a challenge for diffusion models, even when you avoid rare words. Part of the problem is that for all its virtues, the turbo part of Z Image Turbo means that it does 90% of the work in the first pass, which doesn’t leave as much room for variation and refinement. It took 10 tries to get 2 correct captions, one of which was rendered even smaller than this one.

The WWWA Central Computer says “get off my lawn!”

Dirty Pair 40th anniversary merchandise.

