As we wave farewell to Leoparde’s summer sizzler…
…we look forward to the trick-and-treat cast party:
A fairly new model, Omnigen2 is supported by SwarmUI for image generation and editing, so I don’t have to change my workflow to play with it. It has a full LLM embedded, so it’s better at parsing prompts than the usual Stable Diffusion models, but still has a learning curve. Compared to ChatGPT, you’ll spend a lot more time writing a lengthy, detailed prompt that produces the results you want, but won’t get cockblocked by randomly changing secret policies that refuse to generate your picture at all.
It is definitely capable of producing naughtypics, but is not overtrained to the point that it just goes there at random.
So, how does it handle the same prompt that produced my latest fake isekai book cover?
Okay, there are definitely some issues with text generation. The quality varies depending on sampler/scheduler, but the bottom line is that the major online image-generators obviously have separate text-rendering subsystems that can lay out actual strings of text. With Omnigen2, the progress images clearly show it conjuring up letters from noise and tokens. On several occasions I saw it create the word “FLEET” and then change the “L” to an “E” over the course of a few passes. (At some point I need to capture some progress images and convert them to short movies; it really illustrates the whole “stochastic parrot” thing that should reduce your faith in current “AI”.)
Generally speaking, the usable CFG range for this model is ~2-8.5, with more texture and detail at low values; with SwarmUI, you can also hold CFG constant and vary “IP2P CFG”. Low step counts didn’t work out for me, and I got better results in the 35-50 range. The “Beta” scheduler is reliable, and the very slow “Heun++ 2” produces some very interesting results. The “DPM++ 2M SDE, GPU Seeded” sampler is very good. At some point I’ll have to generate a full grid to see which combinations work the best for both image and text; given how slow Heun is, it will have to run overnight.
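A minimal sketch of how that grid could be enumerated before an overnight run. It only builds and orders the list of combinations; it does not call SwarmUI. The specific CFG and step values are illustrative picks from the ranges above, treating “Heun++ 2” as a sampler is an assumption, and `build_grid` and `slow_last` are hypothetical helper names.

```python
from itertools import product

# Names taken from the text above; the value lists are illustrative assumptions.
SAMPLERS = ["DPM++ 2M SDE, GPU Seeded", "Heun++ 2"]  # assumes Heun++ 2 is a sampler
SCHEDULERS = ["Beta"]
CFG_VALUES = [2.0, 3.5, 5.0, 6.5, 8.5]  # spans the usable ~2-8.5 range
STEP_COUNTS = [35, 50]                  # low step counts did not work well


def build_grid(samplers, schedulers, cfg_values, step_counts):
    """Return one settings dict per sampler/scheduler/CFG/steps combination."""
    return [
        {"sampler": s, "scheduler": sch, "cfg": cfg, "steps": steps}
        for s, sch, cfg, steps in product(samplers, schedulers, cfg_values, step_counts)
    ]


def slow_last(grid, slow_sampler="Heun++ 2"):
    """Stable sort that moves the slow sampler's jobs to the end of the queue."""
    return sorted(grid, key=lambda job: job["sampler"] == slow_sampler)


grid = slow_last(build_grid(SAMPLERS, SCHEDULERS, CFG_VALUES, STEP_COUNTS))

print(len(grid), "combinations")
for job in grid[:3]:
    print(job)
```

Putting the slow jobs last means the fast combinations finish early, so there is something to look at even if the run gets interrupted.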
The prompt was the same basic thing I fed to ChatGPT and Grok, plus some tweaks and a whole set of do-not-do negative prompts to reduce the percentage of garbage. For instance, the training data set included a lot of 3D renderings of book covers, and none of them look good.
Prompt: Create a book cover for an isekai fantasy novel titled “Fleet Don’t Fail Me Now!” with the subtitle “Reborn In A Fantasy Hornblower Knockoff As The Admiral Of A Pirate Armada, My Monster-Girl-Harem Ship Captains Help Me Rule The Seas”. Include the male hero and his harem of monster girls. Use the entire image area for the cover. Make the image highly detailed and ultra textured. Precisely render the exact text of the title and subtitle in fantasy fonts.
Negative prompt: Create a 3D rendering of a book. Make the characters blurry or low-resolution. Include any text other than the title and subtitle. Show the book spine.
I think this one turned out the best:
Here’s one that got hit with the 3D-rendered-book problem:
Imitation-American-Isekai style:
Wrong-genre savvy:
Non-monster anime gals:
Western artist who doesn’t know what “isekai” means:
Western artist tracing Hollywood actors:
That guy from the Eighties who should never have been a cover artist:
No fleet, just a demonic harem:
I’d buy this just based on the cover art:
Diffusion fail (the character in the center started out as a woman and gradually turned male as the engine made multiple passes; it really shows):
I’m open to having a female harem lead:
This one was a happy accident that came from trying a low step count with a high CFG. It’s not usable as-is, but it’s got a unique style that you could do something with, if you can replicate it:
Now for a few that I liked even though they didn’t get the title quite right:
Total title failures, but interesting:
Hmmm, how about the patriotic t-shirt I asked DALL-E and Grok for a while back?
First try didn’t suck:
Added a background to the prompt, and the third try did really well on the text:
I’m willing to accept this as body paint:
Happy with the general idea, I asked for a few… refinements:
Related, I’m still pleased with the first fake isekai book cover that I did by hand in Illustrator, but that was actual work. I think I spent about two hours laying out the text and tweaking the size and placement of the image, and another half-hour or so coming up with the set of volume titles. By the way, I was disappointed that nobody ever asked where I got my fake author name from. Mayakashi (no) Harikata = “make-believe dildo”.
I didn’t put nearly as much effort into The Grooming Of The Tanuki Child Bride; it was basically just swiping a dynamic fan-pic of Raphtalia and replicating the style of the series title with different kanji.