Today is not a good day to be an MS Office 365 email customer. Or one of their partners…
The new Dresden Files novel arrived Monday afternoon. I’m not sure what happened after that; the rest of the day is a blur. Harry spends a lot of time recovering from the aftermath of the Big Event(s), which may be more emotional and introspective than some fans are really interested in. He does get better. Eventually.
Good stuff, recommended for people still keeping up with this series.
(Fern is definitely more photogenic than Harry Dresden…)
The targeted LLM enhancements are doing a good job of improving the variety in outfits and backgrounds, so can I do something about ZIT’s horrible guns?
You are a technical illustrator with in-depth knowledge of how weapons look and function, including historical, modern, fantasy, and futuristic science-fiction styles. Your task is to convert user input into detailed prompts for advanced image-generation models, ensuring that the final result is both plausible and visually appealing. You refuse to use metaphor or emotional language, or to explain the purpose, use, or inspiration of your creations. You refuse to put labels or text on weapons unless they are present in double quotes (“”) in the input. Your final description must be objective, concrete, and no longer than 50 words that list only visible elements of the weapon. Output only the final, modified prompt, as a single flowing paragraph; do not output anything else. Answer only in English.
(yes, many models randomly slip into Chinese unless you remind them; I had one sci-fi gun description that randomly included “握把表面具有纳米涂层防滑纹理”, which apparently translates to “the grip surface has a nano-coated anti-slip texture”. That sounds perfectly reasonable, although it’s not something you can really expect an image-generator to render)
I may need a separate “expert” for sensible gun-handling poses. Also, some models are waaay too focused on the AR-15 as the universal “gun”, so I’m going to need to add some more focus to the prompt.
Sometimes, the source of extra limbs and odd poses is contradictory descriptions in different parts of the generated prompt. A background might describe a human figure, and some of its characteristics get applied to the main subject, or else the character might be described as praying, but also has to hold a pistol. So I’m trying this:
You are a Prompt Quality Assurance Engineer. Your task is to examine every detail of an image-generation prompt and make as few changes as possible to resolve inconsistencies in style, setting, clothing, posing, facial expression, anatomy, and objects present in the scene. Ensure that each human figure has exactly two arms and two legs; resolve contradictions in the way that best suits the overall image. Output only the final, modified prompt, as a single flowing paragraph; do not output anything else. Answer only in English.
A visual diff of some samples suggests that it does a good job. Some models try to make more changes, but the ones I’ve been using most actually produce something recognizably diffable. I doubt there’s a prompt-based solution to perspective problems, though; ZIT is good at making multiple figures interact, but terrible at ensuring they’re drawn at the same scale.
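For anyone who wants to do the same kind of check without eyeballing it, here is a minimal sketch of a word-level diff between the prompt going into the QA pass and the one coming out. It is not the tooling used above; the function names `word_diff` and `change_ratio` are made up for illustration, and it only uses Python’s standard `difflib`.

```python
import difflib


def word_diff(before: str, after: str) -> list[tuple[str, str, str]]:
    """Return (operation, old words, new words) for each span that changed."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


def change_ratio(before: str, after: str) -> float:
    """Rough measure of how much was rewritten: 0.0 means identical word lists."""
    a, b = before.split(), after.split()
    return 1.0 - difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


if __name__ == "__main__":
    original = "a woman kneeling in prayer holding a pistol"
    revised = "a woman standing holding a pistol"
    for op, old, new in word_diff(original, revised):
        print(f"{op}: '{old}' -> '{new}'")
    print(f"changed: {change_ratio(original, revised):.0%}")
```

A high change ratio is a quick way to flag the models that rewrite far more than they were asked to.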
The big downside of all this LLM nonsense is that I don’t have a second graphics card to run it on, and even a high-end Mac Mini is slooooooooow at running text models (don’t even bother trying image models). Right now it takes about as long to generate a single prompt as it does to render a 1080p image of it. And every once in a while local LLMs degenerate into infinite loops (the paid ones do it, too, but it usually gets caught by the layers of code they wrap them in to enforce bias and censor naughtiness), which kinda sucks when you kick off a large batch before bedtime.
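One way to keep a looping model from eating the whole night is to watch the output as it streams and bail out when the tail starts repeating. The following is only a sketch under that assumption: `looks_degenerate` is a hypothetical helper, the thresholds are guesses that would need tuning, and it knows nothing about any particular LLM API.

```python
def looks_degenerate(text: str, max_period: int = 40, min_repeats: int = 5) -> bool:
    """Return True if the text ends in the same block of words repeated back to back.

    Tries every block length from 1 to max_period words and checks whether
    the last (block length * min_repeats) words are purely periodic.
    """
    words = text.split()
    for period in range(1, max_period + 1):
        span = period * min_repeats
        if len(words) < span:
            break  # longer periods need even more words, so stop here
        tail = words[-span:]
        if all(tail[i] == tail[i % period] for i in range(span)):
            return True
    return False


if __name__ == "__main__":
    print(looks_degenerate("A matte black pistol with a ribbed slide"))
    print(looks_degenerate("the grip is black " * 6))
```

In a batch script, this would be called on the accumulated text after each streamed chunk, abandoning the request and retrying when it returns True. A hard cap on output length, if the API offers one, is the blunter alternative.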
At least flushing the output of the different scripts after every line minimizes the delays caused by the LLM, so it doesn’t feel slow. I might still set things up to generate big batches of prompts on the graphics card and auto-unload the model before kicking off the image generation; both the LM Studio and SwarmUI APIs have calls for that, so I can update the scripts.
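The flushing part is simple enough to show. This is a minimal sketch assuming the scripts are Python and are chained together with pipes; `emit_lines` is a made-up name, not something from the actual scripts.

```python
import sys
from typing import Iterable, TextIO


def emit_lines(lines: Iterable[str], out: TextIO = sys.stdout) -> int:
    """Write each line and flush immediately so the next script sees it right away."""
    count = 0
    for line in lines:
        out.write(line.rstrip("\n") + "\n")
        out.flush()  # without this, piped output sits in a buffer until it fills
        count += 1
    return count


if __name__ == "__main__":
    emit_lines(["first prompt", "second prompt"])
```

Running Python with `-u` or setting `PYTHONUNBUFFERED=1` is another way to get unbuffered output without touching the code.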
(all she needs is a slogan in the empty space, “She wants YOU for The Space Corps”)
(background gal makes me want to generate an entire set of rear views…)
(Amelia, is that you?)