Cool Pictures from the Mind of a Machine: AI Generated Pictures

Flux is head and shoulders above the other free models.

Flux = the future
SDXL = the past
SD3 = dead at birth
SD1 = the poor man's choice
Pony = :groucho:

Sadly, while Flux is pretty good at faces and limbs, it is pretty bad at rendering the rest of the human anatomy: not only the 'private' parts, but also ribs, belly, back...

Until this problem is fixed by the community, I use a workflow combining Flux as the base model with Pony for rendering anatomy, guided by some SAM masking.
I tried it on a Hugging Face Space; it doesn't seem to do artistic images very well.
 
^Try describing the image in more depth. Unlike previous models, Flux seems to work better with longer natural-language prompts. Use ChatGPT, for instance. I got the image below from the following prompt with almost no manual retouching (only to complete the bowstring):

"A painting in the style of the Conan the Barbarian comic book covers showing a female warrior in an infernal environment, surrounded by a rocky landscape with burning lava and an intense orange glow in the background. She is sweating profusely while showing a confident posture and ready for action. Her outfit is minimalist and tribal in style, composed of a kind of top and a loincloth made of dark gray fabric, complemented by bracelets and adorned belts that reinforce her appearance as a hunter or warrior. The woman has dark, wavy hair, which falls to her shoulders. She carries a bow in one hand and a quiver full of arrows on her back, suggesting that she is prepared to combat."


1724007835781.png


The workflow I used was pretty complex, though: two different models in five steps, with masks and such.

1724011646399.png
 
I think you also need to describe the style precisely, and artist names don't have as much impact as with previous versions.

More on prompt following (I'm still reusing parts of my old dynamic prompts dating from 1.5, since that's faster for my testing, so this isn't fully natural language, but even then Flux interprets the prompt quite well):

blade runner 2049 style, (cyberpunk city with skyscrapers and huge 3D holographic ads), there is a geisha with a Coca-Cola bottle on the biggest ad, there are kanjis on the other ads, street racing, view of an intricate highly detailed futuristic cyberpunk (concept car:1.2) made by Maserati but based on a mix of a (Toyota 86 Subaru BRZ:0.70) and a (Ford GT40 1968:0.75) and a (Chevrolet Corvette L88 1968:0.80), in a cyberpunk city highway at high speed with motion blur and under heavy rain, raining, (water splash are projected behind the car), "Gedemon" is written in licence plate 3D letters on the car's plate, racing against other cyberpunk cars

20240816185343-Flux-UltimateUpscale_0001.jpeg
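
For anyone reusing old 1.5-style prompts like the one above, here is a minimal sketch for listing which phrases carry an explicit `(phrase:weight)` emphasis and for stripping the weights to get plainer text. The function names are made up for illustration, and whether a given front end actually applies these weights with Flux is an assumption, not something shown here.

```python
import re

# Matches "(some text:1.2)". Plain parentheses without ":weight" are left alone.
WEIGHTED = re.compile(r"\(([^():]+):(\d+(?:\.\d+)?)\)")


def extract_weights(prompt: str) -> list[tuple[str, float]]:
    """Return (phrase, weight) pairs for every explicitly weighted span."""
    return [(m.group(1).strip(), float(m.group(2))) for m in WEIGHTED.finditer(prompt)]


def strip_weights(prompt: str) -> str:
    """Remove the parentheses and ':weight' suffix, keeping only the phrase."""
    return WEIGHTED.sub(lambda m: m.group(1).strip(), prompt)


# Shortened excerpt in the same syntax as the prompt above.
sample = "futuristic cyberpunk (concept car:1.2) based on a (Ford GT40 1968:0.75), (under heavy rain)"
print(extract_weights(sample))
print(strip_weights(sample))
```

Stripping the weights is one way to compare how the same prompt behaves with and without the old emphasis syntax.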



Now it also understands simple prompts, like "Civ8, when ?"

Spoiler :

20240819000843-Flux-UltimateUpscale_0001.jpeg


20240819001533-Flux-UltimateUpscale_0001.jpeg


20240819002115-Flux-UltimateUpscale_0001.jpeg


20240819005606-Flux-UltimateUpscale_0001.jpeg



It's not always perfect of course...

Some I didn't bother to upscale:

Spoiler :

20240819003038-Flux_01.jpeg


20240819003622-Flux_01.jpeg


20240819004930-Flux_01.jpeg


20240819004501-Flux_01.jpeg


20240819004745-Flux_01.jpeg

 
325318d393094b53a04df3dd7bb626a8.jpg


This feels like the sort of painting that, when you look it up, has a million art blog pages explaining what it means, but this one doesn't mean anything; it's just noise diffused by a machine.
 
There is so much human-made art that looks like it has some deep meaning but means nothing in reality... Most of it I would say.
 
AI misinformation is only just beginning. :(
 
IMG-20240908-WA0009.jpg


No manual retouching at all. Complex Text to Image ComfyUI workflow though:
- First, a Llama LLM is used through Ollama to strengthen and expand my concise original prompt into a long 500-token prompt that is fed to Flux. The LLM is also used to summarize its own prompt into a paragraph of a few tokens to feed the Flux clip-l (and later Pony's clip), obtaining a pretty rich 1-megapixel base image. Using Flux as the base is important, as it is best for prompt understanding, creativity and composition.
- The base image is then used as the latent to get a Pony upscaled image, using ControlNet to prevent Pony from going wild. Pony is still the best for secondary anatomical details and a natural skin feel, but it tends to go crazy easily and needs additional guidance.
- Then the face and hands are rendered again in Flux using a SAM detector. Flux is amazing at face and hand anatomy; however, saturation, brightness and contrast need to be adjusted, as Flux results tend to be darker and more saturated.
- Next, the resulting image is enormously upscaled using filters to add some false detail, sharpness and noise, then downscaled and fed to an Ultimate Upscale node using Pony again to get the final skin texture and fine details in an 8-megapixel image.
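
Outside ComfyUI, the first step (LLM expansion, then a short summary for the CLIP encoder) could be sketched roughly like this. It assumes a local Ollama server on its default port and uses its `/api/generate` endpoint; the model name `llama3`, the instruction wording and all function names are placeholders of mine, not taken from the actual workflow.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumption: default local Ollama endpoint
MODEL = "llama3"  # assumption: any Llama model already pulled into Ollama


def expand_request(short_prompt: str, model: str = MODEL) -> dict:
    """Payload asking the LLM to expand a concise idea into a long prompt."""
    instruction = (
        "Expand this image idea into a detailed natural-language prompt "
        "of roughly 500 tokens. Reply with the prompt only.\n\n" + short_prompt
    )
    return {"model": model, "prompt": instruction, "stream": False}


def summary_request(long_prompt: str, model: str = MODEL) -> dict:
    """Payload asking the LLM to condense its own long prompt for the CLIP encoder."""
    instruction = (
        "Summarize this image prompt into one very short paragraph. "
        "Reply with the summary only.\n\n" + long_prompt
    )
    return {"model": model, "prompt": instruction, "stream": False}


def ask_ollama(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to Ollama and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def build_prompts(short_prompt: str, ask=ask_ollama) -> tuple[str, str]:
    """Return (long prompt for Flux, short summary for the CLIP encoders)."""
    long_prompt = ask(expand_request(short_prompt))
    clip_prompt = ask(summary_request(long_prompt))
    return long_prompt, clip_prompt
```

Calling `build_prompts("barbarian woman archer in a lava landscape")` with Ollama running would return the two strings; the later steps (Pony upscale, SAM re-rendering, Ultimate Upscale) stay inside ComfyUI and are not covered by this sketch.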

I think I have almost reached perfection with this workflow (on barbarian women at least) :queen:
 