June 25th, 2024

How to generate realistic people in Stable Diffusion

The tutorial focuses on creating lifelike portrait images using Stable Diffusion. It covers prompts, lighting, facial details, blending faces, poses, and models like F222 and Hassan Blend 1.4 for realistic results, and it stresses using clothing terms and reading model licenses.

Read original article

The tutorial on generating realistic people with Stable Diffusion focuses on creating lifelike portrait images. The process involves building high-quality prompts, adding lighting and camera keywords, and enhancing facial details for a more realistic appearance. Techniques like blending faces, controlling poses, and inpainting are discussed as ways to refine the generated images. Specific models (F222, Hassan Blend 1.4, Realistic Vision v2.0, Chillout Mix, Dreamlike Photoreal, and URPM) are introduced for generating realistic images with varying features and styles. The tutorial emphasizes using clothing terms in the prompt to avoid explicit content and highlights the need to read and adhere to model licenses. Readers are encouraged to experiment with different models and techniques to achieve the results they want.
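As a rough sketch of that workflow, here is a minimal text-to-image example with the diffusers library, assuming a CUDA GPU and an SD 1.5-style checkpoint; the repo id and keywords below are illustrative, not the article's exact prompts:

    # Minimal text-to-image sketch with diffusers (illustrative, not the
    # article's exact settings). Assumes a CUDA GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # or a realistic checkpoint from Civitai
        torch_dtype=torch.float16,
    ).to("cuda")

    # Subject plus lighting/camera keywords; clothing terms keep output safe.
    prompt = (
        "portrait photo of a young woman, dress, highly detailed face, "
        "film grain, rim lighting, studio lighting, dslr, 85mm lens"
    )
    # The negative prompt suppresses artifacts and explicit content.
    negative = "disfigured, ugly, deformed, cartoon, painting, nude"

    image = pipe(
        prompt=prompt,
        negative_prompt=negative,
        width=512,
        height=768,               # portrait aspect ratio
        num_inference_steps=30,
        guidance_scale=7.0,
    ).images[0]
    image.save("portrait.png")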

Related

Unique3D: Image-to-3D Generation from a Single Image

The GitHub repository hosts Unique3D, offering efficient 3D mesh generation from a single image. It includes author details, project specifics, setup guides for Linux and Windows, an interactive demo, ComfyUI, tips, acknowledgements, collaborations, and citations.

Show HN: Feedback on Sketch Colourisation

The GitHub repository contains SketchDeco, a project for colorizing black and white sketches without training. It includes setup instructions, usage guidelines, acknowledgments, and future plans. Users can seek support if needed.

SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code

SceneCraft is an advanced Large Language Model (LLM) Agent converting text to 3D scenes in Blender. It excels in spatial planning, asset arrangement, and scene refinement, surpassing other LLM agents in performance and human feedback.

HybridNeRF: Efficient Neural Rendering

HybridNeRF combines surface and volumetric representations for efficient neural rendering, achieving 15-30% error rate improvement over baselines. It enables real-time framerates of 36 FPS at 2K×2K resolutions, outperforming VR-NeRF in quality and speed on various datasets.

Homegrown Rendering with Rust

Embark Studios is building a creative platform for user-generated content that emphasizes gameplay over graphics. They use Rust for 3D rendering and introduce the experimental "kajiya" renderer as a learning project. The team aims to simplify rendering for user-generated content, building on the Vulkan API, and to grow Rust's ecosystem for GPU programming.

26 comments
By @zevv - 5 months
I might be going around in the wrong social circles, but none of the people I know look anything like the realistic people in these images. Are these models even able to generate pictures of actual normal everyday people instead of glossy photo models and celebrity lookalikes?
By @moritzwarhier - 5 months
Generating fake portrait photos seems kind of boring.

Wouldn't these kinds of negative prompts and tweaking break down if I wanted to plug in more varied descriptions of people?

I find it interesting to plug in colorful descriptions of a person's traits from a novel, for example, or of people actually doing something.

Using "ugly", "disfigured" as negative prompt probably wouldn't work then...

For the pictures in the article, my first association is someone generating romance scam profile pictures, not art.

By @Animats - 5 months
Try using the "I Can't Believe It's Not Photography" model. Instead of trying to micro-manage the details, use strong emotional terms. I've had good results with prompts along the lines of "Aroused angry feral Asian woman wearing a crop top riding a motorcycle fast in monsoon rain."[1]

[1] https://i.ibb.co/3zHGyrR/feral34.png
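For reference, community checkpoints like that one ship as single .safetensors files; a minimal loading sketch with diffusers, where the file path is a placeholder for wherever you saved the download:

    # Load a community checkpoint from a single file (path is a placeholder).
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_single_file(
        "./icbinp.safetensors", torch_dtype=torch.float16
    ).to("cuda")

    image = pipe(
        "aroused angry feral Asian woman wearing a crop top "
        "riding a motorcycle fast in monsoon rain"
    ).images[0]
    image.save("feral.png")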

By @ginko - 5 months
Why does everything generated by SD seem to have this weird plasticky sheen to it? Is that a preference of people generating these or innate to the model?
By @supriyo-biswas - 5 months
It seems to be a requirement to have a model trained on a large number of explicit images to generate correct anatomy.

While people have tried going from a base model to a fine-tuned model based on explicit images, I wonder if people are attempting to go the other way round (train a base model on explicit photographs and other images not involving humans; then fine-tune away the explicit parts), which might lead to better results?

By @xg15 - 5 months
> Caution - Nearly all of [the special-purpose models] are prone to generating explicit images. Use clothing terms like dress in the prompt and nude in the negative prompt to suppress them.

I like how even with all the "please don't make it porn" terms in the prompt, you can easily see (by choice of dresses, cleavage, pose, facial expressions etc) which models "want" to generate porn and are barely held back by the prompt.

By @BobbyTables2 - 5 months
I find with stable diffusion, images for a type of prompt seem to all show the same person. Add something like “mature” to the prompt and you get a different (but same) person for all those images, regardless of seed.

When one uses prompts it hasn't seen in the training data, the results start to look less realistic.

Have even seen adult video logos in generated images.

I very much strongly suspect AI is not what we think.
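That observation is easy to test by holding everything fixed and varying only the seed; a minimal diffusers sketch, assuming the stock SD 1.5 checkpoint:

    # Fix the prompt, vary only the seed, and compare the resulting faces.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    for seed in (7, 42, 123, 9001):
        generator = torch.Generator("cuda").manual_seed(seed)
        image = pipe(
            "portrait photo of a mature woman", generator=generator
        ).images[0]
        image.save(f"seed_{seed}.png")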

By @nuz - 5 months
Use SDXL instead of SD 1.5. Also, don't do negative prompting on "distorted, ugly", because everyone follows these guides blindly and the result is the same type of mode-collapsed face that people have learned to recognize as AI. Use LoRAs instead (to make it look slightly amateurish, or just away from the cookie-cutter AI style which everyone notices).
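A hedged sketch of that advice in diffusers; the LoRA file name is a placeholder for whichever amateur-photo LoRA you pick:

    # SDXL base plus a style LoRA instead of boilerplate negative prompts.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_lora_weights("./amateur_photo_lora.safetensors")  # placeholder

    image = pipe(
        "candid snapshot of a woman laughing at a bus stop, overcast day",
        negative_prompt="",     # deliberately no "ugly, distorted" boilerplate
        num_inference_steps=30,
        guidance_scale=5.0,
    ).images[0]
    image.save("candid.png")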
By @roenxi - 5 months
This is stable diffusion 1.5. https://civitai.com/ and https://huggingface.co/ suggest the popular options are SDXL based - it is a much better model (effectively SD 2.5). Still imperfect, but much better.
By @throwaway1571 - 5 months
I find all those photos generated by Stable Diffusion to be kind of repetitious and boring.

Eking out something "interesting" is difficult, especially with limited time and low-end hardware. Interesting is highly subjective of course. I tend towards the more artistic / surrealist style, usually NSFW. Only nudes, no pornography.

I've been experimenting these last few months with generating interesting images, trying to make them "artistic" rather than photo-realistic, or the usual bland anime tributes.

I usually pick a "classical" artist which already has nudes in their repertoire, and try to blend their style with some photos I take myself, and with the style of other artists.

Most fall flat, some come close to what I consider acceptable, but still have major flaws. However, due to my time and hardware constraints they're good enough to post. I use fooocus which is kind of limiting, but after trying and failing to produce satisfactory results with Automatic, fooocus is just what I needed.

I can't really understand why more people don't do the same. Stable Diffusion was trained on a long and diverse list of artists, but most people seem to disregard that and focus only on anime or realistic photographs. The internet is inundated with those. I'm following some people on Mastodon who post more interesting stuff, but they usually tend to be all same-ish. I try to produce more diverse stuff, but most of the time it feels like going against the grain.

The women still tend to look like unrealistic supermodels. Sometimes this is what I want. Sometimes not, and it takes many tweaks to make them normal women, and usually I can't spare the time. Which is unfortunate.

If anyone's interested, I post the somewhat better experiments in:

https://mastodon.social/@TheNudeSurrealist

Warning: Most are NSFW. But are NSFW in the way Titian's Venus, say, is NSFW.

By @coreyh14444 - 5 months
Impressive breakdown, but this is six months old.
By @9dev - 5 months
If your goal is to generate content for a fully fictional celebrity magazine, this article will help you.

How come this technology appears to be exclusively used to generate fake pictures of unrealistically good-looking women? And to what end..?

By @Havoc - 5 months
Step 1: Don't use Stability's latest model
By @cubefox - 5 months
Here it may be more reasonable to actually pay some money for commercial models that are far ahead of Stable Diffusion in terms of image quality and prompt understanding. Like Dall-E 3, Imagen 2 (Imagen 3 comes out soon), or Midjourney. The gap between free and commercial diffusion models seems to be larger than the gap between free and commercial LLMs.
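For comparison, the commercial route is a single API call; a sketch with the OpenAI Python client (v1+) for DALL-E 3, assuming OPENAI_API_KEY is set and the account has credits:

    # One image from DALL-E 3 via the OpenAI Python client (>= 1.0).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    result = client.images.generate(
        model="dall-e-3",
        prompt="natural-light portrait photo of a middle-aged fisherman, candid",
        size="1024x1024",
        n=1,
    )
    print(result.data[0].url)  # temporary URL of the generated image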
By @aranelsurion - 5 months
Wanted to give it a try just for fun, using the same prompts, base model and parameters (as far as I can tell), and the first 5 images that were created... will probably haunt me in my dreams tonight.

I don't know if it was me misconfiguring it, or if the images in the post were really cherry-picked.

By @antihero - 5 months
Scrolling through the article, the pictures look no more realistic as it goes on.

You need to simulate poor lighting, dirt, soul, realistic beauty, etc. Perhaps even situations that give a reason for a photo to be taken other than "I'm a basic heteronormative woman who is attractive."

By @efilife - 5 months
The images generated by this guy are nowhere close to realistic. The resolution he's using is terrible for getting realistic faces. Most people with better GPUs get way better results whilst using 10% of the tricks from the article.
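One common workaround for the resolution limit is a two-pass workflow: generate at the model's native size, then upscale with img2img at low strength. A minimal sketch, assuming a portrait.png from a first pass:

    # Second pass: upscale a small render with img2img at low strength.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    low_res = Image.open("portrait.png").resize((1024, 1536), Image.LANCZOS)
    image = pipe(
        prompt="portrait photo of a young woman, highly detailed face",
        image=low_res,
        strength=0.4,   # low strength keeps composition, adds facial detail
    ).images[0]
    image.save("portrait_hires.png")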
By @virtualritz - 5 months
It's kinda telling when the author says (e.g. about the "Realistic Vision v2" model) that "the anatomy is excellent [...]" when this is obviously not the case.

Actually, the anatomy isn't right in a single image in that blog post, if you have a trained eye, that is.

By @nubinetwork - 5 months
I have a hard time believing that the huge prompt they used at the end (before img2img) will fit within diffusers' prompt limit. I noticed that after 75 tokens or so, it just chops off the prompt and runs with whatever didn't get cut.
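The 75-token figure matches CLIP's 77-token context window (75 text tokens plus begin/end markers); you can check whether a prompt fits with the tokenizer SD 1.5 uses:

    # Count CLIP tokens to see whether a prompt will be truncated.
    # SD 1.5's text encoder is CLIP ViT-L/14, capped at 77 tokens
    # including the begin- and end-of-text markers.
    from transformers import CLIPTokenizer

    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    prompt = "your very long prompt here ..."
    ids = tokenizer(prompt).input_ids
    print(len(ids), "tokens:", "truncated" if len(ids) > 77 else "fits")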
By @jredwards - 5 months
I've spent a ton of time playing with Stable Diffusion, just for amusement. I've rarely found it interesting to generate realistic people.
By @siilats - 5 months
Missing reference to DreamBooth.
By @pandemic_region - 5 months
Scary, that's all I can say.