The open weight Flux text to image model is next level
Black Forest Labs has launched Flux, the largest open-weight text-to-image model to date at 12 billion parameters, available in three versions. It features enhanced image quality and speed, alongside the release of AuraSR V2.
Black Forest Labs has announced Flux, the largest open-weight text-to-image model to date, featuring 12 billion parameters. The model aims to enhance creativity and performance, producing images with aesthetics comparable to Midjourney. Flux is available in three variants: FLUX.1 [dev], an open-weight base model under a non-commercial license; FLUX.1 [schnell], a distilled version that runs up to ten times faster and is licensed under Apache 2.0; and FLUX.1 [pro], a closed-source version accessible only through an API. A new inference engine allows the Flux models to run up to twice as fast as previous methods while maintaining high-quality outputs. Key features of Flux include enhanced image quality, improved human anatomy and photorealism, stronger prompt adherence, and exceptional speed, particularly with the schnell variant. Users are encouraged to explore Flux through the provided playgrounds and API documentation. The announcement also mentions the release of AuraSR V2, an upgraded version of a single-step GAN upscaler, following the positive community response to its predecessor. AuraSR is based on Adobe's GigaGAN paper and is designed to substantially upscale low-resolution images. The overall message emphasizes the ongoing development and potential of open-weight AI models, countering claims that such initiatives are declining.
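For readers wanting to try the open-weight variants locally, the following is a minimal sketch using Hugging Face diffusers. The repo ids, the `FluxPipeline` class, and the per-variant defaults (schnell is step-distilled, so very few steps and no classifier-free guidance; dev is guidance-distilled, so a modest guidance scale with more steps) are assumptions based on common diffusers conventions, not details confirmed by the announcement.

```python
import os

def sampler_settings(variant: str) -> dict:
    """Reasonable starting settings per open-weight variant (assumed defaults).

    schnell is step-distilled, so it needs only a handful of steps and no
    classifier-free guidance; dev is guidance-distilled and uses a modest
    guidance scale with more steps.
    """
    if variant == "schnell":
        return {"num_inference_steps": 4, "guidance_scale": 0.0}
    if variant == "dev":
        return {"num_inference_steps": 28, "guidance_scale": 3.5}
    raise ValueError(f"unknown open-weight variant: {variant!r}")

# Guarded behind an env flag: actually running this needs a large GPU
# and a multi-gigabyte model download.
if os.environ.get("RUN_FLUX_DEMO"):
    import torch
    from diffusers import FluxPipeline  # assumed pipeline class

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell",  # assumed repo id
        torch_dtype=torch.bfloat16,
    )
    pipe.enable_model_cpu_offload()  # helps fit the 12B model in less VRAM
    image = pipe(
        "a photo of a forest clearing at dawn",
        **sampler_settings("schnell"),
    ).images[0]
    image.save("flux_schnell.png")
```

The settings helper is the portable part; swap the repo id and defaults against the official model card before relying on them.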
Related
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision
A new attention mechanism, FlashAttention-3, boosts Transformer speed and accuracy on Hopper GPUs by up to 75%. Leveraging asynchrony and low-precision computing, it achieves 1.5-2x faster processing, utilizing FP8 for quicker computations and reduced costs. FlashAttention-3 optimizes for new hardware features, enhancing efficiency and AI capabilities. Integration into PyTorch is planned.
AuraFlow v0.1: an open-source alternative to Stable Diffusion 3
AuraFlow v0.1 is an open-source large rectified flow model for text-to-image generation. Developed to boost transparency and collaboration in AI, it optimizes training efficiency and achieves notable advancements.
Meta releases an open-weights GPT-4-level AI model, Llama 3.1 405B
Meta has launched Llama 3.1 405B, a free AI language model with 405 billion parameters, challenging closed AI models. Users can download it for personal use, promoting open-source AI principles. Mark Zuckerberg endorses this move.
Diffusion Training from Scratch on a Micro-Budget
The paper presents a cost-effective method for training text-to-image generative models by masking image patches and using synthetic images, achieving competitive performance at significantly lower costs.
Stable Fast 3D: Rapid 3D Asset Generation from Single Images
Stability AI launched Stable Fast 3D, a model generating high-quality 3D assets from images in 0.5 seconds, suitable for various industries. It offers rapid prototyping and is available on Hugging Face.
- Users express mixed feelings about the model's ability to generate complex prompts, with some noting failures in accurately depicting spatial relationships and specific scenarios.
- Many users praise the image quality and speed of Flux, highlighting its potential for local use and comparisons to other models like DALL-E 3 and MidJourney.
- Concerns are raised about the model's licensing and the sustainability of open-source projects, with some questioning the business model behind such offerings.
- Several comments focus on the technical aspects, including the model's performance on different hardware and its ability to handle text rendering.
- Users share links to try the model and discuss their experiences, indicating a strong interest in exploring its capabilities further.
What we did at fal is take the model and run it on our inference engine, optimized to run these kinds of models really, really fast. Feel free to give it a shot on the playgrounds: https://fal.ai/models/fal-ai/flux/dev
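For the API route the commenter alludes to, here is a hedged sketch of calling fal's hosted Flux endpoint from Python. The `fal_client.subscribe` call shape and the argument names (`prompt`, `num_inference_steps`, `seed`) are assumptions based on fal's client conventions; check their API docs before use.

```python
import os

def build_flux_arguments(prompt: str, steps: int = 4, seed=None) -> dict:
    """Assemble the request payload for a Flux generation call.

    Field names are assumed from fal's conventions, not verified.
    """
    if not prompt.strip():
        raise ValueError("prompt must be non-empty")
    args = {"prompt": prompt, "num_inference_steps": steps}
    if seed is not None:
        args["seed"] = seed  # fixed seed for reproducible comparisons
    return args

# Guarded behind an env flag: the real call needs a fal API key (FAL_KEY).
if os.environ.get("RUN_FAL_DEMO"):
    import fal_client

    result = fal_client.subscribe(
        "fal-ai/flux/schnell",
        arguments=build_flux_arguments("a horse sitting on a dog", seed=42),
    )
    print(result["images"][0]["url"])
```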
It is very fast and very good at rendering text, and appears to have a text encoder such that the model can handle both text and positioning much better: https://x.com/minimaxir/status/1819041076872908894
A fun consequence of better text rendering is that it means text watermarks from its training data appear more clearly: https://x.com/minimaxir/status/1819045012166127921
(available without sign-in) FLUX.1 [schnell] (Apache 2.0, open weights, step distilled): https://fal.ai/models/fal-ai/flux/schnell
(requires sign-in) FLUX.1 [dev] (non-commercial, open weights, guidance distilled): https://fal.ai/models/fal-ai/flux/dev
FLUX.1 [pro] (closed source [only available thru APIs], SOTA, raw): https://fal.ai/models/fal-ai/flux-pro
If this runs locally, it is very, very close to that in terms of both image quality and prompt adherence.
It did fail at writing text clearly when the text was a bit complicated. This Ideogram image's prompt, for example: https://ideogram.ai/g/GUw6Vo-tQ8eRWp9x2HONdA/0
> A captivating and artistic illustration of four distinct creative quarters, each representing a unique aspect of creativity. In the top left, a writer with a quill and inkpot is depicted, showcasing their struggle with the text "THE STRUGGLE IS NOT REAL 1: WRITER". The scene is comically portrayed, highlighting the writer's creative challenges. In the top right, a figure labeled "THE STRUGGLE IS NOT REAL 2: COPY ||PASTER" is accompanied by a humorous comic drawing that satirically demonstrates their approach. In the bottom left, "THE STRUGGLE IS NOT REAL 3: THE RETRIER" features a character retrieving items, complete with an entertaining comic illustration. Lastly, in the bottom right, a remixer, identified as "THE STRUGGLE IS NOT REAL 4: THE REMI
Otherwise, the quality is great. I stopped using Stable Diffusion a long time ago; the tools and tech around it became very messy, and it's not fun anymore. I've been using Ideogram for fun, but I want something like Ideogram that I can run locally without any filters. This is looking perfect so far.
This is not Ideogram, but it's very, very good.
Would love to see an AI company attack engineering diagrams head-on; my current hunch is that they just aren't in the training dataset. (I'm very tempted to make a synthetic dataset/benchmark.)
"An upside down house" -> regular old house
"A horse sitting on a dog" -> horse and dog next to each other
"An inverted Lockheed Martin F-22 Raptor" -> yikes https://fal.media/files/koala/zgPYG6SqhD4Y3y_E9MONu.png
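The synthetic benchmark the comment is tempted by could start as small as this: enumerate (subject, relation, object) triples so that each generated image has a known ground truth to check against, exactly the kind of spatial relationships the examples above fail on. Everything here is hypothetical scaffolding, not an existing dataset.

```python
import itertools

# Hypothetical building blocks for a spatial-relation benchmark.
SUBJECTS = ["a horse", "a dog", "a red cube"]
RELATIONS = {
    # relation phrase -> (expected subject state, expected object state)
    "on top of": ("above", "below"),
    "to the left of": ("left", "right"),
    "upside down above": ("inverted", "upright"),
}

def make_benchmark():
    """Yield (prompt, expected) pairs for every ordered subject pair."""
    for subj, obj in itertools.permutations(SUBJECTS, 2):
        for relation, (subj_state, obj_state) in RELATIONS.items():
            prompt = f"{subj} {relation} {obj}"
            yield prompt, {
                "subject": subj,
                "object": obj,
                "subject_state": subj_state,
                "object_state": obj_state,
            }

cases = list(make_benchmark())  # 3P2 ordered pairs x 3 relations = 18 cases
```

Scoring the generated images against the expected states (e.g. with a VQA model) is the hard part this sketch deliberately leaves out.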
I have seen a lot of promises made by diffusion models.
This is in a whole different world. I legitimately feel bad for the people still at StabilityAI.
The playground testing is really something else!
The licensing model isn’t bad, although I would like to see them promise to open up their old closed source models under Apache when they release new API versions.
The prompt adherence and the breadth of topics it seems to know without a finetune and without any LORAs, is really amazing.
The best prompt adherence on the market right now BY FAR is DALL-E 3, but it still falls down on more complicated concepts and is obviously heavily censored - though, weirdly, it is significantly less censored if you hit their API directly.
I quickly mocked up a few weird/complex prompts and did some side-by-side comparisons between Flux and DALL-E 3. Flux is impressive and notably fast, particularly since Black Forest has confirmed that both the dev and schnell models can be run via ComfyUI.
This is missing from the image. The generated image looks good, but while reading the prompt I was surprised it was missing.
...then it's not open source. At least the others are Apache 2.0 (real open source) and correctly labeled proprietary, respectively.
Photo of teen girl in a ski mask making an origami swan in a barn. There is caption on the bottom of the image: "EAT DRUGS" in yellow font. In the background there is a framed photo of obama
https://i.imgur.com/RifcWZc.png
Donald Trump on the cover of "Leopards Ate My Face" magazine
It looks like this is the case for LLMs, that the training quality of the data has a significant impact on the output quality of the model, which makes sense.
So the real magic is in designing a system to curate that high quality data.
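The curation idea in the two lines above can be made concrete with a toy filter: keep only image-caption pairs whose quality scores clear a threshold. The score fields and cutoffs here are hypothetical; real pipelines derive them from aesthetic predictors and image-text similarity models (e.g. CLIP score).

```python
def curate(samples, min_aesthetic=5.0, min_caption_match=0.3):
    """Filter a dataset of dicts down to the high-quality subset.

    Field names and thresholds are illustrative, not from any real pipeline.
    """
    return [
        s for s in samples
        if s["aesthetic"] >= min_aesthetic
        and s["caption_match"] >= min_caption_match
    ]

raw = [
    {"id": 1, "aesthetic": 6.2, "caption_match": 0.41},  # keep
    {"id": 2, "aesthetic": 3.1, "caption_match": 0.55},  # low aesthetic, drop
    {"id": 3, "aesthetic": 5.9, "caption_match": 0.12},  # caption mismatch, drop
]
kept = curate(raw)
```

At scale the interesting engineering is in producing the scores themselves and in tuning the thresholds against downstream model quality, not in the filter.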
Complete and utter UX/first-impression fail. I had no desire to actually try the model after this.
I assume you're offering this as an API? It would be nice to have a pricing page, as I didn't see one on your website.
Reddit message: https://www.reddit.com/r/StableDiffusion/comments/1ehh1hx/an...
Linked image: https://preview.redd.it/dz3djnish2gd1.png?width=1024&format=...
The prompt:
> Photo of Criminal in a ski mask making a phone call in front of a store. There is caption on the bottom of the image: "It's time to Counter the Strike...". There is a red arrow pointing towards the caption. The red arrow is from a Red circle which has an image of Halo Master Chief in it.
Some of the images I generated using the schnell model with 8-10 steps and this prompt: https://imgur.com/a/3mM9tKf
Really curious to see what other low-hanging fruit people are finding.
Result (distilled schnell model) for
"Photo of Criminal in a ski mask making a phone call in front of a store. There is caption on the bottom of the image: "It's time to Counter the Strike...". There is a red arrow pointing towards the caption. The red arrow is from a Red circle which has an image of Halo Master Chief in it."