July 12th, 2024

AuraFlow v0.1: an open-source alternative to Stable Diffusion 3

AuraFlow v0.1 is an open-source large rectified flow model for text-to-image generation. Developed to boost transparency and collaboration in AI, it combines training-efficiency optimizations with strong prompt-following performance.

AuraFlow v0.1 is introduced as an open-source large rectified flow model for text-to-image generation. Developed in response to a perceived slowdown in open-source AI model development, AuraFlow aims to revitalize the community's commitment to transparency and collaboration. The model, built in collaboration with researcher Simo, incorporates optimizations such as removing unnecessary layers and using torch.compile for better training efficiency. With a focus on zero-shot learning-rate transfer and improved captioning techniques, AuraFlow achieves significant advancements in text-to-image generation. The 6.8B-parameter model underwent extensive training, reaching a GenEval score of 0.703. Future plans include continued training, exploration of smaller, more efficient variants, and fostering community contributions. AuraFlow's release signals a robust commitment to advancing AI development through shared knowledge and innovation.
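The summary credits part of the training-efficiency gains to torch.compile. As a rough illustration only (not AuraFlow's actual code), here is how a PyTorch module can be wrapped with torch.compile. The toy Block class is hypothetical, and the eager backend is chosen so the sketch runs without a C++ toolchain; the real speedups come from the default inductor backend, which fuses operations into optimized kernels.

```python
import torch

# Hypothetical toy module standing in for a transformer block;
# AuraFlow's actual architecture is not reproduced here.
class Block(torch.nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.silu(self.proj(x))

block = Block()
# torch.compile traces the module on first call and (with the default
# inductor backend) emits fused kernels; "eager" keeps this sketch portable.
compiled = torch.compile(block, backend="eager")

x = torch.randn(2, 64)
out = compiled(x)
print(out.shape)  # torch.Size([2, 64])
```

The compiled module is a drop-in replacement for the original: same inputs, same outputs, so it can be swapped into an existing training loop without other changes.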

Related

Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU

The article discusses the release of the open-source Llama3 70B model, highlighting its performance compared to GPT-4 and Claude3 Opus. It emphasizes training enhancements, data quality, and the competition between open- and closed-source models.

Show HN: AI assisted image editing with audio instructions

The GitHub repository hosts "AAIELA: AI Assisted Image Editing with Language and Audio," a project enabling image editing via audio commands and AI models. It integrates various technologies for object detection, language processing, and image inpainting. Future plans involve model enhancements and feature integrations.

Apple just launched a public demo of its '4M' AI model

Apple publicly launches its '4M' AI model with EPFL on Hugging Face Spaces, showcasing versatile capabilities across modalities. The move signals a shift towards transparency, aligning with market growth and emphasizing user privacy amid ethical concerns.

Show HN: txtai: open-source, production-focused vector search and RAG

The txtai tool is a versatile embeddings database for semantic search, LLM orchestration, and language model workflows. It supports vector search with SQL, RAG, topic modeling, and more. Users can create embeddings for various data types and utilize language models for diverse tasks. Txtai is open-source and supports multiple programming languages.

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision

A new attention mechanism, FlashAttention-3, speeds up Transformer attention on Hopper GPUs, reaching up to 75% of the H100's theoretical peak throughput. Leveraging asynchrony and low-precision computing, it runs 1.5-2x faster than its predecessor, and FP8 support further reduces computation cost. FlashAttention-3 is optimized for new hardware features, enhancing efficiency and AI capabilities. Integration into PyTorch is planned.

7 comments
By @viraptor - 3 months
Passes the "woman on grass" test ;-)

Seriously though, there are some minor hand issues and the occasional missing body part. Adding "Correct anatomy, no missing body parts." to the prompt seems to mostly fix it. Still pretty good for an early 0.1 announcement.

It follows full sentences pretty well, although this one: "A photo of a table. On the table there's a green box on the right, a red ball on the left. There's a yellow cone on the box." keeps putting the cone on the table.

Not trained on naked bodies though - generates blob monsters instead.

By @smusamashah - 3 months
Prompt adherence is great. I copied a few prompts from ideogram (which also adheres to prompts) and results were good until they involved female bodies. This one, for example, https://ideogram.ai/g/ENMWd7PrQ32dIWSF91uMJQ/2, comes out showing that the training data didn't include enough nude bodies. Prompt adherence is very, very good otherwise. You can try the top images of the day/hour from ideogram to test.
By @halr9000 - 3 months
In case you missed it, the authors were pretty smart to include that folded section in the middle, "Prompt for prompt-enhancement". I slapped that into gpt (https://chatgpt.com/share/2e53403e-4bd7-4138-ac34-55378e2ed3...) and made a few prompts. Ran those on their online demo. Initial impressions:

  - prompt adherence is really good
  - it's somewhere between SD15 and SDXL at creating pictures of text 
  - aesthetic quality is good, but leaves some to be desired
Gonna play more with it in ComfyUI.
By @executesorder66 - 3 months
AIs are still not able to understand negations.

Try "ramen without egg" or "ramen with no egg" and it will show ramen WITH egg.

Or "man without striped shirt" will give "man WITH striped shirt"

By @skybrian - 3 months
Fails on “piano keyboard” (shows a full piano) and “close up of piano keyboard,” (bizarre duplicate keyboard monstrosity.)

It’s a difficult prompt. Nobody gets the grouping of black keys right. Maybe someday?

By @stale2002 - 3 months
So, now that this is released, are we no longer going to have pedantic people complaining that this "isn't real open source"?

Here is your model, complainers.

I'm not really sure why you'd be so insistent on that, as opposed to just fine tuning the "totally not open source, but instead just open weights" models.

But go ahead, I guess.

Now we can get back to talking about capabilities, usage, and results, as opposed to arguing about the definition of words.