AuraFlow v0.1: an open-source alternative to Stable Diffusion 3
AuraFlow v0.1 is an open-source large rectified flow model for text-to-image generation. Developed to boost transparency and collaboration in AI, it emphasizes training efficiency and achieves strong text-to-image results.
AuraFlow v0.1 is introduced as an open-source large rectified flow model for text-to-image generation. Developed in response to a perceived slowdown in open-source AI model development, AuraFlow aims to revitalize the community's commitment to transparency and collaboration. The model, built in collaboration with the researcher Simo, incorporates optimizations such as removing unnecessary layers and using torch.compile for faster training. With zero-shot learning-rate transfer and improved captioning techniques, AuraFlow achieves significant advances in text-to-image generation. The 6.8B-parameter model underwent extensive training and reaches a GenEval score of 0.703. Future plans include continued training, exploration of smaller, more efficient variants, and fostering community contributions. AuraFlow's release signals a renewed commitment to advancing AI development through shared knowledge and innovation.
Related
Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU
The article discusses the release of the open-source Llama3 70B model, highlighting its performance compared to GPT-4 and Claude3 Opus. It emphasizes training enhancements, data quality, and the competition between open- and closed-source models.
Show HN: AI assisted image editing with audio instructions
The GitHub repository hosts "AAIELA: AI Assisted Image Editing with Language and Audio," a project enabling image editing via audio commands and AI models. It integrates various technologies for object detection, language processing, and image inpainting. Future plans involve model enhancements and feature integrations.
Apple just launched a public demo of its '4M' AI model
Apple publicly launches its '4M' AI model with EPFL on Hugging Face Spaces, showcasing versatile capabilities across modalities. The move signals a shift towards transparency, aligning with market growth and emphasizing user privacy amid ethical concerns.
Show HN: txtai: open-source, production-focused vector search and RAG
The txtai tool is a versatile embeddings database for semantic search, LLM orchestration, and language model workflows. It supports vector search with SQL, RAG, topic modeling, and more. Users can create embeddings for various data types and utilize language models for diverse tasks. Txtai is open-source and supports multiple programming languages.
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision
A new attention mechanism, FlashAttention-3, speeds up Transformers on Hopper GPUs, reaching up to 75% of the hardware's theoretical maximum throughput. Leveraging asynchrony and low-precision computing, it achieves 1.5-2x faster processing, using FP8 for quicker computation and reduced cost. FlashAttention-3 is optimized for new hardware features, enhancing efficiency and AI capabilities. Integration into PyTorch is planned.
Seriously though, there are some minor hand issues and a rare missing body part. "Correct anatomy, no missing body parts." seems to fix it mostly. Still pretty good for an early 0.1 announcement.
Following full sentences is pretty good. Although this: "A photo of a table. On the table there's a green box on the right, a red ball on the left. There's a yellow cone on the box." keeps putting the cone on the table.
Not trained on naked bodies though - generates blob monsters instead.
- prompt adherence is really good
- it's somewhere between SD15 and SDXL at creating pictures of text
- aesthetic quality is good, but leaves some to be desired
Gonna play more with it in ComfyUI.
Try "ramen without egg" or "ramen with no egg" and it will show ramen WITH egg.
Or "man without striped shirt" will give "man WITH striped shirt".
It’s a difficult prompt. Nobody gets the grouping of black keys right. Maybe someday?
Here is your model, complainers.
I'm not really sure why you'd be so insistent on that, as opposed to just fine-tuning the "totally not open source, but instead just open weights" models.
But go ahead, I guess.
Now we can get back to talking about capabilities, usage, and results, as opposed to arguing about the definition of words.