October 3rd, 2024

Gemini 1.5 Flash-8B is now production ready

Google announced that Gemini 1.5 Flash-8B is production ready, with a 50% price reduction, doubled rate limits, and lower latency on small prompts for high-volume tasks. It is accessible for free via Google AI Studio.


Google has announced Gemini 1.5 Flash-8B as production ready. Compared with Gemini 1.5 Flash, the new model offers a 50% lower price, double the rate limits, and lower latency on small prompts. Developers can access it for free through Google AI Studio and the Gemini API. The Flash-8B variant is designed for speed and efficiency and performs well on chat, transcription, and long-context language translation. It is particularly suited to high-volume multimodal applications and long-context summarization. Its pricing is the lowest of any Gemini model: $0.0375 per million input tokens and $0.15 per million output tokens for prompts under 128K. Billing for developers on the paid tier begins on October 14th. The model also allows up to 4,000 requests per minute, which helps with high-volume workloads. Google emphasizes its commitment to supporting developers in creating innovative products and services.
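To make the per-token prices concrete, here is a minimal sketch of the cost arithmetic using only the two rates quoted above. The function name `estimate_cost_usd` is hypothetical, and treating "128K" as 128,000 tokens is an assumption; the summary does not give prices for longer prompts, so the sketch refuses to guess them.

```python
from decimal import Decimal

# Rates quoted in the summary, valid for prompts under 128K tokens.
INPUT_USD_PER_MILLION = Decimal("0.0375")
OUTPUT_USD_PER_MILLION = Decimal("0.15")

# Assumption: "128K" is read as 128,000 tokens (it could also mean 131,072).
PROMPT_LIMIT_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request at the quoted under-128K rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens >= PROMPT_LIMIT_TOKENS:
        # Longer prompts are priced differently; those rates are not in the summary.
        raise ValueError("quoted rates only cover prompts under 128K tokens")
    million = Decimal(1_000_000)
    input_cost = input_tokens * INPUT_USD_PER_MILLION / million
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION / million
    return input_cost + output_cost


if __name__ == "__main__":
    # A 100,000-token prompt with a 2,000-token reply.
    print(estimate_cost_usd(100_000, 2_000))
```

At these rates, a 100,000-token prompt with a 2,000-token reply comes to $0.00405, so roughly 247 such requests cost one dollar. `Decimal` is used instead of `float` so that the small per-token amounts add up without rounding drift.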

- Gemini 1.5 Flash-8B is now production-ready with improved performance and lower costs.

- The model features a 50% price reduction and double the rate limits compared to its predecessor.

- Developers can access the model for free via Google AI Studio and the Gemini API.

- The pricing is the lowest among all Gemini models; billing for paid-tier developers begins on October 14th.

- The model is optimized for high-volume tasks and performs well in various applications.
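For high-volume use, the 4,000 requests-per-minute figure from the summary works out to one request every 15 ms if requests are spread evenly. The sketch below shows one way a client might pace itself under that ceiling. The `Pacer` class is hypothetical and not part of any Google SDK, and even spacing is an assumption: the summary does not say how the service measures its rate window.

```python
import time


class Pacer:
    """Space calls evenly so they stay at or below a requests-per-minute ceiling."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        # Minimum gap between consecutive requests, in seconds.
        self.interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_slot = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request slot, then reserve the following one."""
        now = self._clock()
        if self._next_slot is not None and now < self._next_slot:
            self._sleep(self._next_slot - now)
            now = self._next_slot
        self._next_slot = now + self.interval


if __name__ == "__main__":
    pacer = Pacer(4000)  # the per-minute limit quoted in the summary
    for _ in range(3):
        pacer.wait()
        # A real client would send its API request here.
    print(f"spacing between requests: {pacer.interval * 1000:.0f} ms")
```

The clock and sleep functions are injectable so the pacing logic can be checked without real delays. A production client would still need to handle rate-limit error responses, since the server's accounting may differ from this even-spacing model.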

5 comments
By @bearjaws - 7 months
Damn, I literally just published an article benchmarking flash-1.5 and showing it is very impressive for its cost.

https://myswamp.substack.com/p/improving-accessibility-using...

Maybe I'll redo it and add in 1.5-8b, it's so cheap it doesn't hurt to add it lol.

By @Alifatisk - 7 months
Why do some people turn to Gemini? I've tried it, and I remember it lacking or being heavily censored. Is it because it's cheap? Or is it better at some tasks that others aren't?
By @faangguyindia - 7 months
It's such a shame that the Zed editor cannot use Gemini Flash for code completion; it's stuck on Supermaven or Copilot.

Most editors can easily support LLMs via fill-in-the-middle (FIM) mode.

By @Havoc - 7 months
Does anyone know if the rate limits on Flash and Flash-8B are separate?