November 24th, 2024

32k context length text embedding models

Voyage AI has released two embedding models, voyage-3 and voyage-3-lite, which improve retrieval quality, cut costs, and support a 32K-token context length, outperforming OpenAI's models across a range of domains.

Voyage AI has announced two new embedding models, voyage-3 and voyage-3-lite, which aim to improve retrieval quality while reducing cost and latency. The voyage-3 model outperforms OpenAI's v3 large model (text-embedding-3-large) by an average of 7.55% across domains including technology, law, and finance, while being 2.2 times cheaper and producing lower-dimensional embeddings. The voyage-3-lite model also improves retrieval accuracy, outperforming OpenAI v3 large by 3.82% at a significantly lower cost. Both models support a context length of 32K tokens, four times that of OpenAI's models.

The models were developed with research techniques including an improved architecture and extensive pre-training, and were evaluated across 40 domain-specific datasets, showing competitive performance, particularly in multilingual settings. Users of general-purpose embedding models are encouraged to upgrade to voyage-3 for better performance, or to choose voyage-3-lite for cost savings. Voyage AI continues to offer domain-specific models for specialized needs, while the new models serve as a versatile option for broader applications.

- Voyage-3 outperforms OpenAI v3 large by 7.55% while being 2.2x cheaper.

- Voyage-3-lite offers 3.82% better accuracy than OpenAI v3 large at 6x lower cost.

- Both models support a 32K-token context length, significantly higher than competitors.

- The models are designed for general-purpose use but also complement domain-specific models.

- Users can access the first 200M tokens for free to test the new models.
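
For anyone who wants to try them, here is a minimal sketch using Voyage's Python SDK; the model name comes from the announcement, but treat the parameters as assumptions and check the current docs:

    import voyageai

    vo = voyageai.Client()  # reads the VOYAGE_API_KEY environment variable

    docs = [
        "voyage-3 supports a 32K-token context length.",
        "Embeddings map text to dense vectors for retrieval.",
    ]

    # input_type tells the model whether it is embedding documents or queries
    result = vo.embed(docs, model="voyage-3", input_type="document")
    query = vo.embed(["what is voyage-3's context length?"], model="voyage-3", input_type="query")

    print(len(result.embeddings[0]))  # embedding dimensionality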

9 comments
By @albert_e - 3 months
Related question:

One year ago simonw said this in a post about embeddings:

[https://news.ycombinator.com/item?id=37985489]

> Lots of startups are launching new “vector databases”—which are effectively databases that are custom built to answer nearest-neighbour queries against vectors as quickly as possible.

> I’m not convinced you need an entirely new database for this: I’m more excited about adding custom indexes to existing databases. For example, SQLite has sqlite-vss and PostgreSQL has pgvector.

Do we still feel specialized vector databases are overkill?

We have AWS promoting Amazon OpenSearch as the default vector database for a RAG knowledge base, and that service is not cheap.
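
For comparison, the pgvector route simonw mentions looks roughly like this from Python; a hedged sketch, where the connection string and table schema are hypothetical and the 1024 dimension is arbitrary:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
    cur = conn.cursor()

    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(id serial PRIMARY KEY, body text, embedding vector(1024))"
    )

    # pgvector accepts '[x,y,...]' text literals; in practice the vector
    # comes from whatever embedding model you use
    emb = "[" + ",".join(["0.1"] * 1024) + "]"  # placeholder embedding
    cur.execute("INSERT INTO items (body, embedding) VALUES (%s, %s)", ("hello", emb))

    # nearest-neighbour search: <=> is cosine distance, <-> is L2 distance
    cur.execute("SELECT body FROM items ORDER BY embedding <=> %s LIMIT 5", (emb,))
    print(cur.fetchall())
    conn.commit()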

Also, I would like to understand a bit more about how to pre-process and chunk the data in a way that optimizes the vector embeddings, storage, and retrieval ... any good guides on this I can refer to? Thanks!
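
Not a guide, but the baseline most of them start from is fixed-size chunks with token overlap; a rough sketch using tiktoken purely as a tokenizer (the sizes and the doc.txt filename are arbitrary placeholders):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def chunk(text, size=512, overlap=64):
        # fixed-size chunks with overlap, measured in tokens rather than characters
        tokens = enc.encode(text)
        step = size - overlap
        return [enc.decode(tokens[i:i + size]) for i in range(0, len(tokens), step)]

    chunks = chunk(open("doc.txt").read())  # doc.txt is a placeholder file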

By @throwup238 - 3 months
What’s the benefit of generating embeddings for such large chunks? Do people use these large contexts to include lots of document-specific headers/footers, or are they actually generating embeddings of single large documents?

I don’t understand how the math works out on those vectors.

By @dtjohnnyb - 3 months
I've found good results from summarizing my documents using a large-context model, then embedding those summaries using a standard embedding model (e.g. e5).

This way I can tune which aspects of the doc I want retrieval to focus on, it's easier to tell when there are data quality issues that need fixing, and the summaries have turned out to be useful for other use cases in the company.
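
A rough sketch of that pipeline; summarize_with_llm is a placeholder for whatever large-context model you call, and e5 is loaded via sentence-transformers:

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("intfloat/e5-base-v2")

    def summarize_with_llm(doc: str) -> str:
        # placeholder: call a large-context model here, prompted toward
        # the aspects you want retrieval to focus on
        return doc[:1000]  # stand-in so the sketch runs

    docs = ["... full document text ..."]
    summaries = [summarize_with_llm(d) for d in docs]

    # e5 models expect "passage: " / "query: " prefixes at embedding time
    doc_vecs = model.encode(["passage: " + s for s in summaries], normalize_embeddings=True)
    query_vec = model.encode("query: data quality issues", normalize_embeddings=True)

    scores = doc_vecs @ query_vec  # cosine similarity on normalized vectors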

By @Oras - 3 months
Not related, but why don’t they have a pricing page? Last time I checked voyageai I had to Google their pricing to find the page, as it’s not in the nav menu.

By @johnfn - 3 months
What on earth is "OpenAI V3"? Just to be sure I wasn't being obtuse, I Googled it, only to get a bunch of articles pointing back at this post.

By @antirez - 3 months
I wonder if random projections or other dimensionality-reduction techniques work as well as a model specialized in smaller embeddings that captures the same amount of semantic information. That way we could use the larger embeddings of open models, which work very well, and still enjoy faster node-to-node similarity during searches.
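
For intuition, a Johnson-Lindenstrauss-style random projection is a few lines of numpy; the dimensions and the random stand-in data below are arbitrary, and whether projected vectors keep as much semantic information as a natively small embedding model is exactly the open question:

    import numpy as np

    rng = np.random.default_rng(0)
    d, k = 1024, 256  # original and reduced dimensions (arbitrary)

    # a Gaussian random matrix approximately preserves pairwise geometry
    P = rng.standard_normal((d, k)) / np.sqrt(k)

    X = rng.standard_normal((100, d))  # stand-in for real embeddings
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    Xp = X @ P

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cos(X[0], X[1]), cos(Xp[0], Xp[1]))  # similarity before/after projection
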
By @OutOfHere - 3 months
I would like to see an independent benchmark.

By @ldjkfkdsjnv - 3 months
I built a RAG system with Voyage and it crushed OpenAI embeddings; the difference in retrieval quality was noticeable.