Launch HN: Trellis (YC W24) – AI-powered workflows for unstructured data
Trellis is an AI-powered ETL tool that converts unstructured data into structured SQL formats, addressing enterprise data management challenges, particularly in financial services, using advanced AI techniques for optimization.
Trellis, founded by Jacky and Mac, is an AI-powered ETL tool designed to convert unstructured data, such as phone calls, PDFs, and chats, into structured SQL formats based on user-defined schemas. This innovation aims to assist data and operations teams in automating manual data entry and executing SQL queries on disorganized data. The founders, who met at the Stanford AI lab, identified a significant challenge in enterprise data management: 80% of enterprise data is unstructured, which traditional platforms struggle to process. Trellis addresses this issue by utilizing advanced techniques, including LLM-based map-reduce for long documents and model routing to optimize transformation processes. The tool has seen applications in various sectors, particularly in financial services, where it helps streamline the processing of complex documents and enhances operational efficiency. Users can explore a demo and a showcase featuring an analysis of Enron emails, highlighting Trellis's capabilities. The founders invite feedback and offer integration options for interested users, emphasizing their commitment to improving workflows related to unstructured data.
- Trellis transforms unstructured data into structured SQL formats.
- The tool addresses the challenge of managing 80% of enterprise data that is unstructured.
- It employs advanced AI techniques for document processing and model optimization.
- Applications are particularly notable in financial services and customer support.
- Users can access demos and integration options to explore Trellis's capabilities.
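The summary mentions LLM-based map-reduce for long documents. The general pattern can be sketched as follows. It assumes the per-chunk LLM call is replaced by a plain `extract` function (here a regex stub) and that partial results are combined by a `merge` function. `chunk_text`, `map_reduce_extract`, `extract_amounts`, and `merge_amounts` are hypothetical names for illustration. This is not Trellis's implementation.

```python
import re
from typing import Any, Callable

def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows so no chunk exceeds max_chars."""
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        start += max_chars - overlap  # step forward, keeping `overlap` chars of shared context
    return chunks

def map_reduce_extract(
    text: str,
    extract: Callable[[str], Any],
    merge: Callable[[list[Any]], Any],
    max_chars: int = 2000,
) -> Any:
    """Map: run `extract` on every chunk. Reduce: combine the partial results with `merge`."""
    partials = [extract(chunk) for chunk in chunk_text(text, max_chars)]
    return merge(partials)

# Stub standing in for a per-chunk LLM call: pull dollar amounts out of one chunk.
def extract_amounts(chunk: str) -> list[int]:
    return [int(m) for m in re.findall(r"\$(\d+)", chunk)]

# Reduce step: union the per-chunk lists (overlapping chunks may repeat a value).
def merge_amounts(partials: list[list[int]]) -> list[int]:
    return sorted({amount for part in partials for amount in part})

document = "Invoice total $120. " + "x" * 3000 + " Late fee $15."
print(map_reduce_extract(document, extract_amounts, merge_amounts))  # [15, 120]
```

The overlap keeps short values that straddle a chunk boundary intact in at least one chunk, at the cost of occasional duplicates, which is why the reduce step deduplicates.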
Related
Trellis (YC W24) is hiring engineer to build AI-powered ETL for unstructured data
Trellis, a startup backed by Y Combinator, General Catalyst, and investors from Google, Salesforce, and JP Morgan Chase, seeks a Founding Engineer. The role involves developing AI-powered data infrastructure and requires skills in Python, Go, ML/NLP, and cloud technologies. Founded in 2023, Trellis offers opportunities in cutting-edge AI and data projects.
Show HN: txtai: open-source, production-focused vector search and RAG
The txtai tool is a versatile embeddings database for semantic search, LLM orchestration, and language model workflows. It supports vector search with SQL, RAG, topic modeling, and more. Users can create embeddings for various data types and utilize language models for diverse tasks. Txtai is open-source and supports multiple programming languages.
txtai: Open-source vector search and RAG for minimalists
txtai is a versatile tool for semantic search, LLM orchestration, and language model workflows. It offers features like vector search with SQL, topic modeling, and multimodal indexing, supporting various tasks with language models. Built with Python and open-source under Apache 2.0 license.
Trellis (YC W24) is hiring engineer to build AI-powered ETL for unstructured data
Trellis, a startup backed by Y Combinator, seeks a Founding Engineer for backend and ML infrastructure. They aim to create an AI-powered Snowflake for unstructured data, offering opportunities in pioneering AI, data infrastructure, and database development.
Trellis (YC W24) is hiring eng to build AI workflows for unstructured data
Trellis, a Y Combinator-backed startup, seeks a founding engineer for its machine learning team, offering a salary of $110K-$225K and equity. Candidates need experience in full-stack development and relevant technologies.
- Many users express excitement about the tool's ability to extract data from unstructured sources like PDFs, emphasizing its value in sectors like finance and healthcare.
- Several commenters share their own experiences with similar technologies, discussing challenges such as accuracy and the need for manual review.
- Concerns are raised about competition and the sustainability of Trellis's business model in a rapidly evolving market.
- Questions about the tool's accuracy, integration capabilities, and compliance with regulations like HIPAA are prevalent.
- Overall, there is a mix of enthusiasm for the innovation and skepticism regarding its practical implementation and market fit.
Here's an example of the Enron email demo using the edsl syntax/package & a few different LLMs: https://www.expectedparrot.com/content/6607caa1-efc5-439f-85...
Great use case! Worked on exactly this a decade ago. It was Hard™ then. Could only make so much progress. Getting this right is a huge value unlock. Congrats!
Basically when we onboard a new client they dump all their audiograms on us as PDFs.
The data extraction needs to be perfect because the table values are used to detect hearing loss over time.
We settled on a pipeline that looks roughly like this:
PDF -> GPT-4o pre-filter phase -> OCR to extract text, tables, and forms -> things branch out here:
- We do a direct parse of forms and text through an LLM.
- We extract audiogram graphs and send them to a foundation convnet.
- We attempt to parse tables programmatically.
-> An audiogram might have its values in 3 separate places, so we pass the results of all three routes through Claude Sonnet. If they match, they get auto-approved; if they don't, they get flagged for manual review.
All in all, it’s been a journey, but the accuracy is near 100 percent. These tools are incredible.
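The auto-approve step in that pipeline amounts to a consensus check across independent extraction routes. A minimal sketch, assuming each route returns a dict of threshold values keyed by ear and frequency (the keys are hypothetical) and substituting exact equality for the commenter's Claude Sonnet comparison:

```python
from typing import Optional

Values = dict[str, int]  # hypothetical shape, e.g. {"left_1000Hz": 25}

def reconcile(route_results: list[Optional[Values]]) -> tuple[str, Optional[Values]]:
    """Auto-approve only if every route produced output and all outputs agree exactly."""
    if not route_results or any(result is None for result in route_results):
        return "manual_review", None  # a route failed or nothing was extracted
    first = route_results[0]
    if all(result == first for result in route_results[1:]):
        return "auto_approved", first
    return "manual_review", None  # the routes disagree, so a human decides

llm_parse = {"left_1000Hz": 25, "right_1000Hz": 30}
convnet = {"left_1000Hz": 25, "right_1000Hz": 30}
table_parse = {"left_1000Hz": 25, "right_1000Hz": 30}
print(reconcile([llm_parse, convnet, table_parse]))  # ('auto_approved', {'left_1000Hz': 25, 'right_1000Hz': 30})
table_parse["right_1000Hz"] = 35
print(reconcile([llm_parse, convnet, table_parse]))  # ('manual_review', None)
```

Exact equality is the strictest possible rule; a real pipeline might instead allow a small tolerance per value or let a model judge formatting differences.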
Trellis looks amazing... but only if it works well enough, i.e., if the rate of edge cases that trip up the service consistently remains close to 0%.
Every organization in the world needs and wants this, like, right now.
If you make it work well enough, you'll have customers knocking on your door around the clock.
I'm going to take a look. Like others here, I'm rooting for you guys to succeed.
Everyone here knows that it's a really big problem that no one has nailed yet.
My 2 cents:
1. It took us (newscatcherapi.com) three years to realize that customers with the biggest problems and with the biggest budgets are the most underserved. The reason is that everyone is building an infinitely scalable AI/LLM/whatever to gain insights from news.
In reality, this NLP/AI works quite OK out of the box but is not ideal for everyone at the same time. So we decided to do Palantir-like onboarding/integration for each customer. We charge 25x more, but customers have a perfect tailor-made solution and a high ROI.
I see you already do the same! "99%+ accuracy with fine-tuning and human-in-the-loop" is what worked great for us. This way, your competitor is a human on payroll (very expensive) and not AWS Textract.
Going from 95% to 99% is only a fractional improvement, but it can be the change from "not good enough" to "great solution", and that can be priced differently.
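The arithmetic behind that point: moving from 95% to 99% accuracy cuts the error rate from 5% to 1%, so five times fewer fields need a human to fix them. A quick check with a hypothetical batch of 10,000 extracted fields:

```python
def expected_errors(accuracy: float, fields: int) -> int:
    """Expected number of wrong fields at a given per-field accuracy."""
    return round((1 - accuracy) * fields)

batch = 10_000  # hypothetical batch size
at_95 = expected_errors(0.95, batch)
at_99 = expected_errors(0.99, batch)
print(at_95, at_99, at_95 // at_99)  # 500 100 5
```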
2. "AI-powered workflow for unstructured data" what does it even mean? Why don't you say "99%+ accuracy extraction"? It's 2024, everyone is using AI, and everyone knows you need 2 hours to start applying AI from 0. So don't lower my expectations.
I used OpenAI's function calling (via LangChain's https://python.langchain.com/v0.1/docs/modules/model_io/chat... API).
Some of the challenges I had:
1. poor recall for some fields, even with a wide variety of input document formats
2. needing to experiment with the JSON schema (particularly field descriptions) to get the best info out and ignore superfluous information
3. for each long document, deciding whether to send the whole document in the context, or only the most relevant chunks (using traditional text search and semantic vector search)
4. poor-quality OCR
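Challenge #3 can be framed as a budget decision: send the whole document when it fits, otherwise send only the best-matching chunks. A minimal sketch that uses plain keyword overlap as the "traditional text search" half (semantic vector search is omitted). The character budget and the `select_context` and `tokenize` names are assumptions for illustration:

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric terms, used for simple keyword matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select_context(chunks: list[str], query: str, budget_chars: int, k: int = 3) -> list[str]:
    """Return the whole document if it fits the budget, else up to k best keyword-matching chunks."""
    if sum(len(c) for c in chunks) <= budget_chars:
        return chunks
    query_terms = tokenize(query)
    # Rank chunk indices by shared query terms; sorted() is stable, so ties keep document order.
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: len(query_terms & tokenize(chunks[i])),
        reverse=True,
    )
    picked: list[int] = []
    used = 0
    for i in ranked[:k]:
        if used + len(chunks[i]) > budget_chars:
            continue  # skip chunks that would overflow the budget
        picked.append(i)
        used += len(chunks[i])
    return [chunks[i] for i in sorted(picked)]  # restore original reading order

chunks = ["Revenue was $5M in 2023.", "The office has a cafeteria.", "Net revenue grew in 2023."]
print(select_context(chunks, "revenue 2023", budget_chars=60, k=2))
# ['Revenue was $5M in 2023.', 'Net revenue grew in 2023.']
```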
From the demo video, it seems like your main innovation is allowing a non-technical user to do #2 in an iterative fashion. Have I understood correctly?
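For challenge #2, the field descriptions live inside the JSON Schema passed as a tool definition, and that is the part being iterated on. A sketch of building such a definition in the shape the OpenAI Chat Completions `tools` parameter accepts, plus a helper that keeps only the requested fields from the returned `arguments` string. The tool name and field names are hypothetical, and no API call is made here:

```python
import json

def build_extraction_tool(fields: dict[str, str]) -> dict:
    """Wrap field-name -> description pairs in a function-calling tool definition."""
    return {
        "type": "function",
        "function": {
            "name": "record_fields",  # hypothetical tool name
            "description": "Record the requested fields extracted from the document.",
            "parameters": {
                "type": "object",
                "properties": {
                    # The description is the main lever for steering what the model extracts.
                    name: {"type": "string", "description": description}
                    for name, description in fields.items()
                },
                "required": list(fields),
            },
        },
    }

def parse_tool_arguments(arguments: str, fields: dict[str, str]) -> dict:
    """Decode the model's JSON arguments and drop any keys that were not requested."""
    data = json.loads(arguments)
    return {name: data.get(name) for name in fields}

fields = {
    "invoice_date": "Date the invoice was issued, formatted YYYY-MM-DD.",
    "total_due": "Final amount owed including tax, digits only.",
}
tool = build_extraction_tool(fields)
raw = '{"invoice_date": "2024-01-05", "total_due": "120.00", "notes": "extra"}'
print(parse_tool_arguments(raw, fields))  # {'invoice_date': '2024-01-05', 'total_due': '120.00'}
```

Editing only the `fields` mapping and re-running is the kind of loop a non-technical user could drive through a UI.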
Rooting for you guys!
Filters are a really important downstream feature, and this system can provide them.
We have also worked with the Enron corpus for demos. Fast, reliable ETL over a document set that large is more difficult than it seems, and it's a commendable problem to solve.
Exciting stuff!
For instance, I have 100 PDFs, each with 10-100 individual products listed (in different formats).
I want to create a single table with one row per product appearing in any of the PDFs, with various details like price, product description, etc.
From what I can tell from the demo, it seems like 1 file = 1 row in Trellis?
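What the commenter describes is a one-to-many extraction: each file yields a list of products, and the lists are flattened into a single table. A sketch using Python's built-in `sqlite3`, assuming the per-file extraction step already returns a list of product dicts (the file names, columns, and values are hypothetical):

```python
import sqlite3

def load_products(extractions: dict[str, list[dict]]) -> sqlite3.Connection:
    """Flatten {file: [product, ...]} into one row per product in an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "source_file TEXT, item_index INTEGER, name TEXT, price REAL, description TEXT)"
    )
    rows = (
        (source_file, index, product.get("name"), product.get("price"), product.get("description"))
        for source_file, products in extractions.items()
        for index, product in enumerate(products)  # keep each product's position within its file
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn

extractions = {
    "catalog_a.pdf": [
        {"name": "Widget", "price": 9.5, "description": "Small widget"},
        {"name": "Gadget", "price": 20.0},  # missing fields become NULL
    ],
    "catalog_b.pdf": [{"name": "Gizmo", "price": 5.0, "description": "Pocket gizmo"}],
}
conn = load_products(extractions)
print(conn.execute("SELECT source_file, name, price FROM products ORDER BY price").fetchall())
# [('catalog_b.pdf', 'Gizmo', 5.0), ('catalog_a.pdf', 'Widget', 9.5), ('catalog_a.pdf', 'Gadget', 20.0)]
```

Keeping `source_file` and `item_index` on every row makes each product traceable back to the PDF it came from.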
How do your capabilities compare to Google Document AI or Watson SDU? Also what about standalone competitors such as Indico Data or DocuPanda?
I'm curious, have you (or your customers) deployed this in a RAG use case already, and what have been the results like?
non-snarky genuine question: is "generate structured data from unstructured data using AI" intended to be a moat or differentiator?
catalyst for my question: I just read about this capability becoming available from other AI vendors, e.g.
https://openai.com/index/introducing-structured-outputs-in-t...
(congrats on the launch!)
You guys came out of an academic lab, so you must know that hypothesis fishing expeditions are not viable.
> ... a major commercial bank... couldn’t improve credit risk models because critical data was stuck in PDFs and emails.
In this example there will most likely be no improvement to the risk model, because 19 out of 20 times an exploratory effort like this yields none. In an academic setting this is seen as normal, but in a business setting with no executive champions, only product managers, it will be seen as a failure, and that failure will be associated with you and your technology, which is bad.
Unfortunately, these people are not willing to pay more money for less risk. What they want is a base consulting cost (i.e., a non-venture business) to identify the lowest-risk, promotion-worthy endeavor, and then they want to pay as little as possible to achieve it. In a sense, the kind of customers who need unstructured data ETL are poorly positioned to use such a technology, because they don't value technology generally; they aren't forward-looking.
Assembling attractive websites that are really features on top of Dagster? There's a lot of value in that. The question is, are people willing to pay for it? Anyone can make attractive Dagster UIs, and anyone can write Python glue. It's very challenging to differentiate yourselves, even when you feel like you have some customers, because eventually one of those middlemen at BankCo is going to punch your USP into Google and find the pre-existing services with huge account management teams (i.e., the hand-holding consulting business people really pay for) that outpace you.
Because browsers have an autocomplete feature.