July 2nd, 2024

Show HN: I Made an Open Source Platform for Structuring Any Unstructured Data

OmniParse transforms unstructured data into structured formats for GenAI applications. It supports various data sources and offers features like table extraction, image processing, audio/video transcription, and web crawling. Explore further on GitHub.

OmniParse is a platform that transforms unstructured data into structured, actionable data suitable for GenAI (LLM) applications. It accepts a range of data sources, including documents, tables, images, videos, audio files, and web pages, and offers table extraction, image extraction with captioning, audio/video transcription, and web page crawling, among other features. The GitHub repository covers installation guidelines, usage instructions, supported data formats, API endpoints, and upcoming developments.

4 comments
By @bpev - 4 months
I'm not sure that I understand what we're parsing to. Like on the website, I see supported types, but that looks like the parsable types, no? What kind of structured representation is outputted? And can we guide what that structure looks like?
By @itishappy - 4 months
I haven't run it myself, but the example provided looks kinda broken. It looks WAY better than the PyPDF results, but good enough?

The table name was parsed as part of a column name, and half of the column names were not parsed at all.

Original: https://github.com/adithya-s-k/marker-api/blob/master/data/i...

Parsed: https://github.com/adithya-s-k/marker-api/blob/master/data/i...

By @brianjking - 4 months
1. How does this differ from LlamaParse, which can be used with and without LlamaIndex?

2. Is there an option for a more permissive license that isn't GNU for commercial enterprise use?

Thanks!

By @sirjaz - 4 months
What are the limitations of running the server on Windows?