Show HN: I Made an Open Source Platform for Structuring Any Unstructured Data
OmniParse transforms unstructured data into structured formats for GenAI applications. It supports various data sources and offers features like table extraction, image processing, audio/video transcription, and web crawling. Explore further on GitHub.
OmniParse is a platform designed to transform unstructured data into structured, actionable data suitable for GenAI (LLM) applications. It accommodates data sources such as documents, tables, images, videos, audio files, and web pages. Its features include table extraction, image extraction with captioning, audio/video transcription, and web page crawling. The GitHub repository provides additional details, installation guidelines, usage instructions, supported data formats, API endpoints, and notes on upcoming developments.
Related
Open Source Python ETL
Amphi is an open-source Python ETL tool for data extraction, preparation, and cleaning. It offers a graphical interface, supports structured and unstructured data, promotes low-code development, and integrates generative AI. Available for public beta testing in JupyterLab.
Show HN: Online OPML editor to manage subscription lists
The OPML Editor on GitHub manages RSS and Atom feeds. Users can add and merge OPML files and remove duplicates. It is built with Svelte and CodeMirror and licensed under AGPL-3.0. Access at opml.imadij.com.
Show HN: a Rust lib to trigger actions based on your screen activity (with LLMs)
The GitHub project "Screen Pipe" uses Large Language Models to convert screen content into actions. Implemented in Rust + WASM, inspired by `adept.ai`, `rewind.ai`, and `Apple Shortcut`. Open source under MIT license.
NuExtract: A LLM for Structured Extraction
NuExtract is a structure extraction model by NuMind, offering tiny and large versions. NuMind also provides NuNER Zero and sentiment analysis models. Mistral 7B, by Mistral AI, excels in benchmarks with innovative attention mechanisms.
Open-Source Perplexity – Omniplex
The Omniplex open-source project on GitHub focuses on core functionality, plugin development, and multi-LLM support. It is built with TypeScript, React, Redux, and Next.js, and integrates with services like OpenAI and Firebase. Community contributions are welcome.
The table name was parsed as part of a column name, and half of the column names were not parsed at all.
Original: https://github.com/adithya-s-k/marker-api/blob/master/data/i...
Parsed: https://github.com/adithya-s-k/marker-api/blob/master/data/i...
2. Is there an option for a more permissive license that isn't GNU for commercial enterprise use?
Thanks!