August 3rd, 2024

Handwritten Text Recognition for Xournal++ Using Deep Learning

The Xournal++ HTR project adds handwritten text recognition to the Xournal++ app while prioritizing user privacy. It welcomes community contributions and plans to improve HTR performance through model retraining, data augmentation, and language models.

The Xournal++ HTR project aims to enhance the Xournal++ note-taking application by integrating handwritten text recognition (HTR) capabilities, allowing users to make their handwritten notes searchable. This open-source initiative emphasizes user privacy by processing data locally. The project employs a Lua plugin for Xournal++ alongside a Python backend for the recognition process.

To set up the project, users create a Conda environment and install the required dependencies, following the project's installation instructions. The design supports both stable production features and experimentation with new algorithms: the backend can accommodate various recognition models. Future development will focus on improving HTR performance through model retraining, data augmentation, and the incorporation of language models.
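
One way a backend could accommodate several recognition models is a small registry that maps model names to recognizer functions. The sketch below is only an illustration under assumptions: `register_model`, `recognize`, `available_models`, and the two dummy models are hypothetical names, not the project's actual API, and the input is assumed to be the raw bytes of a rendered page.

```python
from typing import Callable, Dict, List

# Assumption: a recognizer takes the raw bytes of a rendered page
# and returns the recognized text. The real project may differ.
Recognizer = Callable[[bytes], str]

# Hypothetical registry: model name -> recognizer function.
_MODELS: Dict[str, Recognizer] = {}


def register_model(name: str) -> Callable[[Recognizer], Recognizer]:
    """Decorator that stores a recognizer in the registry under `name`."""
    def decorator(fn: Recognizer) -> Recognizer:
        _MODELS[name] = fn
        return fn
    return decorator


@register_model("dummy-stable")
def dummy_stable(page: bytes) -> str:
    # Placeholder for a production model; real inference would go here.
    return f"<{len(page)} bytes>"


@register_model("dummy-experimental")
def dummy_experimental(page: bytes) -> str:
    # Placeholder for an experimental model under evaluation.
    return f"<experimental: {len(page)} bytes>"


def available_models() -> List[str]:
    """Return the registered model names in sorted order."""
    return sorted(_MODELS)


def recognize(page: bytes, model: str = "dummy-stable") -> str:
    """Run the chosen model on a page, failing loudly on unknown names."""
    if model not in _MODELS:
        raise KeyError(f"unknown model {model!r}; choose from {available_models()}")
    return _MODELS[model](page)
```

With a layout like this, a stable model can stay the default while an experimental one is selected explicitly, for example `recognize(page, model="dummy-experimental")`.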

The repository encourages community contributions and uses a branching strategy that keeps the master branch stable while ongoing development happens in feature branches. The project acknowledges the individuals and institutions that have contributed to its progress. Further information is available in the Xournal++ HTR GitHub repository.

Groqnotes: Generate structured notes from audio using Groq, Whisper, and Llama3

The GitHub project "Groqnotes" is a Streamlit app that uses Groq, Whisper, and Llama3 to generate structured notes from audio. It offers rapid transcription, Markdown styling, and download options, and can be used online or set up locally.

Show HN: Local voice assistant using Ollama, transformers and Coqui TTS toolkit

The GitHub project "june" combines Ollama, Hugging Face Transformers, and the Coqui TTS Toolkit into a private voice chatbot that runs on local machines. The repository covers setup, usage, customization, and FAQs.

Transcribro: On-device Accurate Speech-to-text

The GitHub repository for "Transcribro" offers project details, downloads, community links, contribution guidelines, donations, branding guidelines, and keyboard UI screenshots. Contact for project-specific support or inquiries.

Audapolis: Edit audio files by word, not waveform

The Audapolis project on GitHub offers a tailored editor for spoken-word media with audio-to-text transcription. It supports various media types, works on Windows, Linux, and macOS, and stores data locally. Funding sources include governmental and foundation support.

Show HN: Zerox – document OCR with GPT-mini

Zerox OCR is a tool on GitHub for Optical Character Recognition (OCR) in AI applications. The repository describes its functionality and provides pricing comparisons, installation guidance, and usage examples.

6 comments

By @Qwertious - 9 months

This is great news, it's been needed for ages - handwriting is more than just funky OCR, it's OCR as applied to vector lines with a defined stroke order. So for instance, a lowercase e and c might render to the exact same pixels due to the 'loop' of the e overlapping itself, but if we know the stroke started in the middle of the line and then retreads itself, we can know for sure we're looking at an 'e'. That's simply not possible in e.g. Tesseract.
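
To make the stroke-order point concrete, here is a minimal Python sketch. It is an illustration under assumptions, not how Xournal++ HTR or Tesseract work internally: strokes are modelled as ordered lists of integer pixel coordinates, and `rasterize` and `retraces` are hypothetical helper names.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def rasterize(stroke: List[Point]) -> Set[Point]:
    """Offline (image) view: only the set of touched pixels survives."""
    return set(stroke)


def retraces(stroke: List[Point]) -> bool:
    """Online (vector) view: True if the pen revisits a pixel it already drew."""
    return len(stroke) != len(set(stroke))


# Stroke A: a plain left-to-right line.
stroke_a = [(0, 0), (1, 0), (2, 0)]
# Stroke B: starts in the middle, goes right, then retraces back to the left.
stroke_b = [(1, 0), (2, 0), (1, 0), (0, 0)]

# Both strokes cover exactly the same pixels, so an image cannot tell them apart.
same_pixels = rasterize(stroke_a) == rasterize(stroke_b)
# The ordered point sequences differ, so stroke data can.
same_sequence = stroke_a == stroke_b
```

Here `same_pixels` is `True` while `same_sequence` is `False`, and only `stroke_b` retraces itself. That ordering is the extra signal stroke data carries and a rendered image loses.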

By @eulgro - 9 months

I just learned about Xournal++; I had been using Xournal, which apparently stopped being developed in 2016. I just tried it and it's much more complete.

By @kkfx - 9 months

A very nice project but... How many people really want to convert scanned handwriting to text? Results will be messy anyway, and today handwritten text is typically no more than a few pages, so it's far quicker to retype or even dictate than to correct OCR.

BTW personally I use Xournal++ to add text/images to PDFs, typically where I have some crappy low-importance PDF form, not a real one, and I do not want to invest time in a nice LaTeX + cart.el setup (an Emacs artist-mode wrapper that gets the coordinates of any form by clicking on it with the mouse [1]). I still have to deal with some scanned documents, but ones originally printed from a computer, not handwritten.

Handwritten text recognition might be very welcome for scanning and indexing old public archives, which is damn complex since there are countless styles of cursive, but it's still much needed to merge the old paper world into the digital one so as not to lose history.

[1] https://github.com/Nidish96/cart.el

By @minimalist - 9 months

How cool! I have over a decade of notes taken in xournal and other digital tablets and was considering taking a short sabbatical to type them all out. Might not need to after all! I will definitely try this.

By @bzmrgonz - 9 months

1-to-10, how ready for prime time is it?? (10=production ready)

By @millimacro - 8 months

Cheers everyone for your lovely engagement!! <3
