June 27th, 2024

Show HN: Gosax – A high-performance SAX XML parser for Go

The `gosax` Go library enables efficient XML SAX parsing with read-only features, high performance, SWAR optimizations, and `encoding/xml` compatibility. Installation via `go get` and contributions on GitHub are encouraged.

The `gosax` library provides memory-conscious SAX-style XML parsing for Go. It is read-only and uses SWAR (SIMD within a register) techniques to speed up parsing. It is compatible with `encoding/xml` and includes utility functions for integrating with existing code. It is installed with `go get` using the repository path, and contributions are accepted as pull requests on the GitHub repository, which also carries further details, acknowledgements, and contact information. Licensing terms are specified in the repository's LICENSE file.

12 comments
By @JonChesterfield - 7 months
Very nice, thank you!

Unhelpfully, my only pain point with XML parsing is colleagues refusing to use XML in favour of JSON or, in really grim moments, YAML.

So I'm delighted to see a sensible modern web language implementation of the one true data exchange format. Thank you for sharing it.

By @OnlyMortal - 7 months
Uninteresting fact: I did the code to download TomTom map updates. Mozilla XUL app.

The XML required a good 4GB of RAM to load the model. So… I just read the stream to get to the token I needed and read until the end token.

Obviously, it was faster and required much less memory. The take-away is if you don’t need to parse the model, don’t.

I assume that nowadays, they’re using a more sensible format.

By @euroderf - 7 months
Is there any improvement on the deficient namespace handling in the stdlib?
By @glenjamin - 7 months
Oh nice, I've recently been looking into streaming XML parsing in Go without a CGO dependency and found the available options pretty lacking.

Great to see this sort of thing!

By @runlevel1 - 7 months
Wish I'd had this a few years ago. I had to parse Confluence wiki backups which, for reasons only known to Atlassian and god, lacked any closing tags. I ended up writing something similar to this, but mine was a lot kludgier.
By @38 - 7 months
Little trick with xml.Decoder: unlike Unmarshal, Decoder ignores any garbage after the XML, which is nice if you want to parse HTML without dealing with the DOM.
By @singpolyma3 - 7 months
Does this support DTD/custom entities stuff? I would hope the answer is no, but just checking
By @fsmv - 7 months
What's wrong with the standard library parser?
By @artpar - 7 months
I upvoted you just because I made a golang library with the same name but different purpose

https://github.com/artpar/gosax/

It's a high-performance Go implementation of Symbolic Aggregate approXimation.

By @danesparza - 7 months
This feels ... 20 years too late?

But excellent. Thanks!

By @lanstin - 7 months
Nice. I like the event-based/callback-based parsing tools for XML a lot. A little more cognitive work up front but much more efficient. A little sad if unsurprised that XML is still a thing in 2024, but if you have to read it, use a streaming parser.