Why I love Rust for tokenising and parsing
The author develops sqleibniz, a Rust-based static analysis tool for SQL, focusing on syntax checks and validation. Future plans include creating a Language Server Protocol server for SQL.
The blog post discusses the author's experience using Rust to develop sqleibniz, a static analysis tool for SQL targeting the SQLite dialect. The tool performs syntax checks and validates the existence of tables, columns, and functions in SQL input. The author walks through building a tokenizer and parser for SQL, emphasizing Rust's macros as a way to deduplicate code. The post details the implementation of an Abstract Syntax Tree (AST) and the use of traits to define nodes, which helps manage code complexity. It also covers testing strategy, particularly table-driven tests adapted from Go to Rust, which keep validation of the lexer and parser organized and ensure that both valid and invalid SQL inputs are handled correctly. The author closes with enthusiasm for Rust's capabilities in building robust software and hints at future work, including a Language Server Protocol (LSP) server for SQL.
- The author is developing a static analysis tool for SQL called sqleibniz using Rust.
- The tool focuses on syntax checks and validating SQL constructs against SQLite documentation.
- Rust's macros are utilized for code deduplication in defining AST nodes and traits.
- The author adapts table-driven testing from Go to Rust for validating the lexer and parser (see the sketch after this list).
- Future plans include developing a Language Server Protocol (LSP) server for SQL.
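To make the table-driven point concrete, here is a minimal sketch of the pattern in Rust. The `Token` enum and `lex` function are hypothetical stand-ins, not sqleibniz's actual API:

```rust
// Hypothetical stand-ins for the real lexer API.
#[derive(Debug, PartialEq)]
enum Token {
    Select,
    Asterisk,
    From,
    Ident(String),
}

// Deliberately naive: splits on whitespace, enough to show the pattern.
fn lex(input: &str) -> Vec<Token> {
    input
        .split_whitespace()
        .map(|word| match word.to_ascii_lowercase().as_str() {
            "select" => Token::Select,
            "*" => Token::Asterisk,
            "from" => Token::From,
            other => Token::Ident(other.to_string()),
        })
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn lexer_table() {
        // Each entry pairs a case name and input with the expected
        // tokens, mirroring Go's []struct{ name, input, want } idiom.
        let table = [
            (
                "select star",
                "SELECT * FROM users",
                vec![
                    Token::Select,
                    Token::Asterisk,
                    Token::From,
                    Token::Ident("users".to_string()),
                ],
            ),
            ("bare identifier", "users", vec![Token::Ident("users".to_string())]),
        ];
        for (name, input, want) in table {
            assert_eq!(lex(input), want, "case failed: {name}");
        }
    }
}
```

Because each case is named, a failing assertion points straight at the offending input, just as in Go.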
Related
First Contact with SQLite
The article explores surprising aspects of SQLite, like limitations in altering columns and data types. It notes the recent jsonb support and handling date/time values, praising SQLite's streaming features but favoring PostgreSQL.
Rust's Ugly Syntax (2023)
The blog post addresses complaints about Rust's syntax, attributing them to misunderstandings of its semantics. It suggests simplifying semantics for readability while maintaining performance and safety features.
Build a quick Local code intelligence using Ollama with Rust
Bosun developed Swiftide, a Rust-based tool for efficient code indexing and querying, utilizing Qdrant and FastEmbed. It enhances performance with OpenTelemetry, integrating various language models for improved response times.
I love Rust for tokenising and parsing
The author develops a Rust-based static analysis tool for SQL, sqleibniz, focusing on syntax checks and validation, utilizing macros for code efficiency, and plans to implement an LSP server.
Why I love Rust for tokenising and parsing
The author develops a Rust-based static analysis tool for SQL, named sqleibniz, focusing on syntax checks and validation for SQLite, emphasizing error message quality and employing table-driven tests.
With that realisation I started looking for another, more suitable language. I knew the FP aspects of Rust were what I was after, so at first I considered something like F#, but I didn't like that it's tied to Microsoft/.NET. Looking a bit further, I could have gone with something like Zig or C, but then I'd lose the FP niceness I was looking for. I also spent a fair amount of time looking at Go, but eventually decided against it: 1. I wanted a fair amount of syntactic sugar, and 2. Go is a server-side language, and a lot of its features and standard library are geared towards that use case.
Finally I found OCaml. What really convinced me was the syntax: like a friendly version of Haskell, or like Rust without lifetimes. In fact, the first Rust compiler was written in OCaml, and OCaml is well known in the programming-language space. I'm still learning it, so I'm not sure I can give a fair review yet, but so far it's exactly what I was looking for.
A few notes:
* The AST would, I believe, be much simpler defined as an algebraic data type; see the sketch after these notes. It's not as if the SQLite grammar is going to randomly grow new nodes that require the extensibility their convoluted encoding provides. The encoding they use looks like what someone familiar with OO, but not with algebraic data types, would come up with.
* "Macros work different in most languages. However they are used for mostly the same reasons: code deduplication and less repetition." That could be said for any abstraction mechanism. E.g. functions. The defining features of macros is they run at compile-time.
* The work on parser combinators would be a good place to start to see how to structure parsing in a clean way.
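To make the first note concrete, here is a minimal sketch of an SQL expression AST as a plain Rust enum; the node and variant names are illustrative, not taken from sqleibniz:

```rust
// A fragment of an SQL expression AST as an algebraic data type:
// one enum, one variant per node kind. Names are illustrative only.
enum Expr {
    Literal(Literal),
    Column(String),
    Binary {
        op: BinaryOp,
        left: Box<Expr>,
        right: Box<Expr>,
    },
}

enum Literal {
    Number(f64),
    Text(String),
    Null,
}

enum BinaryOp {
    Eq,
    And,
}

// Passes over the tree are exhaustive matches: add a variant and the
// compiler points at every place that must now handle it.
fn describe(expr: &Expr) -> String {
    match expr {
        Expr::Literal(Literal::Null) => "NULL".into(),
        Expr::Literal(Literal::Number(n)) => n.to_string(),
        Expr::Literal(Literal::Text(t)) => format!("'{t}'"),
        Expr::Column(name) => name.clone(),
        Expr::Binary { op, left, right } => {
            let op = match op {
                BinaryOp::Eq => "=",
                BinaryOp::And => "AND",
            };
            format!("({} {} {})", describe(left), op, describe(right))
        }
    }
}

fn main() {
    // id = 42
    let expr = Expr::Binary {
        op: BinaryOp::Eq,
        left: Box::new(Expr::Column("id".into())),
        right: Box::new(Expr::Literal(Literal::Number(42.0))),
    };
    println!("{}", describe(&expr)); // (id = 42)
}
```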
Again: not throwing shade. I think this is a place where Rust is genuinely quite strong.
In fact, that eBNF only produces the lexer. The parser part is not that impressive either: 120 LoC and quite repetitive: https://github.com/gritzko/librdx/blob/master/JSON.c
So, I believe, a parser infrastructure evolves until it only needs the eBNF to produce a parser. That is the saturation point.
Educational and elegant approach.
I can take the parser combinator library I use for high-level compiler parsers, use that same library in a no-std setting, compile it for a microcontroller, and deploy it as a high-performance protocol parser in an embedded environment. The exact same library, just with fewer String and more &'static str.
So toying around with compilers translates my skill set rather well into writing embedded protocol parsers.
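As a toy illustration of that point (not the commenter's actual library), here is a combinator that only borrows from its input, so the same code compiles against heap-allocated Strings or against &'static str in a no_std build:

```rust
#![no_std] // crate-root attribute: nothing below needs the standard library

/// On success: the parsed value plus the remaining input.
/// On failure: the input where parsing failed.
type PResult<'a, T> = Result<(T, &'a str), &'a str>;

/// Match a literal prefix and return the rest of the input.
fn tag<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input| {
        input
            .strip_prefix(expected)
            .map(|rest| (expected, rest))
            .ok_or(input)
    }
}

/// Run two parsers in sequence, returning both results.
fn pair<'a, A, B>(
    first: impl Fn(&'a str) -> PResult<'a, A>,
    second: impl Fn(&'a str) -> PResult<'a, B>,
) -> impl Fn(&'a str) -> PResult<'a, (A, B)> {
    move |input| {
        let (a, rest) = first(input)?;
        let (b, rest) = second(rest)?;
        Ok(((a, b), rest))
    }
}

// Usage: let kw = pair(tag("SELECT"), tag(" *"));
//        assert_eq!(kw("SELECT * FROM t"), Ok((("SELECT", " *"), " FROM t")));
```

Nothing here allocates, so the same two functions can serve a compiler frontend on a workstation and a protocol parser on a microcontroller.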
Curious what the rest of the prior art looks like.
I'm imagining seeing the node! macro used, and seeing the macro definition, but still having a tough time knowing exactly what code is produced.
Do I just use the example and see what type hints I get from it? Can I hover over it in my IDE and see an expanded version? Do I need to reference the compiled code to be sure?
(I do all my work in JS/TS, so I don't touch any macros; just curious about the workflow here!)
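For what it's worth, the usual Rust workflow answers the middle two questions: rust-analyzer has an "Expand macro recursively" command for the macro under the cursor, and dtolnay's cargo-expand subcommand prints a module's source after all macros have been expanded. Here is a hypothetical node!-style declarative macro (not the author's actual definition) showing what there is to expand:

```rust
// Inspect the expansion with either of:
//   rust-analyzer's "Expand macro recursively" command, or
//   cargo install cargo-expand && cargo expand
macro_rules! node {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[derive(Debug)]
        pub struct $name {
            $(pub $field: $ty,)*
        }
    };
}

// Expands to: #[derive(Debug)] pub struct Select { pub columns: ..., pub table: ... }
node!(Select {
    columns: Vec<String>,
    table: String,
});
```

Type information is derived from the expanded items, so hovering over Select in an IDE shows the generated struct and its fields.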
The railroad diagrams are tremendously useful:
https://www.sqlite.org/syntaxdiagrams.html
I don't think the lemon parser generator gets enough credit:
https://sqlite.org/src/doc/trunk/doc/lemon.html
With respect to the choice of language, any language with algebraic data types would work great. Even TypeScript would be great for this.
FWIW I wrote a small introduction to writing parsers by hand in Rust a while ago:
https://www.nhatcher.com/post/a-rustic-invitation-to-parsing...
I do still like declarative parsing over imperative, so I wrote https://docs.rs/inpt on top of the regex crate. But Andrew Gallant gets all the credit; the regex crate is overpowered.
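As a minimal illustration of the declarative style using the regex crate alone (not the inpt API), a single alternation with named groups can classify tokens:

```rust
// [dependencies] regex = "1"
use regex::Regex;

fn main() {
    // One named group per token kind; the alternation is the whole "grammar".
    let token = Regex::new(r"(?P<num>\d+)|(?P<ident>[A-Za-z_]\w*)|(?P<op>[*=])").unwrap();

    for caps in token.captures_iter("SELECT 42 = x") {
        if let Some(m) = caps.name("num") {
            println!("number: {}", m.as_str());
        } else if let Some(m) = caps.name("ident") {
            println!("ident: {}", m.as_str());
        } else if let Some(m) = caps.name("op") {
            println!("op: {}", m.as_str());
        }
    }
}
```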
This time it got traction. Funny how HN works.
Me: "How can a programming language be so damn complex? Am I just dumb?"