November 22nd, 2024

Runtime-Extensible SQL Parsers Using PEG

Traditional SQL parsers are outdated and inflexible, while modern PEG parsers enable dynamic syntax changes and better error handling. A prototype demonstrates efficiency, emphasizing the need for updated parser technology.

The article discusses the limitations of traditional SQL parsers in data management systems, which often rely on outdated technologies like YACC. These parsers are inflexible and hinder innovation due to their monolithic design. The authors advocate for a shift towards modern Parsing Expression Grammars (PEG), which allow for dynamic changes to query syntax and improved error recovery. PEG parsers can be reconfigured at runtime, enabling the integration of new syntax and features without the need for extensive recompilation. This flexibility is particularly beneficial as SQL specifications evolve and new query languages emerge. The authors present a prototype PEG parser that successfully parses SQL queries and demonstrate its extensibility through experiments. Although the PEG parser shows a performance slowdown compared to traditional methods, the absolute parsing times remain efficient for analytical queries. The article emphasizes the need for modernizing parser infrastructure to enhance user experience and support the growing complexity of SQL dialects. The findings will be presented at the 2025 Conference on Innovative Data Systems Research (CIDR).

- Traditional SQL parsers are outdated and inflexible, limiting innovation.

- Modern PEG parsers allow for dynamic syntax changes and better error handling.

- A prototype PEG parser has been developed, demonstrating extensibility and efficiency.

- Despite some performance trade-offs, PEG parsers maintain acceptable parsing times for analytical queries.

- The research highlights the importance of updating parser technology in data management systems.
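
To make the runtime-reconfiguration idea concrete, here is a minimal sketch of a PEG-style recognizer whose rules live in a dictionary and can be replaced while the program runs. It is not the authors' prototype. The `Grammar` class, the combinators (`token`, `seq`, `choice`, `optional`), the helper functions, and the tiny SQL subset are all hypothetical names chosen for illustration.

```python
import re
from typing import Callable, Dict, Optional

# A parser maps (text, position) to the position after a match, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def token(pattern: str) -> Parser:
    """Match a regex case-insensitively, skipping surrounding whitespace."""
    rx = re.compile(r"\s*(?:" + pattern + r")\s*", re.IGNORECASE)

    def parse(text: str, pos: int) -> Optional[int]:
        m = rx.match(text, pos)
        return m.end() if m else None

    return parse


def seq(*parsers: Parser) -> Parser:
    """PEG sequence: every part must match, one after another."""
    def parse(text: str, pos: int) -> Optional[int]:
        cur: Optional[int] = pos
        for p in parsers:
            cur = p(text, cur)
            if cur is None:
                return None
        return cur

    return parse


def choice(*parsers: Parser) -> Parser:
    """PEG ordered choice: the first alternative that matches wins."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None

    return parse


def optional(p: Parser) -> Parser:
    """PEG optional: match p if possible, otherwise consume nothing."""
    def parse(text: str, pos: int) -> Optional[int]:
        end = p(text, pos)
        return pos if end is None else end

    return parse


class Grammar:
    """Hypothetical rule table; rules are looked up by name at parse time."""

    def __init__(self) -> None:
        self.rules: Dict[str, Parser] = {}

    def ref(self, name: str) -> Parser:
        # Late binding: replacing self.rules[name] later changes what this matches.
        return lambda text, pos: self.rules[name](text, pos)

    def parse(self, start: str, text: str) -> bool:
        # Accept only if the start rule consumes the whole input.
        return self.rules[start](text, 0) == len(text)


def build_base_grammar() -> Grammar:
    """Toy grammar: SELECT <ident> FROM <ident>."""
    g = Grammar()
    g.rules["ident"] = token(r"[A-Za-z_]\w*")
    g.rules["select"] = seq(
        token(r"SELECT\b"), g.ref("ident"), token(r"FROM\b"), g.ref("ident")
    )
    g.rules["statement"] = g.ref("select")
    return g


def add_limit_clause(g: Grammar) -> None:
    """Runtime extension: allow an optional LIMIT <n> after a SELECT."""
    old = g.rules["select"]
    g.rules["select"] = seq(old, optional(seq(token(r"LIMIT\b"), token(r"\d+"))))


def add_describe_statement(g: Grammar) -> None:
    """Runtime extension: add DESCRIBE <ident> as a new statement alternative."""
    old = g.rules["statement"]
    g.rules["statement"] = choice(old, seq(token(r"DESCRIBE\b"), g.ref("ident")))
```

With this sketch, `build_base_grammar().parse("statement", "SELECT a FROM t LIMIT 5")` returns `False`, and the same call returns `True` once `add_limit_clause` has been applied to that grammar object. Nothing is regenerated or recompiled, because rules are resolved by name on every call. A real implementation would also build a syntax tree and report errors, which this recognizer leaves out.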

5 comments
By @xrd - 5 months
The incredible Janet for Mortals book by Ian Henry was my first exposure to PEGs. It's very interesting and changed my thinking on programming in a big way. It's a free book.

https://janet.guide/pegular-expressions/

By @EuAndreh - 5 months
There is nothing wrong with using PEGs for SQL parsing, but this article (I didn't read the paper) presents flawed arguments:

- tech $X is from the 60s, therefore it is bad and/or outdated: one doesn't need to "disrupt" or innovate in everything to become modern. There are plenty of things from the 60s that still don't have a better replacement, and it's OK to keep using them.

- "YACC-style parsers" clumps together parsers that are generated at compile-time, from declarative grammars, using LALR(1). But that's not inherent to the technique or algorithm: a parser can be LALR(1) from a declarative grammar and still extensible at run-time, or provide LL(1) alongside, or be built from statements instead of a grammar. There's nothing wrong with using PEGs over "YACC-style" parsers, but not for these distorted reasons.

By @lovasoa - 5 months
From a practical standpoint, for anyone who needs to parse SQL today, I can recommend datafusion's sqlparser-rs. This is what we use in http://sql-page.com , and I regularly contribute to it. I don't know anything else that matches its level of support for all the crazy little-known syntax particularities of the various SQL dialects.

In particular, Microsoft SQL Server seems to do everything just a little bit differently, and sqlparser-rs does support its idiosyncrasies most of the time.

By @kristianp - 5 months
> parsers should be rewritten using modern abstractions like Parser Expression Grammars (PEG), which allow dynamic changes to the accepted query syntax and better error recovery.

By @gigatexal - 5 months
The DuckDB project continues to impress me every day.