Pipe Syntax in SQL
GoogleSQL enhances SQL by introducing a piped data flow syntax, improving usability and learning while maintaining compatibility with existing systems, allowing for gradual adoption of new features.
Read original articleSQL, while widely used as the standard language for data manipulation, has inherent design flaws that complicate its learning, usage, and extensibility. Despite numerous attempts to create alternative languages, none have successfully replaced SQL due to the challenges of adoption. In response, GoogleSQL has introduced a novel solution by extending SQL rather than replacing it. This extension incorporates a piped data flow syntax, which aligns with successful patterns found in other modern data languages. The implementation of this syntax transforms SQL into a more flexible and user-friendly language, facilitating easier learning and usage while maintaining compatibility with the existing SQL ecosystem. This approach allows users to gradually adopt new features without the need for extensive migrations or the steep learning curve associated with entirely new languages, ultimately enhancing productivity in data management tasks.
- SQL has significant design problems that hinder its usability and extensibility.
- Alternative languages have failed to displace SQL due to adoption challenges.
- GoogleSQL extends SQL by adding piped data flow syntax.
- The new syntax makes SQL easier to learn and use while preserving its existing ecosystem.
- Incremental adoption of new features is possible without major migrations or learning new languages.
Related
At 50 Years Old, Is SQL Becoming a Niche Skill?
SQL, a foundational technology, faces scrutiny in today's IT world. Evolving roles like data scientists challenge its centrality. Debates persist on SQL's relevance against newer technologies like JSON queries, impacting its future role.
After a deacade of using SQL it "clicked" for me
Kyle Benzle reflects on understanding SQL after years, comparing it to a tool for structured data management. He emphasizes SQL as a program for data manipulation, highlighting key commands for efficient database management.
I Get SQL, Kind Of
Kyle Benzle had a breakthrough understanding of SQL, likening it to a tool for managing data like a digital filing cabinet. Recognizing SQL as a program simplifies its complexity and aids in grasping its role in programming and data management.
SQL at 50: What's next for the structured query language?
SQL, celebrating its 50th anniversary, remains a vital data management tool, ranking third among programming languages, with its adaptability and mathematical foundation ensuring continued relevance despite emerging technologies.
SQL Iceberg
The article presents a SQL cheatsheet covering essential commands and advanced techniques, suitable for all skill levels, with practical examples to enhance understanding of database management and data analysis.
I hope SQL pipes become commonplace so that I can use them anywhere I have to write SQL queries.
FROM r JOIN s USING (id)
|> WHERE r.c < 15
|> AGGREGATE sum(r.e) AS s GROUP BY r.d
|> WHERE s > 3
|> ORDER BY d
|> SELECT d, s, rank() OVER (order by d)
Can we call this SQL anymore after this? This re-ordering of things has been done by others too, like PRQL, but they didn't call it SQL. I do think it makes things more readable.https://static.simonwillison.net/static/2024/Pipe-Syntax-In-...
https://github.com/google/zetasql/blob/2024.08.2/docs/pipe-s...
How about we get a SQL successor with algebraic data types, true boolean logic (as opposed to SQL's ternary logic), or functional composition? Null values are the bane of any query writer's existence, and we should have a reasonable solution by now...we've already done it with other programming languages.
It definitely makes things easier to follow, but only for linear, ie. single table, transformations. The moment joins of multiple tables come into the picture things become hairy quick and then you actually start to appreciate the plain old sql which accounts for exactly this and allows you to specify column aliases in the entire cte clause. With this piping you lose scope of the table aliases and then you have to use weird hacks like mangling names of the joined in table in polars.
For single table processing the pipes are nice though. Especially eliminating the need for multiple different keywords for filter based on the order of execution (where, having, qualify (and pre-join filter which is missing)).
A missed opportunity here is the redundant [AGGREGATE sum(x) GROUP BY y]. Unless you need to specify rollups, [AGGREGATE y, sum(x)] is a sufficient syntax for group bys and duckdb folks got it right in the relational api.
There have been so many attempts to alleviate the difficulty of writing SQL -- from ORMs, to alternate syntaxes, to alternate databases -- and none have been successful. The authors identify many reasons for this. The approach the authors have taken is incremental and makes only slight demands one people already familiar with elementary SQL -- and even people familiar with advanced SQL will likely find this approach to be easier to use for advanced queries.
* Joins can be modelled as a "cross-referencing" operation that consume two (or more) data streams and produce a single data stream
* CTEs can be modelled as producing multiple data streams
* Recursive CTEs can be modelled as cycles in the execution graph
Also, why the need to match semantic evaluation order when there are far more important things happening under the hood that affect the execution plan (indexes, etc.)?
> Side-effects at a distance...The same columns are cross-referenced in SELECT, GROUP BY and ORDER BY, and corresponding edits are often required in three places
Can't this be avoided by using aliases?
...okay, if I dial-back my feelings of resignation to mediocrity, then I'll admit that Google probably does have enough clout to make this go somewhere - but they'd need to add this to all their database offerings (BigQuery, Spanner, Firebase's SQL mode) and contribute patches to Postgres and MySQL/Maria - maybe after Microsoft relents a decade later to add it to MSSQL we'll maybe start to see Oracle's people refer to it vaguely as a nice-to-have they'll implement only after they start losing more blue-chip customers[1].
Also, it's giving me M (Excel PowerQuery) vibes too.
-------
[1]For context, Oracle's DB lacked a `bit`/`bool` column type for the past 40 years until last year. People had to use `char(1)` columns with CHECK constraints to store '0'/'1' - or worse: 'T'/'F' or 'Y'/'N' (see https://stackoverflow.com/a/3726846/159145 )
It's difficult to learn and use? Compared to what? The "difficult to extend" might be accurate, but I really question the initial premise here.
These days, we have tooling that can generate an IR for SQL and transpile it to any dialect (see: sqlglot).
Doesn’t coupling SQL syntax to dialect-specific SQL semantics throw a wrench in this?
Has anybody figured out whether they are proposing this pipeline syntax to be mixed with regular compositional forms like CTEs and subqueries? Or is it another limited, top-level syntax, similar to how some DB engines do not allow nesting of CTE syntax?
Why not write simple SQL queries and use another language to do the transformations?
Are SQL engines really more efficient at filtering/matching/aggregating data when doing complex queries? Doesn't working without reusable blocks / tests / logs make development harder?
Syntax is one thing, but actual performance (and safety/maintenance) is another deal?
Some developers have problems, because they don't understand its abstraction. Periodically they try to re-implement feature to "fix" SQL. Every time it's an absolute disaster.
I realize this is how the world works, but it's just a waste of time.
Doesn't look compelling for the lockin.
I vaguely remember someone telling me that it has mathematical underpinnings.
The inclusion of pipe syntax or data processing using pipe-structured data flow syntax is not very novel at all. Splunk's SPL and similar languages have been a longstanding industry practice.
I wish the paper had provided more extensive references to the existing body of work in this domain, as it would have added depth and context to the discussion.
FROM r JOIN s on r.Id = s.Id ORDER BY r.Id SELECT *
But the thing is, such changes will break pretty much all existing code, so the author added |> to distinguish, but why not use | instead? Don't make people typing one more character please.
Related
At 50 Years Old, Is SQL Becoming a Niche Skill?
SQL, a foundational technology, faces scrutiny in today's IT world. Evolving roles like data scientists challenge its centrality. Debates persist on SQL's relevance against newer technologies like JSON queries, impacting its future role.
After a deacade of using SQL it "clicked" for me
Kyle Benzle reflects on understanding SQL after years, comparing it to a tool for structured data management. He emphasizes SQL as a program for data manipulation, highlighting key commands for efficient database management.
I Get SQL, Kind Of
Kyle Benzle had a breakthrough understanding of SQL, likening it to a tool for managing data like a digital filing cabinet. Recognizing SQL as a program simplifies its complexity and aids in grasping its role in programming and data management.
SQL at 50: What's next for the structured query language?
SQL, celebrating its 50th anniversary, remains a vital data management tool, ranking third among programming languages, with its adaptability and mathematical foundation ensuring continued relevance despite emerging technologies.
SQL Iceberg
The article presents a SQL cheatsheet covering essential commands and advanced techniques, suitable for all skill levels, with practical examples to enhance understanding of database management and data analysis.