September 2nd, 2024

Tbsp – treesitter-based source processing language

tbsp, a tree-based source-processing language, has recently added lists, index expressions, and string manipulation features, improved documentation, and introduced a command-line interface, following its renaming from "trawk."


tbsp is a tree-based source-processing language that has undergone several updates and enhancements recently. Key developments include the addition of lists and index expressions, the implementation of string substring functionality, and improvements to the documentation, including a roadmap and usage examples. The project has also seen a renaming from "trawk" to "tbsp" and the introduction of a command-line interface (CLI) for better usability. The most recent commits reflect ongoing efforts to refine the language and its features, with contributions made by the author Akshay over the past few weeks.

- tbsp is a tree-based source-processing language.

- Recent updates include the addition of lists, index expressions, and string manipulation features.

- The project has been renamed from "trawk" to "tbsp."

- Documentation improvements include a roadmap and usage examples.

- A command-line interface (CLI) has been introduced for enhanced usability.

AI: What people are saying
The introduction of tbsp, a new tree-based source-processing language, has generated positive feedback and suggestions from the community.
  • Users appreciate the new features and improvements, particularly for parsing and processing tasks.
  • There are requests for higher-level APIs and tools to simplify grammar handling and AST processing.
  • Some users express excitement about using tbsp for practical projects, such as converting HTML to CSV.
  • Concerns are raised about the limitations of the Markdown parser and the need for more robust parsing capabilities.
  • The community values the thoughtful naming of the language and its potential for broader applications.
16 comments
By @fellowmartian - 5 months
This is great, and a step in the right direction. I wish tree-sitter had an official higher level API that allowed processing and pattern matching for use cases other than those required for text editors.

I’m currently using tree-sitter at work to build AST-based tools, as performance is amazing, even with huge codebases, but I’m finding it slightly frustrating to have to manually write recursive descent processors keyed by strings, with no compile time guarantees on the structure of the grammar.

This is compounded by the fact that grammars themselves don’t really follow any standard structure: some have named fields (presumably the ones created after GitHub contributed this feature), while others require hierarchical pattern matching.

I wish there existed a tool to consume a grammar and output a Rust ADT that we can simply match on. This would at least save me from redundant error handling. I’d build one myself, but I’m not that good at Rust yet.
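
The generator wished for above can be sketched roughly as follows, in Python rather than Rust for brevity. It assumes input shaped like the `node-types.json` file that tree-sitter generates for a grammar (entries with `type`, `named`, and optional `fields`). The sample data and the helpers `class_name`, `field_annotation`, and `generate_dataclasses` are hypothetical, and this is only one possible approach, not an existing tool.

```python
import json

# Hypothetical sample shaped like a tree-sitter node-types.json file.
SAMPLE_NODE_TYPES = json.loads("""
[
  {"type": "binary_expression", "named": true,
   "fields": {
     "left":  {"multiple": false, "required": true,
               "types": [{"type": "number", "named": true}]},
     "right": {"multiple": false, "required": true,
               "types": [{"type": "number", "named": true}]}
   }},
  {"type": "number", "named": true},
  {"type": "+", "named": false}
]
""")


def class_name(node_type: str) -> str:
    """Turn a snake_case node type into a CamelCase class name."""
    return "".join(part.capitalize() for part in node_type.split("_"))


def field_annotation(spec: dict) -> str:
    """Build a type annotation for one field from its allowed node types."""
    names = [class_name(t["type"]) for t in spec["types"] if t["named"]]
    inner = " | ".join(names) if names else "str"
    if spec["multiple"]:
        return f"list[{inner}]"
    if not spec["required"]:
        return f"{inner} | None"
    return inner


def generate_dataclasses(node_types: list) -> str:
    """Emit Python source with one dataclass per named node type."""
    lines = ["from dataclasses import dataclass", ""]
    for entry in node_types:
        if not entry["named"]:
            continue  # skip anonymous tokens such as "+"
        lines.append("@dataclass")
        lines.append(f"class {class_name(entry['type'])}:")
        fields = entry.get("fields", {})
        if not fields:
            lines.append("    text: str")  # leaf node: keep its source text
        for name, spec in fields.items():
            # Quote the annotation so classes may refer to later ones.
            lines.append(f'    {name}: "{field_annotation(spec)}"')
        lines.append("")
    return "\n".join(lines)


generated = generate_dataclasses(SAMPLE_NODE_TYPES)
print(generated)
```

A Rust version would presumably emit `enum` and `struct` definitions instead of dataclasses, which is where the compile-time guarantees would come from.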

By @rtpg - 5 months
So an awk but that knows how to walk structures instead of just lines. Excellent!

I'm a big fan of semgrep letting me query ASTs, this feels like something in a similar space. Down with lines, up with everything being trees!

By @sramam - 5 months
This is so cool.

Question (caveat: first exposure to treesitter and tools like this): Is there a reason the example demonstrates the use of depth as a variable instead of it being built in?

Nesting level of a particular "type" is general enough that it might be included OOTB. What you want to do with this might be generalizable - for example instead of

```

    enter section {
        depth += 1;
    }
    leave section {
        depth -= 1;
    }

    enter atx_heading {
        print("<h");
        print(depth);
        print(">");
    }
    leave atx_heading {
        print("</h");
        print(depth);
        print(">\n");
    }
```

It could simply be:

```

    enter atx_heading {
        print("<h");
        print(depth);
        print(">");
    }
    leave atx_heading {
        print("</h");
        print(depth);
        print(">\n");
    }
```

So depth would always be the nesting level of the same node type, but available out of the box. For Markdown, headings, sections, and lists come to mind, but I might be wrong.
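
To make the idea of a built-in depth concrete, here is a minimal Python sketch of an enter/leave walker that maintains a nesting counter per node kind, so handlers never update it by hand. The `Node` class, the `walk` function, and the sample tree are stand-ins invented for illustration. They are not tbsp or the tree-sitter API, and tbsp is not claimed to work this way.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for a syntax-tree node (not the tree-sitter API)."""
    kind: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def walk(node, enter, leave, depth=None):
    """Depth-first walk that tracks how deeply each node kind is nested."""
    depth = Counter() if depth is None else depth
    depth[node.kind] += 1          # the walker bumps the counter on entry
    enter(node, depth)
    for child in node.children:
        walk(child, enter, leave, depth)
    leave(node, depth)
    depth[node.kind] -= 1          # and restores it on exit


# Hypothetical Markdown-like tree: a section containing a nested section.
tree = Node("document", children=[
    Node("section", children=[
        Node("atx_heading", "Title"),
        Node("section", children=[Node("atx_heading", "Sub")]),
    ]),
])

out = []


def enter(node, depth):
    if node.kind == "atx_heading":
        out.append(f"<h{depth['section']}>{node.text}")


def leave(node, depth):
    if node.kind == "atx_heading":
        out.append(f"</h{depth['section']}>\n")


walk(tree, enter, leave)
html = "".join(out)
print(html, end="")
```

Keeping one counter per kind, rather than a single `depth`, lets the heading handler ask for the nesting level of `section` even though it fires on `atx_heading`, which is what the heading example actually needs.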

In any event, this looks really well thought-out, and now to check out the other tools mentioned in the comments...

By @mingodad - 5 months
For those who want to explore the grammars listed at https://github.com/tree-sitter/tree-sitter/wiki/List-of-pars... in a friendlier railroad diagram format, I made https://mingodad.github.io/plgh/json2ebnf.html, which reads the "src/grammar.json" and tries its best to generate an EBNF understood by (IPV6) https://www.bottlecaps.de/rr/ui or (IPV4) https://rr.red-dove.com/ui, where we get a nice navigable railroad diagram (see https://github.com/GuntherRademacher/rr for offline usage).
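
As a rough illustration of what such a conversion involves (not the code behind the linked tool), the Python sketch below turns a small subset of tree-sitter's `grammar.json` rule types into EBNF-like text. The sample grammar and the `to_ebnf` helper are hypothetical, regex patterns are passed through verbatim between slashes, and a real converter would need to translate them and handle more rule types.

```python
import json

# Hypothetical grammar in the shape of a tree-sitter src/grammar.json.
SAMPLE_GRAMMAR = {
    "name": "toy",
    "rules": {
        "list": {"type": "SEQ", "members": [
            {"type": "STRING", "value": "["},
            {"type": "CHOICE", "members": [
                {"type": "SEQ", "members": [
                    {"type": "SYMBOL", "name": "item"},
                    {"type": "REPEAT", "content": {"type": "SEQ", "members": [
                        {"type": "STRING", "value": ","},
                        {"type": "SYMBOL", "name": "item"},
                    ]}},
                ]},
                {"type": "BLANK"},
            ]},
            {"type": "STRING", "value": "]"},
        ]},
        "item": {"type": "PATTERN", "value": "\\d+"},
    },
}

# Wrapper rule types that only annotate their content.
WRAPPERS = {"FIELD", "PREC", "PREC_LEFT", "PREC_RIGHT", "PREC_DYNAMIC",
            "TOKEN", "IMMEDIATE_TOKEN", "ALIAS"}


def to_ebnf(rule: dict) -> str:
    """Render one grammar.json rule node as EBNF-like text."""
    kind = rule["type"]
    if kind == "STRING":
        return json.dumps(rule["value"])      # quoted literal
    if kind == "PATTERN":
        return f"/{rule['value']}/"           # regex kept verbatim
    if kind == "SYMBOL":
        return rule["name"]
    if kind == "SEQ":
        return " ".join(to_ebnf(m) for m in rule["members"])
    if kind == "CHOICE":
        members = rule["members"]
        non_blank = [m for m in members if m["type"] != "BLANK"]
        body = " | ".join(to_ebnf(m) for m in non_blank)
        # A BLANK alternative means the whole choice is optional.
        suffix = "?" if len(non_blank) < len(members) else ""
        return f"( {body} ){suffix}"
    if kind == "REPEAT":
        return f"( {to_ebnf(rule['content'])} )*"
    if kind == "REPEAT1":
        return f"( {to_ebnf(rule['content'])} )+"
    if kind in WRAPPERS:
        return to_ebnf(rule["content"])
    raise ValueError(f"unsupported rule type: {kind}")


ebnf_lines = [f"{name} ::= {to_ebnf(rule)}"
              for name, rule in SAMPLE_GRAMMAR["rules"].items()]
print("\n".join(ebnf_lines))
```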
By @MantisShrimp90 - 5 months
As someone writing a neovim plugin using treesitter, thank you! Languages like this help leverage treesitter in more interesting ways, whereas current APIs are still a bit low-level.
By @samgriesemer - 5 months
The md-to-html demo is a good one, but worth mentioning that the Markdown parser[1] being used may not be suitable for more complex documents. From the README:

> "...it is not recommended to use this parser where correctness is important. The main goal for this parser is to provide syntactical information for syntax highlighting..."

There's also a separate block-level and inline parser, not sure how `tbsp` handles nested or multi-stage parsing.

[1]: https://github.com/tree-sitter-grammars/tree-sitter-markdown

By @ashkankiani - 5 months
Adding a way to query the path at the current node would let you skip out on doing stuff like keeping track of `in_section`.
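
One way to picture the suggestion is a walker that hands each visitor the chain of ancestor kinds, so a flag like `in_section` becomes a lookup instead of hand-maintained state. The Python sketch below uses a made-up `Node` class, a `walk_with_path` helper, and a sample tree; none of these are part of tbsp or tree-sitter.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for a syntax-tree node (not the tree-sitter API)."""
    kind: str
    children: list["Node"] = field(default_factory=list)


def walk_with_path(node, visit, path=()):
    """Depth-first walk that passes the root-to-node path to the visitor."""
    path = path + (node.kind,)
    visit(node, path)
    for child in node.children:
        walk_with_path(child, visit, path)


# Hypothetical tree: one paragraph outside a section, one inside.
tree = Node("document", children=[
    Node("paragraph"),
    Node("section", children=[Node("atx_heading"), Node("paragraph")]),
])

paragraph_paths = []


def visit(node, path):
    if node.kind == "paragraph":
        paragraph_paths.append(path)


walk_with_path(tree, visit)

# "in_section" is now derived from the path rather than tracked manually.
in_section = ["section" in path for path in paragraph_paths]
print(paragraph_paths)
print(in_section)
```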

I wonder if the `enter|exit ...` syntax might be too limiting but for a lot of stuff it seems nice and easy to reason about. Easier than tree-sitter's own queries.

I think if you really wanted performance and whatnot, you might end up compiling the queries to another target and just reuse them.

I could see myself writing a lua DSL around compiling these kinds of queries `enter/exit` stanzas or an SQL one too.

By @orra - 5 months
Not a technical comment (as cool as this is), but I love the name.

We always say naming things is one of the hard parts of programming. They avoided the default option of something like tawk.

By @toastal - 5 months
Always kudos towards taking a self-hosted-forge approach
By @lumb63 - 5 months
This is really cool! I have a lot of short projects that are essentially “parse out 2 or 3 tags of HTML and convert that to CSV.” This will be perfect for that; in the past I’ve done it by hand with vim. Next time I’ll give this a shot.
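
For comparison, the same kind of task can be sketched with only the Python standard library, whose `html.parser.HTMLParser` start-tag and end-tag callbacks play a role loosely similar to enter/leave handlers. The `TagTextCollector` class, the `html_to_csv` function, and the sample HTML are illustrative assumptions, not anything from tbsp.

```python
import csv
import io
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside selected tags, one row per element."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.open_tags = []   # stack of (tag, text pieces) for open wanted tags
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.open_tags.append((tag, []))

    def handle_data(self, data):
        # Text belongs to every wanted element that is currently open.
        for _, pieces in self.open_tags:
            pieces.append(data)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1][0] == tag:
            _, pieces = self.open_tags.pop()
            self.rows.append((tag, "".join(pieces).strip()))


def html_to_csv(html, wanted):
    """Return CSV text with one (tag, text) row per wanted element."""
    parser = TagTextCollector(wanted)
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["tag", "text"])
    writer.writerows(parser.rows)
    return out.getvalue()


sample = "<h1>Prices</h1><p>Apples, <b>red</b></p><a href='/x'>more</a>"
csv_text = html_to_csv(sample, ["h1", "p"])
print(csv_text, end="")
```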
By @jpgvm - 5 months
By @orjicu98 - 5 months
very interesting paradigm of programming. i would recommend checking out, for inspiration: https://rosettacode.org/wiki/Category:Bracmat and https://www.egison.org/

they define themselves as non-linear pattern matching, a pretty niche and unique way to program, and i enjoyed playing with their code

thanks for posting very nice

By @azeirah - 5 months
Awesome! I'd love to see this flourish.
By @vslira - 5 months
That's a lot of work to write lisp without parentheses /j

I joke, really interesting project, props to the team

By @PoppGolfer - 5 months
tablespoon - of course....