Thoughts on Canonical S-Expressions (2019)
Canonical S-Expressions (csexp) efficiently handle binary data without base64 encoding but lack associative array support, complicating complex data serialization. Alternatives like Bencoding and MessagePack may offer better solutions.
Canonical S-Expressions (csexp) are a data format used by Datashards, characterized by their efficiency in handling binary data. Unlike traditional S-Expressions, csexp represents every atom as a byte object, allowing for compact storage without the need for base64 encoding, which is common in formats like JSON. This format is flexible and easy to parse, making it straightforward to implement a reader for csexp. However, it lacks support for associative arrays, which complicates the serialization and deserialization of more complex data structures. Users must create their own methods for handling such structures, which can lead to ambiguity when interpreting the data. While csexp avoids the overhead of XML and does not impose type conversions, it still requires a reader to convert parsed data into application-specific formats. The author suggests that incorporating type hints could simplify the reader's task, similar to JSON-LD. Alternatives like Bencoding, MessagePack, and Preserves offer varying degrees of expressiveness and ease of use, with Bencoding providing type support. Overall, csexp is a suitable choice for straightforward applications, but its limitations may necessitate a reevaluation for more complex data needs.
- Canonical S-Expressions are efficient for binary data storage without base64 encoding.
- They lack support for associative arrays, complicating complex data handling.
- A reader is necessary to convert parsed data into application-specific formats.
- Incorporating type hints could enhance usability and clarity.
- Alternatives like Bencoding and MessagePack may offer additional benefits for complex data structures.
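To give a sense of how small a csexp reader can be, here is a minimal sketch in Python. It assumes only the two constructs mentioned above (length-prefixed byte atoms and parenthesized lists); the function names `parse_csexp` and `_parse_at` are made up for illustration, and display hints and canonical-form checks such as rejecting leading zeros are deliberately left out.

```python
def parse_csexp(data: bytes):
    """Parse one canonical S-expression.

    Atoms come back as bytes, lists as Python lists. Nothing is
    converted to numbers or strings; that is left to the caller.
    """
    value, pos = _parse_at(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after expression")
    return value


def _parse_at(data: bytes, pos: int):
    """Parse the expression starting at pos; return (value, next_pos)."""
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    ch = data[pos:pos + 1]
    if ch == b"(":
        pos += 1
        items = []
        while True:
            if pos >= len(data):
                raise ValueError("unterminated list")
            if data[pos:pos + 1] == b")":
                return items, pos + 1
            item, pos = _parse_at(data, pos)
            items.append(item)
    if ch.isdigit():
        # Atom: decimal length, a colon, then exactly that many raw bytes.
        colon = data.index(b":", pos)
        prefix = data[pos:colon]
        if not prefix.isdigit():
            raise ValueError("malformed length prefix")
        start = colon + 1
        end = start + int(prefix)
        if end > len(data):
            raise ValueError("atom runs past end of input")
        return data[start:end], end
    raise ValueError("unexpected byte at position %d" % pos)


print(parse_csexp(b"(9:groceries(4:milk5:bread))"))
```

Every atom stays a byte string, so an input like `3:100` parses to the bytes `100`, not the integer 100. Deciding what those bytes mean is the job of the application-level reader described above.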
Related
Compile-time JSON deserialization in C++
This article explores compile-time JSON deserialization in C++. It discusses static reflection, pattern matching, template specialization, and constexpr functions to parse JSON data into atomic and compound types, ensuring type safety and flexibility.
Version Tolerant Serialization in C++
Version-tolerant serialization in C++ is simplified with a binary serialization framework. It allows easy serialization of complex structures like A and B, supporting field renaming and handling added/deleted fields efficiently. Limited to C++ with no cross-language/platform support.
Some Tricks from the Scrapscript Compiler
The Scrapscript compiler implements optimization tricks like immediate objects, small strings, and variants for better performance. It introduces immediate variants and const heap to enhance efficiency without complexity, seeking suggestions for future improvements.
"Maxwell's equations of software" examined
Ken Shirriff's blog post analyzes a historic Lisp code snippet, showcasing Lisp's core principles. It highlights code-data interchangeability and the essence of Lisp programming, referencing Alan Kay's "Maxwell's Equations of Software."
Why CSV is still king
CSV remains a dominant file format in data processing due to its simplicity and widespread adoption, despite challenges like lack of standardization and text encoding issues. Its relevance continues.
Beyond lists and string atoms (or whatever the actual list is), this format also makes an affordance for custom types, but as TFA points out, you still have to roll your own higher-order data types, the kind you almost certainly already have on hand in your language. Now we are talking about needing to do additional processing on the decoded output, just to interpret common data structures like associative arrays and sets. And as a machine-first serialization format, if you are interchanging with other people or with yourself in the future, you had better hope you have full agreement on those custom types.
So what do you do: Add libs? Roll your own? Well, competing alternatives already offer that complete picture as mature, battle-tested solutions. So I'm inclined to view Canonical S-Expressions merely as a way-point on our path of technological evolution, worthy of fleeting, mild curiosity.
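As a rough illustration of that extra processing step, the sketch below interprets already-parsed csexp output as an associative array. The convention it uses, a list whose first atom is `dict` followed by two-element (key value) lists, is a hypothetical one invented for this example and not part of csexp; both sides of an exchange would have to agree on it out of band.

```python
def as_mapping(node):
    """Interpret [b"dict", [key, value], ...] as a Python dict.

    The b"dict" tag and the pair layout are a made-up convention for
    illustration; csexp itself only knows atoms and lists.
    """
    if not isinstance(node, list) or not node or node[0] != b"dict":
        raise ValueError("not tagged as a dict")
    result = {}
    for pair in node[1:]:
        if not isinstance(pair, list) or len(pair) != 2:
            raise ValueError("expected (key value) pairs")
        key, value = pair
        if not isinstance(key, bytes):
            raise ValueError("keys must be atoms")
        if key in result:
            raise ValueError("duplicate key")
        result[key] = value
    return result


# Parsed form of (4:dict(4:milk1:2)(5:bread1:1))
parsed = [b"dict", [b"milk", b"2"], [b"bread", b"1"]]
print(as_mapping(parsed))
```

A reader that does not know the convention sees the same input as an ordinary list that happens to start with the atom `dict`, which is exactly the agreement problem raised above.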
* It can be used schema-less,
* allows attaching metadata tags to values (which can serve as type hints[1]), and
* encodes blobs efficiently
I have not used it, but in the space of flexible formats it appears to have other interesting properties. For instance it can encode a symbol table making symbols really compact in the rest of the message. Symbol tables can be shared out of band.
[1] https://amazon-ion.github.io/ion-docs/docs/spec.html#annot
Canonical S-Expression: (9:groceries(4:milk5:bread))
Bencoding: l9:groceriesl4:milk5:breadee
Bencoding also manages to specify dictionaries and still have a canonical encoding, by requiring that dictionaries be sorted by key (and that keys be unique). It doesn't have the option for arbitrary type names; it just has actual types: integer, bytestring, list and dictionary.
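A small encoder makes the sorted-key rule concrete. This is a minimal sketch, not a reference implementation: the function name `bencode` is chosen here for illustration, and it maps Python `int`, `bytes`, `list` and `dict` onto the four Bencoding types.

```python
def bencode(value) -> bytes:
    """Encode ints, bytes, lists and dicts using Bencoding."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("booleans are not a Bencoding type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        if not all(isinstance(key, bytes) for key in value):
            raise TypeError("dictionary keys must be byte strings")
        parts = [b"d"]
        # Sorting keys as raw bytes is what makes the output canonical:
        # the same mapping always produces the same encoding.
        for key in sorted(value):
            parts.append(bencode(key))
            parts.append(bencode(value[key]))
        parts.append(b"e")
        return b"".join(parts)
    raise TypeError("unsupported type: %s" % type(value).__name__)


print(bencode([b"groceries", [b"milk", b"bread"]]))
print(bencode({b"milk": 2, b"bread": 1}))
```

Because a Python dict already guarantees unique keys, only the ordering has to be enforced on output. Insertion order does not affect the result, so `{b"b": 1, b"a": 2}` and `{b"a": 2, b"b": 1}` encode to identical bytes.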
FTA:
> Bencoding offers many of the same benefits of CSEXP, but because it also supports types, is a bit easier to work with.
Hmm, well there you go.
This is exactly what edn does. Seems like the author would like edn but doesn’t mention it
Readers may want to look at both of course!
Huh, I'd have said it should become the string "100", based on earlier examples such as 5:hello