Parse, Don't Validate
The article explores type-driven design in programming, emphasizing "Parse, don’t validate" in Haskell. It shows how types make code more robust and rule out whole classes of errors, and how parsing input up front applies to a range of tasks.
The article discusses the concept of type-driven design in programming, particularly focusing on the idea of "Parse, don’t validate." The author explains how this approach can be applied in Haskell to ensure more robust and reliable code. By using types to enforce constraints at compile time, developers can avoid common pitfalls like partial functions and unnecessary checks for empty values. The article illustrates this concept through examples of functions like head, showing how refining types can lead to more concise and safer code. By leveraging the type system effectively, developers can shift from merely validating inputs to parsing them, preserving valuable information and reducing the likelihood of errors. The article emphasizes the power of parsing functions in handling input validation upfront and highlights how this approach is widely used in various Haskell libraries for tasks like JSON parsing, command-line argument parsing, database value parsing, and more.
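The article's contrast between validating and parsing can be sketched in TypeScript (the article itself uses Haskell). This is a minimal illustration with hypothetical function names: both functions perform the same emptiness check, but only the parser returns a type that records the result, which is what lets `head` be total.

```typescript
// A list that is known to contain at least one element.
type NonEmpty<T> = [T, ...T[]];

// Validating: checks the property, but the return type (void) discards
// that knowledge, so the caller still holds a plain T[].
function validateNonEmpty<T>(list: readonly T[]): void {
  if (list.length === 0) throw new Error("list cannot be empty");
}

// Parsing: performs the same check, but returns a more precise type that
// records the result, so later code never needs to re-check it.
function parseNonEmpty<T>(list: readonly T[]): NonEmpty<T> {
  if (list.length === 0) throw new Error("list cannot be empty");
  const [first, ...rest] = list;
  // The length check above guarantees that `first` exists.
  return [first as T, ...rest];
}

// Total on NonEmpty: there is no empty case to handle and nothing to throw.
const head = <T>(list: NonEmpty<T>): T => list[0];
```

The failure still happens once, at the boundary where the list is parsed; after that, any function taking `NonEmpty<T>` can rely on the guarantee.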
Related
Optimizing the Roc parser/compiler with data-oriented design
The blog post explores optimizing a parser/compiler with data-oriented design (DoD), comparing Array of Structs and Struct of Arrays for improved performance through memory efficiency and cache utilization. Restructuring data in the Roc compiler showcases enhanced efficiency and performance gains.
A reckless introduction to Hindley-Milner type inference (2019)
Hindley-Milner type inference balances expressiveness and legibility in programming languages like Elm and Haskell. It enhances correctness by enforcing strict type checking, limiting coding practices for improved type safety.
Common Interface Mistakes in Go
The article delves into interface mistakes in Go programming, stressing understanding of behavior-driven, concise interfaces. It warns against excessive, non-specific interfaces and offers guidance from industry experts for improvement.
I Probably Hate Writing Code in Your Favorite Language
The author critiques popular programming languages like Python and Java, favoring Elixir and Haskell for immutability and functional programming benefits. They stress that these are personal preferences for hobby projects, not an attempt to spark conflict.
Evolving Languages Faster with Type Tailoring
Programming languages face limitations in understanding domain-specific aspects like regular expressions, causing errors. "Type Tailoring" proposes teaching type systems new tricks through metaprogramming tools for improved code efficiency and correctness.
- Many commenters appreciate the advice and find it applicable across different programming paradigms, including OO languages, TypeScript, and dynamic languages like Clojure.
- Some highlight the importance of strong type systems to reduce bugs and ensure robust code, referencing concepts like "Making Impossible States Impossible."
- There are concerns about the practical application of the principle, especially in contexts like API security and handling invalid data.
- Several comments discuss the nuances of parsing and validation, emphasizing that they are not mutually exclusive and often overlap in practice.
- Examples and references to related literature and tools, such as Design by Contract and TypeScript's type predicates, are provided to support the discussion.
For those who don't necessarily program in statically typed functional languages:
The idea transcends paradigms.
You'll find very similar notions in 80's/90's OO literature, for example in Design by Contract. I'm sure one can dig deeper and find papers, discussions and specifications that go further back.
I think TypeScript is often written in such a way where you refine the types at runtime. I assume Design by Contract has influenced Clojure's spec (Clojure is a dynamic language).
Fundamentally this is about assumptions and guarantees (or requiring and providing). Once an assumption is checked and guarantees can be made, then other parts of the program don't need to check overlapping assumptions again.
In fact I think one of the most confusing things when you read code is seeing already guaranteed properties being checked again somewhere else. It makes code harder to reason about and improve.
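The idea of checking an assumption once and then relying on the guarantee maps onto TypeScript's type predicates, which come up elsewhere in the discussion. A minimal sketch with hypothetical names: the runtime check lives only in `isNonEmpty`, and the compiler narrows the type so downstream code states its assumption in its signature instead of re-checking it.

```typescript
type NonEmpty<T> = [T, ...T[]];

// Type predicate: when this returns true, the compiler narrows `list`
// to NonEmpty<T> in the calling code.
function isNonEmpty<T>(list: T[]): list is NonEmpty<T> {
  return list.length > 0;
}

// Requires the guarantee in its signature; contains no emptiness check.
function firstOf<T>(list: NonEmpty<T>): T {
  return list[0];
}

function describeTags(tags: string[]): string {
  if (!isNonEmpty(tags)) return "no tags";
  // Past the early return, `tags` has type NonEmpty<string>.
  return `first tag: ${firstOf(tags)}`;
}
```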
if(!Whatever.TryParse<Thingy>(input, out var output)) output = some-sane-default;
or: if(!Whatever.TryParse<Thingy>(input, out var output)) throw new ApplicationException($"Not a valid Thingy: {input}");
Protip: don't do the latter in your kernel-mode driver.
> Parse, don’t validate.
For me the slogan is rather "always validate only in the single constructor" (or constructor function, doesn't matter). That way, you cannot have invalid objects at all, and there's always a single source of truth. If you want to modify the object, implement it via constructing a new state by calling the same constructor again.
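A sketch of "validate only in the single constructor" in TypeScript. `DateRange` and its start-not-after-end invariant are hypothetical; the private constructor forces every instance, including "modified" copies, through the one checking factory.

```typescript
class DateRange {
  readonly start: Date;
  readonly end: Date;

  // Private: code outside the class cannot bypass `DateRange.of`.
  private constructor(start: Date, end: Date) {
    this.start = start;
    this.end = end;
  }

  // The single place where the invariant is checked.
  static of(start: Date, end: Date): DateRange {
    if (start.getTime() > end.getTime()) {
      throw new RangeError("start must not be after end");
    }
    return new DateRange(start, end);
  }

  // "Modifying" builds a new value through the same validating factory.
  withEnd(end: Date): DateRange {
    return DateRange.of(this.start, end);
  }
}
```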
If I were to teach a class about programming in the medium (as opposed to in the small or in the large), I think I'd assign my students an essay comparing and contrasting these suggestions. Each has something to teach us, and maybe they're not as contradictory as it may seem at first.
It beats me why people didn't want to write parsers, though. Writing parsers is not that hard, and is quite fun.
It covers the same ground as avoiding primitive obsession.
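Primitive obsession means passing raw strings and numbers around for domain values. One common TypeScript counter-measure is a "branded" type that only a parsing function can produce. This sketch assumes a hypothetical `EmailAddress` type and uses a deliberately simplistic check; real email validation is more involved.

```typescript
// Structurally a string at runtime, but a distinct type to the compiler.
type EmailAddress = string & { readonly __brand: "EmailAddress" };

// Simplistic rule for illustration: exactly one "@" with text on both sides.
function parseEmail(input: string): EmailAddress | undefined {
  const trimmed = input.trim();
  const parts = trimmed.split("@");
  if (parts.length !== 2 || parts[0] === "" || parts[1] === "") return undefined;
  return trimmed as EmailAddress;
}

// Only parsed addresses are accepted; passing a raw string fails to compile.
function sendWelcome(to: EmailAddress): string {
  return `Welcome mail queued for ${to}`;
}
```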
> “required” keyword in Protocol Buffers turned out to be a horrible mistake
https://capnproto.org/faq.html#how-do-i-make-a-field-require...
Having both flexible, unvalidated parsing and validated parsing functions would probably be best IMHO.
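One way to read this suggestion: a lenient parser that tolerates missing fields, with a strict parser layered on top that requires them. The `User` shape and function names are hypothetical; this sketch is not how Protocol Buffers or Cap'n Proto handle required fields.

```typescript
// Hypothetical record type for illustration.
interface User {
  id: number;
  name: string;
}

// Lenient: keeps whichever fields are present and well-typed; never throws.
function parseUserLenient(raw: unknown): Partial<User> {
  const result: Partial<User> = {};
  if (typeof raw !== "object" || raw === null) return result;
  const obj = raw as Record<string, unknown>;
  const id = obj["id"];
  const name = obj["name"];
  if (typeof id === "number") result.id = id;
  if (typeof name === "string") result.name = name;
  return result;
}

// Strict: reuses the lenient parser and insists on every required field.
function parseUserStrict(raw: unknown): User {
  const { id, name } = parseUserLenient(raw);
  if (id === undefined || name === undefined) {
    throw new Error("user record is missing id or name");
  }
  return { id, name };
}
```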
More than merely catching errors at compile time, this also makes writing code faster by offering better auto-completion for new code.
That's actually not really correct. Or rather, it is technically correct but it will confuse the readers who work in languages like Java.
While void in languages like Java means that the result of the function cannot be used or has no meaning, it is NOT equivalent to types like the bottom type of Haskell. Because that would mean that the function can never return.
Rather, void is similar to the "unit type" (https://en.wikipedia.org/wiki/Unit_type) which does have a value. It's like an empty tuple. It contains no information other than "the function call has finished". (and of course in languages with exceptions, this means that no exception was thrown)
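TypeScript makes the distinction in this comment visible: `void` marks a function that returns normally without a meaningful value, while `never` is TypeScript's bottom type, for functions that cannot return normally at all. A small sketch with hypothetical function names:

```typescript
// `void`: returns normally; the value (undefined at runtime) carries no
// information beyond "the call finished", much like a unit type.
function logLine(message: string): void {
  console.log(message);
}

// `never`: this function cannot return normally; it can only throw.
function fail(message: string): never {
  throw new Error(message);
}

// Because `fail` returns `never`, the compiler knows `value` is a number
// on the line after the check.
function requirePort(value: number | undefined): number {
  if (value === undefined) fail("port is required");
  return value;
}
```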
Otherwise, I like the article. More people should read and understand this way of thinking.
The key idea seems to be that the border between the periphery/plumbing/deserialization code and the actual business logic should be as strict and direct and isolated as possible. Only pass objects/data/payloads to the business logic that have been fully ingested into the data model of the business logic. And keep the ingestion in one place.
From this perspective, the section about "shotgun parsing" might give some people the wrong idea and derail some discussions: If it's an actual part of the business requirements that branching and validations need to happen (branching for example over the existence of an optional value), a superficial reading of the article might lead someone to incorrectly identify this as "shotgun parsing".
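A sketch of the split described in this comment, assuming a hypothetical signup flow: `parseSignup` is the single place where untrusted input becomes the business type, and `welcomeCredit` then branches on an optional field because the business rule calls for it. The referral-code format and credit amounts are made up for illustration.

```typescript
// Hypothetical domain types: the business logic only ever sees these.
type ReferralCode = string & { readonly __brand: "ReferralCode" };
interface Signup {
  email: string;
  referralCode?: ReferralCode; // optional by business rule
}

// Periphery: the one place where untrusted input is turned into a Signup.
function parseSignup(raw: unknown): Signup {
  if (typeof raw !== "object" || raw === null) throw new Error("expected an object");
  const obj = raw as Record<string, unknown>;
  const email = obj["email"];
  const code = obj["referralCode"];
  if (typeof email !== "string" || !email.includes("@")) throw new Error("invalid email");
  if (code === undefined) return { email };
  // Assumed format: six uppercase letters or digits.
  if (typeof code !== "string" || !/^[A-Z0-9]{6}$/.test(code)) throw new Error("invalid referral code");
  return { email, referralCode: code as ReferralCode };
}

// Business logic: branching on the optional code is a business rule, not
// shotgun parsing, because the input has already been fully parsed.
function welcomeCredit(signup: Signup): number {
  return signup.referralCode !== undefined ? 10 : 0;
}
```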
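type NonEmpty<T> = [T, ...T[]]
const head = <T>(list: NonEmpty<T>): T => list[0]
// Stand-in for the application's real cache setup (not defined in the original snippet)
function initializeCache(dir: string): void {
  console.log(`initializing cache in ${dir}`)
}
function getConfigurationDirectories(): NonEmpty<string> {
  // process.env values may be undefined; treat a missing variable as empty
  const configDirsString = process.env["CONFIG_DIRS"] ?? ""
  // "".split(",") returns [""], never [], so drop empty entries before checking
  const [firstDir, ...restDirs] = configDirsString.split(',').filter((dir) => dir !== "")
  if (firstDir === undefined) throw new Error("CONFIG_DIRS cannot be empty");
  return [firstDir, ...restDirs];
}
function main() {
  const configDirs = getConfigurationDirectories();
  initializeCache(head(configDirs))
}
main()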
It seems like this approach will often bottom out in smart constructors, since type systems are either limited or make you work too hard to prove relatively simple things.
(from my own notes https://x.com/swyx/status/1548380295765733378)
Not in unit tests but when the app is running: you take the data in, you parse it, you re-encode/re-serialize it/re-whatever it. If it's not matching the data that came in, the data that came in is rejected.
And that should just be one of the steps taken to verify that the data looks legit.
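The round-trip check described above can be sketched for a small format. This assumes a hypothetical `YYYY-MM-DD` date field: the input is parsed, re-serialized in canonical form, and rejected if the two strings differ, which catches both impossible dates and non-canonical spellings.

```typescript
interface CalendarDate {
  year: number;
  month: number; // 1-12
  day: number;
}

// Serializer: the canonical YYYY-MM-DD form.
function formatDate(d: CalendarDate): string {
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  return `${pad(d.year, 4)}-${pad(d.month, 2)}-${pad(d.day, 2)}`;
}

// Parser: Date.UTC normalizes out-of-range values (2024-02-30 becomes
// 2024-03-01), so re-serializing and comparing rejects such input.
function parseDate(input: string): CalendarDate | undefined {
  const match = /^(\d+)-(\d+)-(\d+)$/.exec(input);
  if (match === null) return undefined;
  const utc = new Date(Date.UTC(Number(match[1]), Number(match[2]) - 1, Number(match[3])));
  const parsed: CalendarDate = {
    year: utc.getUTCFullYear(),
    month: utc.getUTCMonth() + 1,
    day: utc.getUTCDate(),
  };
  // Reject anything that does not survive the round trip unchanged.
  return formatDate(parsed) === input ? parsed : undefined;
}
```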
Comprehensive test coverage meets the same goal, and TDD looks a lot like "type-driven design", only easier to read and maintain.
I never understood why the default for [a] means that the list could be empty. If ...
foo: [a] -> a
... foo is supposed to get a list of a's, it should get a list of a's, with at least one a. If the list can be empty, then explicitly annotate it so: foo: [a*] -> a
One way or the other, one has to deal with the empty list explicitly (in the signature). If you allow empty lists, it will have to return a 'Maybe a'. It seems to me that it just makes processing the result easier in the common case if the input were to be constrained.
Not saying that this advice isn't solid, just thought it's funny given the news of this week.
If you have exposed APIs, you need to guard against malicious payloads. What happens when invalid data can break the parser, for example by causing out-of-memory exceptions?
It might work if you have some guardrails around the APIs, but for naked endpoints exposed without a minimum of checks or a Web Application Firewall, this isn't really good advice.
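Parsing and defensive limits are not mutually exclusive: a parser can run cheap bounds checks before the expensive step. A minimal sketch, assuming a hypothetical endpoint whose body is a JSON array of string tags; the limits are arbitrary, and in a real server the body size is better capped while reading the request stream, before it is buffered in memory.

```typescript
// Hypothetical limits; real values depend on the endpoint.
const MAX_BODY_CHARS = 16 * 1024;
const MAX_TAGS = 100;

// Cheap checks come first, so an oversized body is rejected before
// JSON.parse builds any objects from it.
function parseTagsRequest(body: string): string[] {
  if (body.length > MAX_BODY_CHARS) {
    throw new Error(`body too large: ${body.length} characters`);
  }
  const data: unknown = JSON.parse(body); // throws SyntaxError on malformed JSON
  if (!Array.isArray(data)) throw new Error("expected a JSON array");
  if (data.length > MAX_TAGS) throw new Error("too many tags");
  const tags: string[] = [];
  for (const item of data) {
    if (typeof item !== "string") throw new Error("tags must be strings");
    tags.push(item);
  }
  return tags;
}
```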