July 22nd, 2024

Parse, Don't Validate

The article explores type-driven design in programming, emphasizing "Parse, don’t validate" in Haskell. It shows how to use types to write robust code, avoid errors, and handle input parsing across various tasks.

The article discusses the concept of type-driven design in programming, particularly focusing on the idea of "Parse, don’t validate." The author explains how this approach can be applied in Haskell to ensure more robust and reliable code. By using types to enforce constraints at compile time, developers can avoid common pitfalls like partial functions and unnecessary checks for empty values. The article illustrates this concept through examples of functions like head, showing how refining types can lead to more concise and safer code. By leveraging the type system effectively, developers can shift from merely validating inputs to parsing them, preserving valuable information and reducing the likelihood of errors. The article emphasizes the power of parsing functions in handling input validation upfront and highlights how this approach is widely used in various Haskell libraries for tasks like JSON parsing, command-line argument parsing, database value parsing, and more.
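To make the contrast concrete, here is a minimal sketch in TypeScript rather than Haskell. The names `validateNonEmpty`, `parseNonEmpty`, and `NonEmpty` are illustrative, not taken from the article: the validator checks the property and then discards what it learned, while the parser returns a refined type that carries the guarantee forward.

```typescript
// NonEmpty<T>: an array type with at least one element, a TypeScript
// stand-in for Haskell's NonEmpty (illustrative, not from the article).
type NonEmpty<T> = [T, ...T[]];

// Validation: checks the property, but the return type (void) records nothing,
// so callers still hold a plain T[] and may end up checking again later.
function validateNonEmpty<T>(list: T[]): void {
  if (list.length === 0) throw new Error("list cannot be empty");
}

// Parsing: performs the same check, but returns a more precise type,
// so the fact "this list is non-empty" travels with the value.
function parseNonEmpty<T>(list: T[]): NonEmpty<T> | undefined {
  if (list.length === 0) return undefined;
  // The length check above is what justifies this assertion; it is the
  // only place in the program that needs to make it.
  return list as NonEmpty<T>;
}

// Downstream code can demand the stronger type and skip the empty case.
function head<T>(list: NonEmpty<T>): T {
  return list[0];
}
```

Calling `head` with an unparsed `T[]` is rejected by the type checker, so the empty-list branch only has to be handled once, at the parsing step.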

Optimizing the Roc parser/compiler with data-oriented design

The blog post explores optimizing a parser/compiler with data-oriented design (DoD), comparing Array of Structs and Struct of Arrays layouts for better performance through memory efficiency and cache utilization. Restructuring data in the Roc compiler demonstrates the resulting efficiency and performance gains.
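As a rough illustration of the two layouts the post compares, here is the same token list stored both ways in TypeScript. This is not Roc's actual data structure: the `Token` fields (`kind`, `start`, `length`) and the `TokenTable` class are hypothetical, and whether the SoA layout is faster in a given JavaScript engine depends on the runtime.

```typescript
// Array of Structs (AoS): one object per token (hypothetical fields).
interface Token {
  kind: number;   // hypothetical token-kind code
  start: number;  // offset into the source text
  length: number; // length of the token text
}

// Struct of Arrays (SoA): one typed array per field, indexed by token number.
// Each field is stored contiguously, so a pass that only reads `kind`
// only touches the kinds array.
class TokenTable {
  kinds: Uint8Array;
  starts: Uint32Array;
  lengths: Uint32Array;
  count = 0;

  constructor(capacity: number) {
    this.kinds = new Uint8Array(capacity);
    this.starts = new Uint32Array(capacity);
    this.lengths = new Uint32Array(capacity);
  }

  push(kind: number, start: number, length: number): number {
    if (this.count === this.kinds.length) throw new Error("TokenTable is full");
    const index = this.count++;
    this.kinds[index] = kind;
    this.starts[index] = start;
    this.lengths[index] = length;
    return index;
  }
}

// Convert an AoS list into the SoA layout.
function toTable(tokens: Token[]): TokenTable {
  const table = new TokenTable(tokens.length);
  for (const t of tokens) table.push(t.kind, t.start, t.length);
  return table;
}

// A pass that suits SoA: count tokens of one kind by scanning a single array.
function countKind(table: TokenTable, kind: number): number {
  let n = 0;
  for (let i = 0; i < table.count; i++) if (table.kinds[i] === kind) n++;
  return n;
}
```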

A reckless introduction to Hindley-Milner type inference (2019)

Hindley-Milner type inference balances expressiveness and legibility in programming languages like Elm and Haskell. It improves correctness by enforcing strict type checking, though it rules out some coding practices in exchange for type safety.

Common Interface Mistakes in Go

The article delves into interface mistakes in Go programming, stressing understanding of behavior-driven, concise interfaces. It warns against excessive, non-specific interfaces and offers guidance from industry experts for improvement.

I Probably Hate Writing Code in Your Favorite Language

The author critiques popular programming languages like Python and Java, favoring Elixir and Haskell for the benefits of immutability and functional programming. They stress that these are personal preferences for hobby projects and are not meant to spark conflict.

Evolving Languages Faster with Type Tailoring

Programming languages have a limited understanding of domain-specific aspects such as regular expressions, which leads to errors. "Type Tailoring" proposes teaching type systems new tricks through metaprogramming tools to improve code efficiency and correctness.

AI: What people are saying

The article on type-driven design in programming, particularly "Parse, don’t validate" in Haskell, has sparked a diverse discussion.

Many commenters appreciate the advice and find it applicable across different programming paradigms, including OO and dynamic languages like TypeScript and Clojure.
Some highlight the importance of strong type systems to reduce bugs and ensure robust code, referencing concepts like "Making Impossible States Impossible."
There are concerns about the practical application of the principle, especially in contexts like API security and handling invalid data.
Several comments discuss the nuances of parsing and validation, emphasizing that they are not mutually exclusive and often overlap in practice.
Examples and references to related literature and tools, such as Design by Contract and TypeScript's type predicates, are provided to support the discussion.

33 comments

By @dgb23 - 3 months

This is very good advice and a great article. That's why it comes up on this site every now and then.

For those who don't necessarily program in statically typed functional languages:

The idea transcends paradigms.

You'll find very similar notions in 1980s and 1990s OO literature, for example in Design by Contract. I'm sure one can dig deeper and find papers, discussions and specifications that go back even further.

I think TypeScript is often written in a way that refines types at runtime. I assume Design by Contract has influenced Clojure's spec (Clojure is a dynamic language).

Fundamentally this is about assumptions and guarantees (or requiring and providing). Once an assumption is checked and guarantees can be made, then other parts of the program don't need to check overlapping assumptions again.

In fact I think one of the most confusing things when you read code is seeing already guaranteed properties being checked again somewhere else. It makes code harder to reason about and improve.
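One way to express "check once, then rely on the guarantee" in TypeScript is a branded type. In this sketch, `Email`, `parseEmail`, and `sendWelcome` are hypothetical names, and the simple `@` rule stands in for whatever check a real program would enforce.

```typescript
// A string that has passed the (hypothetical) email check.
// The brand exists only at the type level; at runtime it is a plain string.
type Email = string & { readonly __brand: "Email" };

// The single place where the assumption is checked and the guarantee issued.
function parseEmail(input: string): Email | undefined {
  const trimmed = input.trim();
  // Deliberately simplistic rule for illustration: something@something.
  const at = trimmed.indexOf("@");
  if (at <= 0 || at === trimmed.length - 1) return undefined;
  return trimmed as Email;
}

// Downstream code states the guarantee in its signature,
// so it has no reason to re-check the same property.
function sendWelcome(to: Email): string {
  return `Welcome mail queued for ${to}`;
}
```

Passing a plain, unchecked string to `sendWelcome` is a compile-time error, so the only routes to an `Email` are `parseEmail` or an explicit cast.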

By @PreInternet01 - 3 months

(2019), but still good-ish advice. The pattern works like a charm in modern C# as well, and has nice space-saving effects too by allowing you to omit the explicit variable declaration:

    if (!Whatever.TryParse<Thingy>(input, out var output)) output = someSaneDefault;

or:

    if(!Whatever.TryParse<Thingy>(input, out var output)) throw new ApplicationException($"Not a valid Thingy: {input}");

Protip: don't do the latter in your kernel-mode driver.
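TypeScript has no `out` parameters, so a rough equivalent of this `TryParse` pattern returns a discriminated union instead. `tryParsePort`, the 1 to 65535 range rule, and the default of 8080 are assumptions made for illustration.

```typescript
// Result of a parse attempt: either a value or a reason for failure.
type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Hypothetical example rule: a TCP port is an integer from 1 to 65535.
function tryParsePort(input: string): ParseResult<number> {
  if (!/^\d+$/.test(input)) return { ok: false, error: `not a number: ${input}` };
  const port = Number(input);
  if (port < 1 || port > 65535) return { ok: false, error: `out of range: ${input}` };
  return { ok: true, value: port };
}

// Variant 1: fall back to a sane default on failure.
function portOrDefault(input: string, fallback = 8080): number {
  const result = tryParsePort(input);
  return result.ok ? result.value : fallback;
}

// Variant 2: fail loudly on invalid input.
function portOrThrow(input: string): number {
  const result = tryParsePort(input);
  if (!result.ok) throw new Error(`Invalid port (${result.error})`);
  return result.value;
}
```

Because `ok` is a literal discriminant, the compiler only allows access to `value` on the success branch.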

By @WiSaGaN - 3 months

Utilize a strong type system to make the error case unrepresentable. This is great advice for reducing bugs in software in general. It takes more time to think through the problem and design accordingly, but a lot of the time it is worth it.

By @kgeist - 3 months

>Now I have a single, snappy slogan that encapsulates what type-driven design means to me, and better yet, it’s only three words long:

>Parse, don’t validate.

For me the slogan is rather "always validate only in the single constructor" (or constructor function, doesn't matter). That way, you cannot have invalid objects at all, and there's always a single source of truth. If you want to modify the object, implement it via constructing a new state by calling the same constructor again.
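A minimal TypeScript sketch of that idea: a private constructor, a single static factory that performs all checks, and an update method that goes back through the same factory. `Percentage` and its 0 to 100 rule are hypothetical.

```typescript
class Percentage {
  readonly value: number;

  // Private: the only way to build a Percentage is Percentage.create.
  private constructor(value: number) {
    this.value = value;
  }

  // The single source of truth for what counts as a valid Percentage.
  static create(value: number): Percentage {
    if (!Number.isFinite(value) || value < 0 || value > 100) {
      throw new RangeError(`Percentage out of range: ${value}`);
    }
    return new Percentage(value);
  }

  // "Modification" builds a new state through the same factory,
  // so an update can never bypass the check.
  add(delta: number): Percentage {
    return Percentage.create(this.value + delta);
  }
}
```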

By @vladssw - 3 months

Related: "Making Impossible States Impossible" by Richard Feldman

https://www.youtube.com/watch?v=IcgmSRJHu_8

By @blowski - 3 months

Some good previous discussions on this:

https://news.ycombinator.com/item?id=35053118

https://news.ycombinator.com/item?id=21476261

By @maw - 3 months

Whenever this comes up, I'm reminded of section 5 in https://cr.yp.to/qmail/guarantee.html which among other things says "Don't parse" and "there are two types of command interfaces in the world of computing: good interfaces and user interfaces".

If I were to teach a class about programming in the medium (as opposed to in the small or in the large), I think I'd assign my students an essay comparing and contrasting these suggestions. Each has something to teach us, and maybe they're not as contradictory as they may seem at first.

By @teeheelol - 3 months

Forwarded to CrowdStrike.

By @hintymad - 3 months

This reminds me of a comment someone made during the XML craze of the mid-2000s. The author suspected that so many organizations chose XML to implement their domain-specific languages, configuration languages included, only because XML came with a parser, and most organizations didn't want to bother writing their own.

It beats me why people didn't want to write parsers, though. Writing parsers is not that hard, and is quite fun.

By @yakshaving_jgt - 3 months

One of my favourite articles published during my career. I've noticed that people often just read the title and assume parsing and validation are somehow mutually exclusive, but in practice that's not the case. Parsing often includes validation. This is addressed in the article, under Use abstract datatypes to make validators “look like” parsers.

It's the same kind of ground as avoiding primitive obsession.

By @zigzag312 - 3 months

Is this opposite to the following opinion?

> "“required” keyword in Protocol Buffers turned out to be a horrible mistake"

https://capnproto.org/faq.html#how-do-i-make-a-field-require...

Having both flexible, unvalidated parsing and validated parsing functions would probably be best IMHO.
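One way to read this suggestion, sketched in TypeScript: a lenient parser that keeps whatever well-typed fields are present (so older and newer message versions keep working), plus a strict parser applied only where the code actually needs every field. The `User` shape and both function names are hypothetical.

```typescript
// Hypothetical message shape.
interface User {
  id: number;
  name: string;
}

// Lenient: keep the well-typed fields that are present, ignore everything else.
function parseUserLoose(raw: unknown): Partial<User> {
  const result: Partial<User> = {};
  if (typeof raw !== "object" || raw === null) return result;
  const obj = raw as Record<string, unknown>;
  const id = obj["id"];
  const name = obj["name"];
  if (typeof id === "number") result.id = id;
  if (typeof name === "string") result.name = name;
  return result;
}

// Strict: require every field, at the point where the code relies on them.
function parseUserStrict(raw: unknown): User | undefined {
  const { id, name } = parseUserLoose(raw);
  if (id === undefined || name === undefined) return undefined;
  return { id, name };
}
```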

By @knallfrosch - 3 months

Great advice and works well in TypeScript as well. See https://www.typescriptlang.org/docs/handbook/advanced-types....

More than merely catching compiler errors, this also makes writing code faster by offering better auto-completion for new code.
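A minimal sketch of a user-defined type guard, the feature the linked handbook section covers. The `Shape` union and `isShape` are hypothetical: the predicate refines an `unknown` value at runtime, after which the compiler (and the editor's autocompletion) sees the narrower type.

```typescript
// Hypothetical discriminated union.
interface Circle { kind: "circle"; radius: number }
interface Square { kind: "square"; side: number }
type Shape = Circle | Square;

// A type predicate: when it returns true, TypeScript treats `value` as a Shape.
function isShape(value: unknown): value is Shape {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const kind = v["kind"];
  if (kind === "circle") return typeof v["radius"] === "number";
  if (kind === "square") return typeof v["side"] === "number";
  return false;
}

function area(shape: Shape): number {
  // In each branch the compiler knows the exact variant,
  // so only that variant's fields are offered and accepted.
  switch (shape.kind) {
    case "circle":
      return Math.PI * shape.radius ** 2;
    case "square":
      return shape.side ** 2;
  }
}
```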

By @valenterry - 3 months

> Is it possible to implement foo? Trivially, the answer is no, as Void is a type that contains no values, so it’s impossible for any function to produce a value of type Void

That's actually not really correct. Or rather, it is technically correct but it will confuse the readers who work in languages like Java.

While void in languages like Java means that the result of the function cannot be used or has no meaning, it is NOT equivalent to types like the bottom type of Haskell, because that would mean that the function can never return.

Rather, void is similar to the "unit type" (https://en.wikipedia.org/wiki/Unit_type), which does have a value. It's like an empty tuple. It contains no information other than "the function call has finished" (and of course, in languages with exceptions, this means that no exception was thrown).
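TypeScript happens to have all three notions side by side, which makes the distinction easy to see. In this sketch (function names are hypothetical), `void` marks a result that carries no information, `undefined` is a unit-like type with a single value, and `never` is the empty type, analogous to Haskell's `Void`.

```typescript
// Returns normally; the result carries no information (like Java's void).
function logLine(message: string, sink: string[]): void {
  sink.push(message);
}

// A unit-like type: exactly one value, `undefined`.
function unit(): undefined {
  return undefined;
}

// The empty type: no value has type never, so a function returning never
// can only finish by throwing (or by never terminating).
function fail(message: string): never {
  throw new Error(message);
}

// Because never has no values, a function taking a never can promise any
// return type; it can never be called with an actual value (cf. Haskell's absurd).
function absurd<A>(impossible: never): A {
  return impossible;
}
```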

Otherwise, I like the article. More people should read and understand this way of thinking.

By @Garlef - 3 months

Hm... I quite like the idea, but I think the initial example is not very good, and the remark about 'shotgun parsing' seems to blur some levels:

The key idea seems to be that the border between the periphery/plumbing/deserialization code and the actual business logic should be as strict and direct and isolated as possible. Only pass objects/data/payloads to the business logic that have been fully ingested into the data model of the business logic. And keep the ingestion in one place.

From this perspective, the section about "shotgun parsing" might give some people the wrong idea and derail some discussions: If it's an actual part of the business requirements that branching and validations need to happen (branching for example over the existence of an optional value), a superficial reading of the article might lead someone to incorrectly identify this as "shotgun parsing".
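A minimal TypeScript sketch of the boundary described above: all deserialization and checking sits in one ingestion function, and business logic only ever receives the domain type. `Order`, `ingestOrder`, `orderTotal`, and the `TENOFF` coupon rule are hypothetical. Branching on the optional coupon inside the business logic is a business requirement, not shotgun parsing.

```typescript
// Hypothetical domain model; couponCode is genuinely optional.
interface OrderItem {
  readonly sku: string;
  readonly cents: number;
}

interface Order {
  readonly items: readonly OrderItem[];
  readonly couponCode?: string;
}

// Ingestion boundary: the one place that turns untrusted JSON into an Order.
function ingestOrder(json: string): Order {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) throw new Error("order must be an object");
  const obj = raw as Record<string, unknown>;

  const rawItems = obj["items"];
  if (!Array.isArray(rawItems) || rawItems.length === 0) throw new Error("order needs at least one item");
  const items = rawItems.map((item: unknown, i: number): OrderItem => {
    const fields = (typeof item === "object" && item !== null ? item : {}) as Record<string, unknown>;
    const sku = fields["sku"];
    const cents = fields["cents"];
    if (typeof sku !== "string" || typeof cents !== "number" || !Number.isInteger(cents) || cents < 0) {
      throw new Error(`invalid item at index ${i}`);
    }
    return { sku, cents };
  });

  const couponCode = obj["couponCode"];
  if (couponCode === undefined) return { items };
  if (typeof couponCode !== "string") throw new Error("couponCode must be a string");
  return { items, couponCode };
}

// Business logic: accepts only an Order, never raw JSON. The coupon branch
// is a (hypothetical) business rule: 10% off with code TENOFF.
function orderTotal(order: Order): number {
  const subtotal = order.items.reduce((sum, item) => sum + item.cents, 0);
  return order.couponCode === "TENOFF" ? Math.round(subtotal * 0.9) : subtotal;
}
```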

By @brunooliv - 3 months

I've had this post bookmarked since the first time I read it, and I occasionally come back to it. It's a great one.

By @hatsuseno - 3 months

I feel like this idea is another form of, or at least related to, my own habit of processing input in two phases: plan and execute. I run the input through a planner component that produces a sequential list of instructions that would 'do' whatever it is we're doing. I've adapted this style to permit parallel execution or structures more complicated than a flat list, but I often come back to this basic plan. Invalid input is caught during the planning phase, and, to an extent, I need to ensure the planner can't come up with impossible plans. I can't say I've avoided every type of problem or bug this way, but it sure as hell gives me a good base to work with.
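A minimal TypeScript sketch of the plan/execute split. The command format (`mkdir <path>`, `write <path> <text>`) and the in-memory file system are hypothetical, chosen only to show that a bad line is rejected during planning, before any step is executed.

```typescript
// Plan steps: a closed set of operations the executor knows how to run.
type Step =
  | { op: "mkdir"; path: string }
  | { op: "write"; path: string; text: string };

// Planning phase: turn every input line into a Step, or fail before any side effect.
function plan(lines: string[]): Step[] {
  return lines.map((line, i): Step => {
    const [op, path, ...rest] = line.trim().split(/\s+/);
    if (op === "mkdir" && path && rest.length === 0) return { op, path };
    if (op === "write" && path && rest.length > 0) return { op, path, text: rest.join(" ") };
    throw new Error(`line ${i + 1}: cannot plan "${line}"`);
  });
}

// Execution phase: runs a plan that is already known to be well-formed.
// In this toy file system, null marks a directory and a string is file content.
function execute(steps: Step[], fs: Map<string, string | null>): void {
  for (const step of steps) {
    switch (step.op) {
      case "mkdir":
        fs.set(step.path, null);
        break;
      case "write":
        fs.set(step.path, step.text);
        break;
    }
  }
}
```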

By @andrewghull - 3 months

Here's the final example in TypeScript if you find that easier to read:

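  type NonEmpty<T> = [T, ...T[]]

  const head = <T>(list: NonEmpty<T>): T => list[0]

  // Stand-in for the article's initializeCache; assumed to be defined elsewhere
  declare function initializeCache(dir: string): void

  function getConfigurationDirectories(): NonEmpty<string> {
    // In Node.js, process.env values may be undefined, so default to ""
    const configDirsString = process.env["CONFIG_DIRS"] ?? ""
    // "".split(",") returns [""], so drop empty entries before checking
    const dirs = configDirsString.split(",").filter((dir) => dir !== "")
    const [firstDir, ...restDirs] = dirs
    if (firstDir === undefined) throw new Error("CONFIG_DIRS cannot be empty")
    return [firstDir, ...restDirs]
  }

  function main() {
    const configDirs = getConfigurationDirectories()
    initializeCache(head(configDirs))
  }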

By @keybored - 3 months

One of my favorite things that I’ve read via HN.

It seems like this approach will often bottom out in smart constructors, since type systems are either limited or make you work too hard to prove relatively simple things.

By @swyx - 3 months

fwiw, Lexi works at Hasura, where the practical application of this principle is called the "PDV refactor" https://x.com/tanmaigo/status/1291710315223306243

(from my own notes https://x.com/swyx/status/1548380295765733378)

By @TacticalCoder - 3 months

What is needed is a canonical representation of the data. Then you must parse, re-encode, and verify that the re-encoded data matches what was parsed, bit for bit.

Not in unit tests but when the app is running: you take the data in, you parse it, you re-encode/re-serialize it/re-whatever it. If it's not matching the data that came in, the data that came in is rejected.

And that should just be one of the steps taken to verify that the data looks legit.
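A small TypeScript sketch of the round-trip check described above, using decimal integers as a hypothetical data format: parse, re-encode in canonical form, and reject the input unless the two match exactly. Inputs such as `007`, `+7`, or `7.0` parse to the same number but are not canonical, so they are rejected.

```typescript
// Canonical encoding for this toy format: what String() produces for a safe integer
// (optional "-", no leading zeros, no "+", no whitespace, no exponent).
function encodeInt(n: number): string {
  return String(n);
}

// Lenient parse step, then re-encode and compare with the original input.
function parseCanonicalInt(input: string): number | undefined {
  const n = Number(input);
  if (!Number.isSafeInteger(n)) return undefined;
  // Round-trip check: reject anything whose canonical encoding differs from the input.
  return encodeInt(n) === input ? n : undefined;
}
```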

By @wormlord - 3 months

This idea and domain modeling go together like peanut butter and chocolate. Making invalid states unrepresentable in your domain takes so much work off the database and API, and makes programs so easy to test, since an invalid domain representation simply will not parse and can never make it to your API or DB. That takes a lot of logic out of the parts of a software stack that are hardest to test.

By @the_gipsy - 3 months

Sadly this doesn't work at all in Go, with its zero-values concept.

By @willsmith72 - 3 months

this reminds me of a team i worked in which had very few unit tests, and compensated with in-depth complex type systems

comprehensive test coverage meets the same goal, and TDD looks a lot like "type-driven design", only easier to read and maintain.

By @sriram_malhar - 3 months

Can someone steeped in type theory explain the following:

I never understood why the default list type [a] allows the list to be empty. If ...

    foo: [a] -> a

... foo is supposed to get a list of a's, it should get a list of a's, with at least one a. If the list can be empty, then explicitly annotate it so:

    foo: [a*] -> a

One way or the other, one has to deal with the empty list explicitly (in the signature). If you allow empty lists, the function will have to return a 'Maybe a'. It seems to me that constraining the input would just make processing the result easier in the common case.
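The two options contrasted here can be written side by side in TypeScript; the function names are hypothetical. Over a possibly-empty array the honest signature has to admit "no element", while over a non-empty tuple type it does not.

```typescript
// Possibly empty input: the result must admit "no element"
// (TypeScript's closest analogue to Maybe a).
function headOrUndefined<T>(list: readonly T[]): T | undefined {
  return list.length > 0 ? list[0] : undefined;
}

// Non-empty input: the type rules out [] at compile time, so the result is a plain T.
type NonEmpty<T> = readonly [T, ...T[]];

function head<T>(list: NonEmpty<T>): T {
  return list[0];
}
```

Here `head([])` does not compile, while `headOrUndefined([])` compiles and returns `undefined`; the choice of which one is the default is exactly the design question being asked.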

By @kayo_20211030 - 3 months

Is Postel's law relevant here?

By @Vosporos - 3 months

A seminal text that has had a high cultural impact.

By @fragmede - 3 months

(2019)

By @zendist - 3 months

It seems that CrowdStrike only parsed and didn't validate, to great effect :-) /s

Not saying that this advice isn't solid, just thought it's funny given the news of this week.

By @mindesc - 3 months

Yeah. Use the raw URL datatype provided by the language and get hacked by some exotic XSS you weren't aware of, because the spec includes everything and the kitchen sink.

By @madduci - 3 months

I don't know why this is good advice at all.

If you expose APIs, you have to guard against malicious payloads. What happens when invalid data can break the parser, and even cause Out of Memory exceptions?

It might work only if you have some safe guardrails around the APIs. But if you just expose naked endpoints without a minimum of checks or a Web Application Firewall, this isn't really good advice.