July 9th, 2024

Docs as code (2017)

Documentation as Code (Docs as Code) aligns documentation creation with coding practices, emphasizing collaboration, integration with development, and tool consistency. Write the Docs community advocates this approach through talks, successful industry examples, recommended books, and an open-source toolchain for effective implementation.

Read original article

Documentation as Code (Docs as Code) advocates for writing documentation using the same tools as code, such as issue trackers, version control, plain text markup, code reviews, and automated tests. This approach fosters collaboration between writers and developers, ensuring better integration with development teams and incentivizing developers to contribute to documentation. Write the Docs community has been actively promoting Docs as Code through talks and conferences since 2015, showcasing successful implementations in various organizations like Google, Rackspace, and Amazon Web Services. The concept is gaining traction in the software industry and writing community, with presentations at different conferences worldwide. Recommended books on the topic include "Docs Like Code" by Anne Gentle and "Modern Technical Writing" by Andrew Etter. The community also offers an open-source toolchain, docToolchain, to implement the Docs as Code approach effectively. This methodology aims to streamline documentation processes, empower documentarians, and enhance the overall quality of technical documentation.

Documentation Driven Development (2022)

Documentation Driven Development (DDD) is proposed as a more effective approach than Test-Driven Development (TDD) for software development. DDD involves starting with documentation to iron out implementation details before coding, helping to address API shifts and scope misunderstandings early on. By documenting requirements and potential future API changes, developers can better plan and refine their code, avoiding costly refactors later. DDD emphasizes the importance of various forms of documentation, including design mockups, API references, and tests, to communicate thoughts and guide development. This method encourages a feedback loop on APIs and work scope, enhancing code quality and project outcomes. DDD aligns with concepts like Behavioral Driven Development (BDD) and Acceptance Test-Driven Development (ATDD), emphasizing user behavior validation and strong communication practices. By incorporating DDD into their workflow, developers can improve collaboration, goal refinement, and code quality, leading to more successful project outcomes.

Software Engineering Practices (2022)

Gergely Orosz sparked a Twitter discussion on software engineering practices. Simon Willison elaborated on key practices in a blog post, emphasizing documentation, test data creation, database migrations, templates, code formatting, environment setup automation, and preview environments. Willison highlights the productivity and quality benefits of investing in these practices and recommends tools like Docker, Gitpod, and Codespaces for implementation.

Git Workflows for API Technical Writers

Technical writers encounter challenges in maintaining API documentation aligned with evolving designs. Git workflows provide version control and collaboration benefits. Strategies like merging copy improvements, code annotations, and OpenAPI overlays enhance documentation quality. Involving writers in API design ensures user-centric content. Storing docs in separate repositories can resolve Git conflicts.

Readability: Google's Temple to Engineering Excellence

Google's strict readability process involves code approval by maintainers and readability mentors, shaping coding standards. Despite criticism, it enhances skills, maintains quality, and fosters global code consistency. A proposed "Readability Lite" aims for mentorship and quality without strict enforcement.

Self Documenting Code Is Bullshit

Klaus Breyer challenges self-documenting code, advocating for external documentation to enhance precision and abstraction. He emphasizes the need for detailed information like variable units and invariants, promoting a balanced approach for code clarity.

29 comments

By @simonw - 10 months

A subset of this idea is a hill I am willing to die on: the documentation for a codebase should live in the same repository as the codebase itself.

I'm talking about API documentation here - for both code-level APIs (how to use these functions and classes) as well as HTTP/JSON/GRPC/etc APIs that the codebase exposes to others.

If you keep the documentation in the same repo as the code you get so many benefits for free:

1. Automatic revision control. If you need to see documentation for a previous version it's right there in the repo history, visible under the release tag.

2. Documentation as part of code review: if a PR updates code but forgets to update the accompanying documentation you can catch that at review time.

3. You can run documentation unit tests - automated tests that check that the documentation at least mentions specific pieces of the code (discovered via introspection). I wrote about that a few years ago and it's been working great for me: https://simonwillison.net/2018/Jul/28/documentation-unit-tes...

4. Most important: your documentation can earn trust. Most documentation is out of date and everyone knows that, which means people default to not trusting documentation. If anyone who looks at the commit log can see that the documentation is being actively maintained alongside the code it documents they are far more likely to learn to trust it.

The exception to this rule for me is user-facing documentation describing how end users should use the features provided by the software. I'd ideally love to keep this in the repo too, but there are rational reasons not to - it might be maintained by the customer support team who may want to work in more of a CMS environment, for example.

By @Therenas - 10 months

This is exactly what we do for the Factorio modding API docs. The docs are embedded inside the codebase, alongside the classes and methods that implement the functionality the docs describe.

So they are written and adjusted as the functionality is implemented, they can be reviwed alongside the code PRs. The CI builds the docs and makes sure there are no issues.

The format is a custom one, which is parsed and converted into JSON for language servers and into the API website. Not sure how you‘d test the docs content, but this parser is tested for sure.

Works great for us in general.

By @bluGill - 10 months

Developers are too close to the code to write effective documentation for it. They will go into great detail about things that nobody else cares about, while skipping important parts because to them it is obvious.

While it is possible to do okay anyway, it only happens if there is effort over time.

I'm convinced that the best thing to do it when someone asks you a question about your APIs the response should be to go (now that you are not so close to the code you can better to this) write the answer that person needs, and have them review it until they understand. You are not allowed to talk to that person except via new documentation, while they can pester you as much as they want until you make the documentation usable. It will still take some rounds, but if nobody is reading the documentation there is no point in writing it either.

By @WillAdams - 10 months

Why not just put forth/use Literate Programming?

https://www-cs-faculty.stanford.edu/~knuth/lp.html

By @K0nserv - 10 months

This is focused on people whose job it is to write documentation, but I think it applies generally. A previous company I worked at moved away from Read The Docs to Confluence and it was terrible. This decision was resisted by much of engineering because we recognised that disconnecting documentation from code would make it worse, it did.

By @gwern - 10 months

An entertaining outcome here is that LLMs may render the docs vs code debate largely moot: as LLM coding capabilities increase and the cost per token plummets, it becomes increasingly possible to simply stop writing code at all, and instead write docs which are 'compiled' each time by a LLM to code which is then compiled normally and the code thrown away. The code can never get out of sync with the docs because it is always generated from the docs, in a way that previous brittle fragile complicated 'generate code from docs' approaches could only vaguely dream of.

To do bug fixes, one simply updates the docs to explain the new behavior and intentions, and perhaps include an example (ie. unit test) or a property. This is then reflected in the new version of the codebase - the codebase as a whole, not simply one function or module. So the global refactoring or rewrites happen automatically, simply from conditioning on the new docs as a whole.

This might sound breathtaking inefficient and expensive, but it's just the next step in the long progression from the raw machine ops to assembler to low-level languages like C or LLVM to high-level languages to docs/specifications... I'm sure at each step, the masters of the lower stage were horrified by the profligacy and waste of just throwing away the lower stage each time and redoing everything from scratch.

By @avg_dev - 10 months

I believe I remember reading that for merging branches to the Postgres project, you need to update the docs too in order to pass code review. A nice way of doing it, I thought. Pg has some great docs that I have been reading for some years.

By @fjni - 10 months

This is such an ignorantly engineering centric perspective.

There is value in the larger organization being able to consume documentation and commenting on it and contributing to it.

There is conceptual value in some of these things, but I find it to be overstated and the downsides entirely ignored.

Most documentation systems have a version history.

And most documentation systems are far easier adopted by people other than engineers.

This is the equivalent of pointing out that figma has x, y, and z benefits and designers are fluent in it, so we should be using that for documentation.

By @RandallBrown - 10 months

One of the main bullet points on the page is automated tests.

How do you write automated tests for documentation? Somehow require that blocks of code have documentation linked to them?

By @natpalmer1776 - 10 months

Personally I'm a fan of writing your first draft of documentation before writing the first line of code.

By @jkaptur - 10 months

This is a really interesting topic, and it has complexity I didn't consider until I became deeply involved in some similar systems.

For example, in code, you can generally use feature flags, A/B testing, etc. to show different things to different people quite flexibly, but (depending on how the documentation is actually published) you might have very different capabilities.

By @scoot - 10 months

MUI has always done this (since 2014), but goes one or two steps further than the bullet list at the beginning of the article. Most significantly, API documentation is generated from the code of the components being documented, so is always accurate and up to date.

https://mui.com

By @Spivak - 10 months

Love the concept, hate the article. The article doesn't actually say anything other than "store your docs in git" which... yeah, obviously. You don't need anyone to tell you that being able go to a snapshot of the docs as they were at the time of the commit/release you're looking at is a powerful feature.

But that's not really treating your docs as code, more like "storing your docs in the same place as your code." A system like Sphinx with autosummary and autodoc where the docs are generated from your code and human-readable details like examples are pulled from the relevant docstrings is very much docs as code. Same with FastAPI's automatic OpenAPI generation and automatic Swagger. Pulling the examples section for your functions directly from your tests, now that's docs as code.

By @igtztorrero - 10 months

Love this approach DBC Doc Before Code, very useful when working with Jr Developer

By @rickydroll - 10 months

For me, this should be the end goal of the AI pair programming. I write documentation for APIs, data structures, etc., hand it to the AI, and it should crank out functional code that meets the requirements spelled out in the documentation.

We are close, but it's not there yet. I will always need to run a validation test against the code and eyeball to ensure it's not insane. But today, it's clear to me that if ChatGPT/copilot doesn't generate correct code quickly and easily from what I wrote, I didn't understand the problem and couldn't express it clearly.

By @kkfx - 10 months

Well...

I agree having a shared doc repo so anyone can commit changes/patches to docs, witch is well... Not much different than what most wikis offer already, and while useful wikis prove that's not enough to have good docs and little to no garbage in them...

But I will NEVER "host my docs" on someone else platform depending from their services (if you host code/docs as a mere repo, GH and alike are just mirror of something most dev have, if you use their features your workflow hardly depend on them) and I also never use MD as my default choice.

By @bllchmbrs - 10 months

I've thought about this problem set for years, I've written many docs and technical books.

Version control is the best for documentation.

But maintaining it is hard - lots of great comments here.

For anyone interested, I'm working on https://hyperlint.com/ (disclaimer: bootstrapped founder). To help automate the toil around documentation.

By @fucalost - 10 months

From my limited interactions with document-intensive sectors (i.e. legal), I think they’re sorely lacking something like this.

When the same document is edited by two separate individuals and diverges, it is a nightmare to reconcile the two.

I truly wish (i.) Microsoft Word was a nicer format for VCS, or (ii.) Markdown was more suitable for “formal” legal texts and specifications — probably in that order (!)

By @xixixao - 10 months

In recomputer[0] I put the docs sources directly next to the relevant implementation, and also tested the examples.

For this to work well (not just like an API reference), the implementation itself had to be structured well.

[0] https://github.com/xixixao/recomputer

By @pavel_lishin - 10 months

The landing page doesn't really explain anything, except a tangential quickstart into Github hosting.

By @batterylow - 10 months

Similar for the PlotAPI docs [1] which are all Jupyter notebooks!

[1] https://plotapi.com/docs/

By @MilStdJunkie - 10 months

The DaC debates grow increasingly grim as the overall employment situation worsens across industries. It's pretty hard to get people to react authentically, rather than see the discussion as an attack on how they do their jobs[0].

I'm going to head all this off at the pass, and say instead that DaC[1] is a technological tool for a limited number of business use cases. It's not a panacea, no more than XML publishing in a CCMS (component content management system) was seen as the Alpha and the Omega (and indeed still is by a whoooooole lot of people). I say this as a heartfelt believer in the DaC approach vs a big heavy XML approach.

Your first question - really, this should always be your first question - is, "how do people do their jobs today?". If you work in a broom factory, and the CAD guy reads word documents, the pubs guys use Framemaker, the reviews are in PDF, and the final delivery is a handful of PDF documents....well, using DaC is going to be a jump.

Now, is that jump worth it? Well, it might be. Your CAD guy might know his way around gitlens, your pubs folks probably have some experience with more complex publishing build systems, and, most important of all, you might have a change tempo that really recommends the faster-moving flows of DaC. If you're going the Asciidoc route, you could even try out some re-use via the `include` and `conditional` directives. But it also could be a disaster, with no one using VCS, no one planning out re-use properly[2], people passing reviews around in whatever format, and PDF builds hand-tooled each time. It's not something you dive into because it's what the cool kids are doing. Some places, maybe even most places legacy industry wise, it's just not going to work. Your task - if your job is consulting about such things - is to be able to read the room real fast, and recognize where it's a good fit, and where you might need to back off and point to a heavier solution.

[0] Big traditional XML publishing systems are also in the crosshairs, as they're quite frankly usuriously expensive, also writer teams have started noticing the annoying tendency of vendors to sell a big CCMS and then - once the content's migrated - completely disappearing, knowing that the costs of migration will keep you paying the bill basically forever.

[1] DaC defined as : lightweight markup (adoc, md, rst, etc), written/reviewed with a general-purpose text editor, where change/review/publish is handled on generic version control (git, hg, svn, etc), and the consumable "documents" are produced as part of a build system.

[2] Which crashes ANY CCMS, regardless of how expensive or how DaC-y it is.

By @benrutter - 10 months

I felt myself agreeing hard with this until I read it!

I thought it was gonna be all about ensuring your api documentation is closely coupled with your code. But it's more about using code tools to write docs.

I'm kinda two ways on it, doesn't it depend on what "docs" actually are? (I couldn't find a definition on the page). Wikipedia is a kind of documentation, but tieing it to version contril tools would massively restrict the number of people contributing and therefore the quality if the docs.

I dunno, maybe I'm missing the point.

By @MilStdJunkie - 10 months

As a pretty die-hard enthusiast for this approach - even for legacy, hard industries - let's take a close look at some of the limitations of this approach.

First, code is formal language, and docs are natural language. That's a lot of jargon; what does it mean? It means that the chunks inside of a piece of code are consistently significant; a method is a method, a function is a function. Chunks in a document are, woo boy, good luck with that one. XML doesn't even have line breaks normalized. Again, no matter what the XML priesthood natters about, it's natural language.

A consequence of this is that the units of change are much, much smaller in a repo of code vs a corpus of documents. This, well, is can be ok, but it also means that a PR in a docs as code arrangement can be frickin' terrifying. What this means, is that you have to have a pretty good handle on controlling the scope of change. Don't branch based on doc revisions, but rather on much more incremental change, like an engineering change order or a ticket number.

Your third problem is that the review format will never - can never - be completely equivalent to the deliverable. The build process will always stand in the way, because doing a full doc build for every read is too much overhead for basically any previewer or system on the planet. This is a hard stop for a lot of DaC adopters, as many crusty managers insist that the review format has to be IDENTICAL to the format as it's delivered. Of course, that means when you use things like CIRs (common information repositories) that you end up reviewing hundreds of thousands of books because an acronym changed....but I call 'em "crusty" for a reason. They're idiots.

By @eysgshsvsvsv - 10 months

Why hoard random sentences. Let go. Your time is more valuable.

By @wruza - 10 months

tl;dr: commit index.md to github repo and use github pages to host it.

By @lkdfjlkdfjlg - 10 months

Is documentation that important? Even when I think is excellent like postgres, I've only ever had a few pages of it. Which leads me to think, who's reading the other thousands (?) of pages?

I think the amount of effort you should put into documentation varies wildly on the scope of the project.

Docs as code (2017)

Related

Documentation Driven Development (2022)

Software Engineering Practices (2022)

Git Workflows for API Technical Writers

Readability: Google's Temple to Engineering Excellence

Self Documenting Code Is Bullshit

Related

Documentation Driven Development (2022)

Software Engineering Practices (2022)

Git Workflows for API Technical Writers

Readability: Google's Temple to Engineering Excellence

Self Documenting Code Is Bullshit