August 15th, 2024

Stripe's Monorepo Developer Environment

Stripe's developer environment evolved from 2012 to 2019, emphasizing stability through cloud-based "devboxes," automated code synchronization, and infrastructure for rapid testing, all of which required ongoing investment to stay effective.

The article reflects on the developer environment at Stripe, particularly during the author's tenure from 2012 to 2019. It highlights the evolution of Stripe's monorepo, primarily written in Ruby, and the significant role of the developer productivity team in enhancing the developer experience. The author emphasizes the importance of a stable and reliable environment, which was achieved through dedicated tooling and a focus on centralized support. The development environment utilized cloud-based instances, known as "devboxes," allowing developers to run code remotely while keeping source code on their local machines. This setup minimized configuration issues and facilitated easier debugging. The article also discusses the synchronization of code changes from local machines to devboxes, which was streamlined through an automated sync script. Additionally, the infrastructure supported HTTP services, enabling developers to test changes quickly via stable URLs. The author acknowledges that while the environment was effective, it required ongoing investment and adaptation to meet the evolving needs of the engineering teams.

- Stripe's developer environment evolved significantly from 2012 to 2019, focusing on stability and reliability.

- The use of cloud-based "devboxes" allowed for centralized management of development environments.

- Code synchronization between local machines and devboxes was automated to reduce manual intervention.

- The infrastructure supported rapid testing of changes through stable URLs for HTTP services.

- Ongoing investment in tooling and team resources was crucial for maintaining an effective developer experience.
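
The summary does not show how the sync script worked, so the following is only a minimal sketch of the general idea, assuming a simple snapshot-and-diff approach rather than Stripe's actual tooling. The function names, the `.git` exclusion, and the `devbox.example:/repo/` destination are all hypothetical; the rsync command is built but never executed.

```python
import os
from pathlib import Path


def snapshot(root):
    """Map each file under root (as a relative POSIX path) to (mtime_ns, size)."""
    root = Path(root)
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip VCS metadata; a real tool would honor a fuller ignore list.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            st = path.stat()
            state[path.relative_to(root).as_posix()] = (st.st_mtime_ns, st.st_size)
    return state


def diff_snapshots(before, after):
    """Return (changed_or_added, deleted) as sorted lists of relative paths."""
    changed = sorted(p for p, sig in after.items() if before.get(p) != sig)
    deleted = sorted(p for p in before if p not in after)
    return changed, deleted


def rsync_command(changed, remote):
    """Build (but do not run) an rsync invocation for the changed files.

    Intended to be run with the repository root as the working directory;
    --relative keeps each file's path below that root on the remote side.
    The remote string is a hypothetical placeholder supplied by the caller.
    """
    return ["rsync", "-az", "--relative", *changed, remote]
```

A caller would take a snapshot, wait for edits, take a second snapshot, and pass the changed paths to `rsync_command(changed, "devbox.example:/repo/")`. Polling like this is the simplest option; a file-watching service would avoid rescanning the whole tree on every pass.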

Related

A dev's thoughts on developer productivity (2022)

The article delves into developer productivity, emphasizing understanding code creation, "developer hertz" for iteration frequency, flow state impact, team dynamics, and scaling challenges. It advocates for nuanced productivity approaches valuing creativity.

DevOps: The Funeral

The article explores the evolution of DevOps, emphasizing reproducibility in system administration. It critiques the mislabeling of cloud sysadmins as DevOps practitioners and questions the industry's shift towards new approaches like Platform Engineering. It warns against neglecting the principles of automation and reproducibility.

Bad habits that stop engineering teams from high-performance

The article describes bad habits that hinder engineering teams' performance. It stresses the importance of observability in software development, including Elastic's role in OpenTelemetry, and discusses CI/CD practices, cloud-native tech updates, data management solutions, mobile testing advancements, API tools, DevSecOps, and team culture.

Automate Project Environments with Devbox and Direnv

The article emphasizes the benefits of isolated project environments and introduces Devbox and Direnv as tools to automate environment management. It explains their features, integration, and simplification of setting up project environments.

I'm Back, Ruby on Rails

The author reassesses Ruby on Rails, praising its stability, built-in features, and supportive community, while highlighting its advantages for rapid development and deployment, making it suitable for startups.

29 comments
By @p-o - 8 months
It's always so enlightening to have articles like this one shed light on how companies at scale operate. It goes without saying that many of the problems Stripe faced with their monorepo aren't applicable to smaller businesses, but there are still bits and pieces that are applicable to many of us.

I've been working on an ephemeral/preview environment operator for Kubernetes (https://github.com/pier-oliviert/sequencer), and I agree with a lot of what OP said.

I think dev boxes are really the way to go, especially with all the components that make up an application nowadays. But the latency/synchronization issue is a hard topic, and it's full of tradeoffs.

A developer's laptop always ends up being a bespoke environment (yes, Nix/Docker can help with that), and so, there's always a confidence boost when you get your changes up on a standalone environment. It gives you the proof that "hey things are working like I expected them to".

By @mootoday - 8 months
I've worked with remote dev environments for many years, including some time with one of the providers of such a service.

It became clear to me that cloud-only is not the way to go, but instead a local-first, cloud-optional approach.

https://mootoday.com/blog/dev-environments-in-the-cloud-are-...

By @aidos - 8 months
Maybe a silly question, but why all this engineering effort when you could host the dev environment locally?

By running a Linux VM on your local machine you get a consistent environment that you can ssh to, and you remove both the latency issues and all the complexity of syncing that they’ve created.

That’s a setup that’s worked well for me for 15 years but maybe I’m missing some other benefit?

By @domenkozar - 8 months
We've been building https://devenv.sh for that reason. I expect more companies to go back to local development once they see that DX has improved locally.
By @physicsguy - 8 months
I think for smaller companies, you can get a long way towards a lot of this with judicious use of docker-compose, and convenience scripts in a Makefile. As long as you don't do anything stupid like try and spin up 100 services when you're a team of 8, most laptops these days are sufficiently capable of handling a database, Redis, your codebase, and something like LocalStack.
By @delhanty - 8 months
>Some caveats: It’s been nearly five years, and I have no doubt that I have misremembered some of the specific details, even though I’m confident in the overall picture. I’m also certain that Stripe has continued evolving and I make no claim this document represents the developer experience at Stripe as of today.

Are there any more recently ex-Stripe folks here willing and able to comment on how Stripe's developer environment might have evolved since the OP left in 2019?

By @mleo - 8 months
I use syncthing to manage the synchronization of files between my local laptop and a remote development server. The software code base is upwards of 20 years old and has dependencies on Windows for runtime. I can run unit tests locally on a very fast MacBook Pro or run them much slower on a Windows VM. With syncthing I can easily edit files locally or remotely and they are available locally for source control.

The worst problem is refining the ignore settings so that only code is synced, which prevents conflicts on derived files, while making sure that no ignore rule accidentally matches code file names.
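
To illustrate the overlap problem described in this comment, here is a minimal sketch of a check for ignore rules that would also catch source files. It uses Python's `fnmatch` globbing as a stand-in, which is an assumption: Syncthing's own ignore-pattern syntax differs in its details, so this only approximates what a real rule would match. The function name and the sample patterns are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def overlapping_rules(ignore_patterns, source_files):
    """Return {pattern: [source files it would ignore]} for patterns that hit code.

    Each pattern is tested against both the full relative path and the bare
    file name, since ignore rules are often written against either form.
    """
    hits = {}
    for pattern in ignore_patterns:
        matched = [
            f for f in source_files
            if fnmatchcase(f, pattern)
            or fnmatchcase(PurePosixPath(f).name, pattern)
        ]
        if matched:
            hits[pattern] = matched
    return hits
```

Running this over the list of files tracked by source control before changing the ignore settings would flag a rule such as `build*` that was meant for build output but also matches a source file named `build_config.c`.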

By @ronef - 8 months
I love this. I believe I might have even interfaced with your team around that time. I was leading Facebook's (now Meta) Developer Products team and we were building against super similar areas internally.

Back then we ran a similar project that I dubbed "Developer On-Demand" to tackle that same problem space. It's also what eventually led me to find the magic of Nix and then build Flox.

I also agree with a lot of what was shared in other comments. While we tackled these problems at large orgs such as Facebook, Shopify, Uber, Google (to name a few teams I remember working with) and obviously also Stripe, certain areas of the pain are 100% universal regardless of team size.

On the Flox side, we're trying to help with a few of them today and hopefully many more in the near future, and we're very open to thoughts! Things like simple-to-use Nix for each of your projects, keeping deps and config up to date across everyone's MacBooks and Linux boxes, etc. -- even if you don't have a full AWS team and Language Server team ready to support you.

By @anonzzzies - 8 months
We use similar practices in our 3.5 person team; we work via code-server and Aider with our own tooling on VPSs, and this gets synced to execution VPSs which run dev versions, a lot of Sentry logging and tests (mostly Playwright these days). There is also a VPS which does builds all day and logs to Sentry too. We can almost instantly get on our own test versions and see what we did, and over the space of some seconds to minutes we see test and build data coming in. It has worked incredibly well for many years. Onboarding people is easy and no one ever has 'it doesn't build on my system', as that's not something we do (you can, of course; all the scripts are there, but why waste the time?).

I grew up with mainframes, minis and unix batch and/or multiuser machines; for me this is the best way for business applications. I didn't particularly like the move to local all that much.

By @reillys - 8 months
I chatted to Nelson when I was designing brisk (https://github.com/brisktest/brisk) and his insight informed the development of it.

Among other things, Brisk allows you to run tests for your local code changes in the cloud (basically the pay mini test piece but for any test runner)

We also have a sync step much like the one described here, and we allow users to run one-off commands (linters, tsc, etc.).

By @codethief - 8 months
Very insightful blog post!

> Finally: the development experience, of course, is only part of the story: the full lifecycle of code and features continues onward into CI and code review and ultimately through deployment into production, where it will be further observed, debugged, and evolved. Writing about those systems would require further posts at least this long.

In case the author is around: I would love to read those!

By @Aeolun - 8 months
They decided to keep the code on the local machine, but the language server on the remote one. That seems like a recipe for inconsistency. You only get relevant results from your language server once your code has synced.
By @adamdecaf - 8 months
We’ve been using a hundred repositories and a hundred Go services in a local docker-compose setup that’s worked fairly well. CI runners can struggle if their disks can’t keep up with Docker.

The idea comes up that we should build a devprod setup for front-end folks that abstracts the backend more.

Overall a lot of people prefer local dev because it gives them access to the entire stack, lets them run branch images more easily, and has better performance than remote boxes.

https://moov.io/blog/education/moovs-approach-to-setup-and-t...

By @bool3max - 8 months
Off-topic but the font on this blog is stunning - after some digging it seems to be "Vollkorn".
By @KolmogorovComp - 8 months
> In addition, Stripe’s monorepo was (to our knowledge) the largest Ruby codebase in existence

Bigger than Shopify's?

By @crabbone - 8 months
NB. What the article describes isn't a developer environment in the cloud. It's testing in the cloud. The editor in their model lives on the programmers' laptops, the editing happens there as well and so on. The code is deployed to cloud infrastructure for testing.
By @truetraveller - 8 months
"I’ve described a lot of fairly-involved custom tooling; we needed enough engineers to build and maintain it, and enough “customer” engineers for that investment to pay off."

This is so important when deciding to re-invent the wheel. I've gotten bitten by this many times.

By @prasoonds - 8 months
I wonder if there’s a devbox-as-a-service tool out there. I use a MacBook Air for most of my work and on occasion would benefit from using a beefier machine in the cloud. I just don’t want to set up a machine, set up sync, etc.
By @secondcoming - 8 months
What's the easiest way of sharing things like protobuf definitions across multiple separate repos and making sure things are always in sync?
By @nivertech - 8 months
This post is more about syncing between local and remote dev environments than about monorepos.
By @jdtig - 8 months
Does Stripe use RoR?

The author mentions the codebase was Ruby, but I didn't see if they talked about Rails.

By @stealthybox - 8 months
This is an awesome writeup of the tools and culture issues you run into maintaining dev environments.

From the post, the problems that justified central dev boxes are roughly:
1. dependency / config mgmt / env drift on laptops
2. collaboration / debugging between engineers
3. compute scaling + optimization
4. supporting devs with updates and infra changes

The last one is particularly interesting to me, because supporting the dev env is a separate engineering role/task that starts small and grows into teams of engineers supporting the environment.

I'm helping build Flox. We're working on these pain points by making environments (deps, vars, services, and builds) workable across all kinds of Mac/Linux laptops and servers.
1) a. Virtualize the pkg manager per-project
   b. Nix packages can install across OS/arch pretty well
2) Imperative actions like `flox install`/`upgrade` always edit a declarative env manifest.toml -- share it via git
3) Fewer Docker VMs -- get more out of devteam MacBooks
4) Reduce toil with versioned, shareable envs --> less sending ad-hoc config and brew commands to people (as mentioned in the post). Just `git pull && flox activate`.

I think on problem point #2, collab tools are advancing to where pairing on features, bugs, and env issues can be done without central SSH (ex: tmate, vscode liveshare, screensharing, etc). However, that does sort of fall apart on laptops for async debugging of env issues (ex: when devprod is in the US, and eng is in London). Having universal telemetry on ephemeral cloud dev-boxes with a registry and all of the other DNS and SSH goodies could be the kind of infra to aspire to as your small teams run into more big-team problems.

In the Stripe anecdote, adopting the centralized infra created new challenges that their devprod teams were dedicated to supporting:
- international latency from central, US-based VM's
- syncing code to the dev boxes (https://facebook.github.io/watchman/)
- linting, formatting, generating configs (run it locally or serverside?)
- a dev workflow CLI tool dedicated to dev-box workflows and sync'ing with watchman's clock
- IaaS, registry, config, glue for all the servers

This is all very non-trivial work, but maybe there's a future where people can win some portability with Flox when they are small and grow into those new challenges when it's truly needed -- now their laptop environments just get a quick `flox activate` on some new, shiny servers or Cloud IDE's.

I really like the notes from the author on how using the Language Server Protocol across a high-latency link has great optimizations that work alongside the watchman sync for real-time code editing.

By @pjmlp - 8 months
Yet another replay of timesharing development experiences. I guess we need a couple more generations to count how many times the pendulum swings back and forth during a developer's lifetime.
By @srvaroa - 8 months
"This scale – the scale of devprod, and in turn the scale of the overall organization, such that it could afford 10 FTEs on tooling – was a major factor in our choices"

This is basically the summary for most mono/multi repo discussions, and a bunch of other related ones.

By @vfclists - 8 months
How does a payment service wind up with over 1,000 engineers?

I understand that "engineers" may not mean "developers"; it could be DevOps, site reliability and all the bits and pieces that make up a large service provider, but over 1,000?

Can someone please enlighten me?

By @rvz - 8 months
This isn't recommended practice really and there is nothing about this which justifies having to maintain huge code bases in a single folder or multiple folders in one larger one.

I wouldn't be surprised if many people needed a safari map or README documentation in every single folder to navigate a repository as large as Stripe's.

Sounds like an emergence of a new bad practice if you are having to praise how large your code base is.