Stripe's Monorepo Developer Environment
Stripe's developer environment evolved from 2012 to 2019, emphasizing stability through cloud-based "devboxes," automated code synchronization, and infrastructure supporting rapid testing, requiring ongoing investment for effectiveness.
The article reflects on the developer environment at Stripe, particularly during the author's tenure from 2012 to 2019. It highlights the evolution of Stripe's monorepo, primarily written in Ruby, and the significant role of the developer productivity team in enhancing the developer experience. The author emphasizes the importance of a stable and reliable environment, which was achieved through dedicated tooling and a focus on centralized support. The development environment utilized cloud-based instances, known as "devboxes," allowing developers to run code remotely while keeping source code on their local machines. This setup minimized configuration issues and facilitated easier debugging. The article also discusses the synchronization of code changes from local machines to devboxes, which was streamlined through an automated sync script. Additionally, the infrastructure supported HTTP services, enabling developers to test changes quickly via stable URLs. The author acknowledges that while the environment was effective, it required ongoing investment and adaptation to meet the evolving needs of the engineering teams.
- Stripe's developer environment evolved significantly from 2012 to 2019, focusing on stability and reliability.
- The use of cloud-based "devboxes" allowed for centralized management of development environments.
- Code synchronization between local machines and devboxes was automated to reduce manual intervention.
- The infrastructure supported rapid testing of changes through stable URLs for HTTP services.
- Ongoing investment in tooling and team resources was crucial for maintaining an effective developer experience.
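An automated sync of this kind has to decide which files changed since its last pass. The following is a minimal sketch of one way to do that, assuming a simple polling approach that compares file modification times and sizes. It is not Stripe's sync script, and the function names `snapshot` and `diff_snapshots` are hypothetical, chosen only for illustration.

```python
import os


def snapshot(root):
    """Map each file path (relative to root) to its (mtime_ns, size) signature."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            state[os.path.relpath(full, root)] = (st.st_mtime_ns, st.st_size)
    return state


def diff_snapshots(old, new):
    """Return (changed_or_added, deleted) as sorted lists of relative paths."""
    changed = sorted(p for p, sig in new.items() if old.get(p) != sig)
    deleted = sorted(p for p in old if p not in new)
    return changed, deleted
```

A sync loop would take a snapshot, wait, take another, and send only the paths returned by `diff_snapshots` to the remote machine. A file-watching service avoids the repeated directory walk, which matters in a large monorepo.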
Related
A dev's thoughts on developer productivity (2022)
The article delves into developer productivity, emphasizing understanding code creation, "developer hertz" for iteration frequency, flow state impact, team dynamics, and scaling challenges. It advocates for nuanced productivity approaches valuing creativity.
DevOps: The Funeral
The article explores DevOps' evolution, emphasizing reproducibility in system administration. It critiques mislabeling cloud sysadmins as DevOps practitioners and questions the industry's shift towards new approaches like Platform Engineering. It warns against neglecting automation and reproducibility principles.
Bad habits that stop engineering teams from high-performance
Engineering teams face bad habits that hinder performance. The article stresses the importance of observability in software development, including Elastic's OpenTelemetry role. It also discusses CI/CD practices, cloud-native tech updates, data management solutions, mobile testing advancements, API tools, DevSecOps, and team culture.
Automate Project Environments with Devbox and Direnv
The article emphasizes the benefits of isolated project environments and introduces Devbox and Direnv as tools to automate environment management. It explains their features, integration, and simplification of setting up project environments.
I'm Back, Ruby on Rails
The author reassesses Ruby on Rails, praising its stability, built-in features, and supportive community, while highlighting its advantages for rapid development and deployment, making it suitable for startups.
I've been working on an ephemeral/preview environment operator for Kubernetes (https://github.com/pier-oliviert/sequencer), and I agree with a lot of what OP said.
I think dev boxes are really the way to go, especially with all the components that make up an application nowadays. But latency and synchronization are hard problems, full of tradeoffs.
A developer's laptop always ends up being a bespoke environment (yes, Nix/Docker can help with that), and so, there's always a confidence boost when you get your changes up on a standalone environment. It gives you the proof that "hey things are working like I expected them to".
It became clear to me that cloud-only is not the way to go, but instead a local-first, cloud-optional approach.
https://mootoday.com/blog/dev-environments-in-the-cloud-are-...
By running a Linux VM on your local machine you get a consistent environment that you can ssh to, and you remove both the latency issues and all the complexity of syncing that they’ve created.
That’s a setup that’s worked well for me for 15 years but maybe I’m missing some other benefit?
Are there any more recently ex-Stripe folks here willing and able to comment on how Stripe's developer environment might have evolved since the OP left in 2019?
The worst problem is refining the ignore settings so that only code is synced, which prevents conflicts on derived files, while making sure no rule accidentally matches code file names.
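To make that concrete, here is a minimal sketch of ignore-rule filtering plus a check for rules that would wrongly exclude source files. It assumes glob-style patterns matched with Python's `fnmatch`; the pattern list and function names are hypothetical and not taken from any particular sync tool.

```python
from fnmatch import fnmatch

# Hypothetical ignore rules for derived files (illustration only).
IGNORE_PATTERNS = ["*.log", "tmp/*", "build/*", "*.pyc"]


def is_ignored(path, patterns=IGNORE_PATTERNS):
    """True if any ignore pattern matches the relative path."""
    return any(fnmatch(path, pat) for pat in patterns)


def files_to_sync(paths, patterns=IGNORE_PATTERNS):
    """Keep only the paths that no ignore pattern matches."""
    return [p for p in paths if not is_ignored(p, patterns)]


def overlapping_rules(source_files, patterns=IGNORE_PATTERNS):
    """Return {pattern: [source files it would wrongly exclude]}."""
    hits = {}
    for pat in patterns:
        matched = [p for p in source_files if fnmatch(p, pat)]
        if matched:
            hits[pat] = matched
    return hits
```

Running `overlapping_rules` against the list of files tracked in version control is one way to catch a rule such as `build/*` that also swallows a real source file like `build/deploy.rb`.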
Back then we ran a similar project that I coined "Developer On-Demand" to tackle that same problem space. It's also what eventually led me to find the magic of Nix and then build Flox.
I also agree with a lot of what was shared in other comments. Although we tackled these problems at large orgs such as Facebook, Shopify, Uber, Google (to name a few teams I remember working with) and obviously also Stripe, certain areas of the pain are 100% universal regardless of team size.
On the Flox side, we're trying to help with a few of them today and hopefully many more in the near future, and we're very open to thoughts! Things like simple-to-use Nix for each of your projects, keeping deps and config up to date across everyone's MacBooks and Linux boxes, etc. -- even if you don't have a full AWS team and Language Server team ready to support you.
I grew up with mainframes, minis and Unix batch and/or multiuser machines; for me this is the best way for business applications. I didn't particularly like the move to local all that much.
Among other things, Brisk allows you to run tests for your local code changes in the cloud (basically the pay mini test piece but for any test runner).
We also have a sync step much like the one described here and allow users to run one-off commands (linters, tsc, etc.).
> Finally: the development experience, of course, is only part of the story: the full lifecycle of code and features continues onward into CI and code review and ultimately through deployment into production, where it will be further observed, debugged, and evolved. Writing about those systems would require further posts at least this long.
In case the author is around: I would love to read those!
The idea comes up that we should build devprod tooling for front-end folks so that the backend is more abstracted.
Overall a lot of people prefer local dev because it gives them access to the entire stack, lets them run branch images more easily, and has better performance than remote boxes.
https://moov.io/blog/education/moovs-approach-to-setup-and-t...
Bigger than Shopify's?
This is so important when deciding to re-invent the wheel. I've gotten bitten by this many times.
The author mentions the codebase was Ruby, but I didn't see if they talked about Rails.
From the post, the problems that justified central dev boxes are roughly:
1. dependency / config mgmt / env drift on laptops
2. collaboration / debugging between engineers
3. compute scaling + optimization
4. supporting devs with updates and infra changes
The last one is particularly interesting to me, because supporting the dev env is a separate engineering role/task that starts small and grows into teams of engineers supporting the environment.
I'm helping build Flox. We're working on these pain points by making environments (deps, vars, services, and builds) workable across all kinds of Mac/Linux laptops and servers.
1. a. Virtualize the pkg manager per-project b. Nix packages can install across OS/arch pretty well
2. Imperative actions like `flox install`/`upgrade` always edit a declarative env manifest.toml -- share it via git
3. Fewer Docker VMs -- get more out of devteam MacBooks
4. Reduce toil with versioned, shareable envs --> less sending ad-hoc config and brew commands to people (as mentioned in the post). Just `git pull && flox activate`.
I think on problem point #2, collab tools are advancing to where pairing on features, bugs, and env issues can be done without central SSH (ex: tmate, vscode liveshare, screensharing, etc.). However, that does sort of fall apart on laptops for async debugging of env issues (ex: when devprod is in the US, and eng is in London). Having universal telemetry on ephemeral cloud dev-boxes with a registry and all of the other DNS and SSH goodies could be the kind of infra to aspire to as your small teams run into more big-team problems.
In the Stripe anecdote, adopting the centralized infra created new challenges that their devprod teams were dedicated to supporting:
- international latency from central, US-based VMs
- syncing code to the dev boxes (https://facebook.github.io/watchman/)
- linting, formatting, generating configs (run it locally or serverside?)
- a dev workflow CLI tool dedicated to dev-box workflows and sync'ing with watchman's clock
- IaaS, registry, config, glue for all the servers
This is all very non-trivial work, but maybe there's a future where people can win some portability with Flox when they are small and grow into those new challenges when it's truly needed -- now their laptop environments just get a quick `flox activate` on some new, shiny servers or cloud IDEs.
I really like the notes from the author on how using Language Server Protocol across a high-latency link has great optimizations that work alongside the watchman sync for real-time code editing.
Is basically the summary for most mono/multi repo discussions, and a bunch of other related ones.
I understand that "engineers" may not mean "developers", it could be DevOps, site reliability and all the bits and pieces that make up a large service provider, but over 1,000?
Can someone please enlighten me?
I won't be surprised to see that many would probably need a safari map or README documentation in every single folder to navigate a repository as large as Stripe's.
Sounds like the emergence of a new bad practice if you have to praise how large your code base is.