A Jenga tower about to collapse: Software erosion is happening all around us
Software erosion results from complex architectures and frequent changes, with developers spending 42% of time on maintenance. A "shift left" approach is crucial for integrating quality assurance early in development.
Software erosion is increasingly prevalent in modern software development, characterized by complex architectures that resemble a precarious Jenga tower. Developers spend a significant portion of their time—42%—on maintenance rather than innovation, leading to frequent outages that disrupt services across various sectors. The root cause of these outages is not a lack of testing but rather the hypercomplexity of software configurations, resulting from numerous changes made by different teams over time. This complexity is exacerbated by pressures to innovate quickly, often leading to shortcuts that introduce further complications. As developers patch issues, they inadvertently create a cycle of instability, which can lead to attrition among engineers and a decline in morale. To combat software erosion, companies must adopt a "shift left" approach, integrating quality assurance early in the development process rather than as an afterthought. This involves thorough testing and understanding of the software architecture to prevent costly fixes later on. Ultimately, addressing software erosion requires a commitment to quality and a reevaluation of development practices to ensure sustainable growth and functionality.
- Software erosion is caused by complex architectures and frequent changes.
- Developers spend over 40% of their time on maintenance, limiting innovation.
- Outages are often due to the instability created by shortcuts and patching.
- A "shift left" approach is essential for integrating quality assurance early in development.
- Companies need to understand their software architecture to prevent future issues.
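A minimal sketch of what such a shift-left quality gate might look like, run on every change before review. The tool choices here (ruff, mypy, pytest) are illustrative assumptions, not ones the article prescribes:

    # shift_left_gate.py -- run static analysis and functional tests on
    # every change, before review; tool choices are illustrative assumptions.
    import subprocess
    import sys

    CHECKS = [
        ["ruff", "check", "."],  # static analysis / linting
        ["mypy", "."],           # static type checking
        ["pytest", "-q"],        # functional tests, run as new code is written
    ]

    for cmd in CHECKS:
        if subprocess.run(cmd).returncode != 0:
            sys.exit(f"quality gate failed at: {' '.join(cmd)}")

    print("all quality gates passed")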
Related
The software world is destroying itself (2018)
The software development industry faces sustainability challenges like application size growth and performance issues. Emphasizing efficient coding, it urges reevaluation of practices for quality improvement and environmental impact reduction.
The Software Crisis
The software crisis, coined in 1968, highlights challenges in managing software complexity. Despite advancements, issues persist, emphasizing responsible construction, user agency, and sustainable development practices through constrained abstractions and user empowerment.
Big Ball of Mud (1999)
The paper delves into the Big Ball of Mud software architecture, analyzing its causes, challenges, and strategies for improvement. It highlights the balance between pragmatism and long-term architectural considerations in software development.
Projects considered harmful – Part 1
Software development projects often prioritize time and budget over quality, leading to compromised dependability. Project managers focus on meeting objectives, neglecting software quality. Reevaluating project management practices is crucial for software dependability.
Ask HN: Business logic is slowing us down
Developers face challenges balancing internal stakeholder demands and external user needs, often spending time on code maintenance. Recognizing this work is crucial for enabling focus on new feature development.
Developers have been adding features to codebases for _decades_. It’s a demonstrably fine activity. The article doesn’t chain together “practice X causes bad effect Y”; it just strings themed sentences one after another without a reasoned argument. There aren’t even any personal anecdotes.
There are so many people writing much better instructive content; it’s a little heartbreaking to see nonsense like this elevated.
> the average developer spends 42% of their work week on maintenance
Indeed, I see that happening all around me when I watch how my friends build their startups. The first few months they are productive, and then they sink deeper and deeper into the quicksand of catching up with changes in their stack. So far, I have done a somewhat good job of avoiding that, and I keep a keen eye on avoiding it in the future.
I think a good stack to use is:
OS: Debian
DB: SQLite
Webserver: Apache
Backend Language: Python
Backend Library: Django
Frontend: HTML + CSS + Javascript
Frontend library: Handlebars
And no other dependencies. I called Django a "library" instead of a framework because I don't create projects via "django-admin startproject myproject" but rather just do "import django" in the files that make use of Django functionality.
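Concretely, that looks something like this. A minimal sketch, assuming Django is installed; the template rendering at the end is just one example of using a Django feature standalone:

    # Use Django as a plain library: configure settings in code, with no
    # "startproject" scaffolding. The template example is illustrative.
    import django
    from django.conf import settings

    settings.configure(
        TEMPLATES=[{"BACKEND": "django.template.backends.django.DjangoTemplates"}],
    )
    django.setup()

    from django.template import Template, Context

    print(Template("Hello, {{ name }}!").render(Context({"name": "world"})))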
In the frontend, the only thing I use a library for is templating content that is dynamically updated on the client. Handlebars does it in a sane way.
This way, I expect my stack to be stable enough to keep all my projects functional for decades to come, with less than a week of maintenance per year.
Classic systems were a single, fully integrated application. That design produced slow, incremental evolution and a plethora of small improvements; the commercial design of compartmentalized layers has produced a Babel tower of dysfunctional crap, mostly busy punching holes between layers.
An example: a single NixOS home server can do on a very small box what a modern deployment with Docker and co needs an equivalent starship for. A simple Emacs buffer, say a mail-compose one, lets you quickly solve an ODE via Maxima, while a modern office suite can't without manual cut and paste and a gazillion more SLoC. A Plan 9 mail system does not need to implement complex storage handling and network protocols; everything it needs is already in the system: mount someone else's remote mailbox, and saving a file there is sending a message, reading a file from the mounted filesystem is reading a message, and the same goes for viewing a website. In Gnus, a mail, an RSS article, and an NNTP post are the same, because they are DAMN THE SAME: a damn text consisting of a title and a body, with optional extras. That's the power of simplicity we lost in order to push walled gardens and keep users locked in and powerless.
Modern commercial IT is simply untenable:
- even Alphabet can't scan the whole web; a distributed YaCy on a gazillion home servers can, though, and with MUCH LESS iron and cost for the whole of humanity;
- nobody can map the world the way the OSM model does, where anybody who maps shares everything with everybody else;
This is the power of FLOSS. It's time to admit, simply, that we can't afford a commerce/finance-managed nervous system for our societies.
> These outages didn’t happen because developers didn’t test software.
The conclusion being:
> How do you get quality code?...Don’t skimp on static code analysis and functional tests, which should be run as new code is written.
But even working backwards from the conclusion, which is that "specs + code analysis" will save you from the big scary things of "software erosion" and "complexity", thus sparing us all from outages, I disagree.
Specs and analysis are helpful, but they do not magically solve complexity at scale. CrowdStrike, sure, would have benefited from testing, I agree, but so many other large outages need more than that, and that's the disconnect in the article for me.
At some point you need blackbox, chaos monkey level production tests. Bring down your central database, bring down us-east-1. What happens to the business?
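Even a crude black-box version of that is revealing. A minimal sketch, assuming a docker-compose stack with a database service named "db" and an app health endpoint on localhost:8000; every name here is an illustrative assumption:

    # Chaos-style smoke test: stop the database out from under the app,
    # then check that the app degrades gracefully instead of crashing.
    import json
    import subprocess
    import urllib.request

    subprocess.run(["docker", "compose", "stop", "db"], check=True)
    try:
        with urllib.request.urlopen("http://localhost:8000/health", timeout=5) as resp:
            body = json.load(resp)
        # The service should stay up and report degraded, not 500 or hang.
        assert body.get("status") == "degraded", body
    finally:
        subprocess.run(["docker", "compose", "start", "db"], check=True)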
I'm not sure if this is valid, but a lot of the savvier tech companies' outages feel like router configurations that lead to cascading traffic issues. But I have no data to back this thought up.
Funny how there is no mention of how modern tech companies offshored/outsourced and even fired manual QA testers. Developers aren’t testers. Do we expect a civil engineer to test the bridge they created before opening it to the public?
Also, with a move-fast-and-break-things mentality, stable and quality software went out the window in favor of a continuous release of broken/buggy software.
All these initiatives and plans always die the moment they reach an executive, who reacts with: « It works now, why should we spend any money on not making new features? We're not doing your gold plating, we don't need it. »
I eventually got tired of this, ran out of motivation, and quit software engineering.
MBAs who understand nothing about software treat software developers as code monkeys, and then we end up in this situation.
I’m still bitter about the whole thing and how it completely put me off writing software (which I used to love doing). Some days, I’m cheering for these failures and crashes, imagining some exec somewhere will eat a big shit sandwich for causing it. But I’m not kidding myself; I know it’s the software engineers getting blamed and working overtime for these outages…
- And tonight at 11...
- Dooooooom!
Don't worry: AI will do coding maintenance / bug fixes soon.
When I started in the '90s, maintaining different Unix distributions was a continuous package game.
Looking at ops and devs at companies now, this is still the ongoing work.
I think it's just an integral part of computing and one of its core challenges...
Nobody can test everything. Big deployments require big testing.
No wonder the whole system collapses.
To be fair, software in the 1990s wasn't great either, but many things were new and written under enormous time pressure, like the Netscape browser.
Linux distributions were best around 2010. Google and Windows were best around that time, too.