Get me out of data hell
A senior engineer reflects on the chaotic experience of working with a complex data warehouse, criticizing the toxic culture and inefficiencies, while valuing the lessons learned and camaraderie among teammates.
In a reflective piece, a senior engineer describes the chaotic and frustrating experience of working with an enterprise data warehouse platform, humorously dubbed the "Pain Zone." The engineer highlights the convoluted architecture, which involves over a hundred operations for a task that should be straightforward, leading to inefficiencies and confusion. The culture within the organization is critiqued for fostering a sense of dread and disempowerment among engineers, who feel pressured to work quickly despite the poor quality of the codebase. The engineer recounts a specific day of navigating the Pain Zone, where they discover that logs are filled with nonsensical data, making it impossible to determine if data is being successfully processed. Despite the challenges, the team maintains a camaraderie that helps them cope with the absurdities of their work environment. The engineer plans to leave the company soon, viewing the experience as a painful but valuable lesson in resilience and craftsmanship. The narrative serves as a commentary on the pitfalls of poor data management practices and the impact of workplace culture on software engineering.
- The enterprise data warehouse platform is overly complex, with unnecessary operations complicating simple tasks.
- A toxic culture of fear and judgment hampers engineers' ability to work effectively and prioritize quality.
- The team relies on camaraderie to navigate the challenges of their chaotic work environment.
- The engineer plans to leave the company, viewing the experience as a lesson in resilience.
- Poor data management practices lead to significant inefficiencies and confusion in the workflow.
Related
"We ran out of columns" – The best, worst codebase
The author reflects on a chaotic codebase, highlighting challenges with a legacy database and a mix of programming languages. Despite flaws, it fostered creativity and problem-solving, leading to notable improvements.
Microsoft is a black hole of money and talent
A web developer criticizes Microsoft Dynamics ERP for its slow performance, inadequate programming language, unreliable tooling, and inefficient update process, highlighting its negative impact on customer experience and contract negotiations.
Why Your Data Stack Won't Last – and How to Build Data Infrastructure That Will
The article highlights challenges in data infrastructure, emphasizing poor design, technical debt, and key person dependency. It advocates for thorough documentation, cross-training, and stakeholder engagement to ensure sustainable systems.
How Software Companies Die
The article examines the tension between creative programmers and management in software companies, emphasizing that management control can harm creativity, product quality, and lead to talented programmers leaving.
How to Ruin an Engineering Organization
The article highlights ten detrimental practices in engineering organizations, including gatekeeping, rapid staff turnover, lack of transparency, and neglecting coaching, which undermine trust, morale, and innovation.
Like why didn't anyone catch the issue with the logs? Because it doesn't matter, every data team is a cost-centre that unscrupulous managers use to launch their careers by saying they're big on AI. So nothing works, no-one cares it doesn't work, most of the data engineers are incapable of coding fizzbuzz but it doesn't matter.
People always wonder why banks etc. use old mainframes. There's like a 0% success rate for new data projects. And that 0% includes projects which had launch parties etc. but no-one ever used the data or noticed how broken it was. I don't think a lot of orgs which use data as core-infra could modernize, the industry is just so broken at this point I don't think we can do what we did 30 years ago.
But I am SO triggered by this piece. I had that intrusive feeling you sometimes get when driving where you think, "I could just close my eyes and see what happens", or "That cliff is so close and the guardrail doesn't really extend far enough".
Only for my career. Like I should just not show up on Monday. I should get in the car and drive far away and change my name and work at a nice retail joint in a mid-sized town.
I'm going to need to sit and stare into the distance for an hour and 3.
Oof, that hits a little close to home.
Exactly why I burned-out at work, worked at most 2 hours per day on a good day and finally was ejected from the project after a PM that graduated last year from school noticed and went after my head. Author is a wizard for describing the situation this well.
It's been 3 days since I've been free from the tyranny of Jira and project managers, and I've worked more on my personal projects than I did in a week at my former workplace.
A beautiful epigram.
The author’s experience is not far off from my own.
1. Any solution in place can only be understood by the person who created it
2. “No, we can’t change that because then we’d have to validate everything from scratch again”
And therefore, as the author says:
> “we'll continue with the work instead of fixing the critical production error”
I’m honestly not sure how to address it either. With traditional software dev we’d write tests, incorporate those into CI/CD, and start to course correct. We can use sample data to validate the code does what we think it does and that we didn’t break it.
But in these data projects, it’s not only the code that’s changing, but the data is also a moving target. You can write a test with sample data, but tomorrow your data might change because someone in sales added a custom field to the CRM, or IT upgraded the accounting software and all of the unique IDs changed, or someone upgraded their Excel version, or whatever.
And your code that works on the sample data needs to handle all of this, which obviously it can’t. You can try to validate the data somehow, check the schema, check if the number of rows hasn’t doubled or halved, and so forth, and then stop it from importing until you look into it, but also you can’t stop inbound data because an exec has a meeting in a few hours and expects their report to be updated.
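A minimal sketch of the kind of gate described above, assuming each batch arrives as a list of dicts and that you know the expected column names and the previous batch's row count. The names `check_batch`, `BatchCheck`, and `max_ratio` are made up for illustration, not from any particular tool. It only reports problems; whether to hold the import or load anyway with a warning is left to the caller, since that is exactly the trade-off with the exec's report.

```python
from dataclasses import dataclass, field


@dataclass
class BatchCheck:
    ok: bool
    problems: list[str] = field(default_factory=list)


def check_batch(rows, expected_columns, previous_row_count, max_ratio=2.0):
    """Report schema drift and suspicious row-count swings for one batch."""
    problems = []
    expected = set(expected_columns)

    # Schema check: report the first row whose keys differ from what we expect.
    for i, row in enumerate(rows):
        actual = set(row)
        if actual != expected:
            missing = sorted(expected - actual)
            extra = sorted(actual - expected)
            problems.append(f"row {i}: missing={missing} extra={extra}")
            break  # one schema report per batch is enough

    # Volume check: flag a batch that has at least doubled or halved.
    if previous_row_count > 0:
        ratio = len(rows) / previous_row_count
        if ratio >= max_ratio or ratio <= 1 / max_ratio:
            problems.append(f"row count changed by a factor of {ratio:.2f}")

    return BatchCheck(ok=not problems, problems=problems)
```

Something like `check_batch(rows, ["id", "amount"], previous_row_count=100)` would run before the load step. It does nothing for the cases where the values change meaning but the shape stays the same, which is the harder half of the problem.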
I heard something about “data contracts” that’s supposed to address this, but it sounds like the next in a long line of buzz words intended to get management to buy another data product.
Has anyone worked in this kind of project that went well?
I see questions like these a lot and every time I feel that people immensely underestimate the effort required for curating data. In my experience data can only ever be as good as what it's being used for and in this story the logs haven't been used for this purpose before so they're not going to be any good.
It's some sort of data variation on the second law of thermodynamics - entropy is winning. Going in with the expectation that things should be better will only lead to frustration.
We shouldn't just have wide events/big spans emitted... We should have those spans drive the pipeline. Rather than observability being a passive monitoring system, if we write code that reacts to events we are capturing, then we shuffle towards event sourcing.
Given how badly coupled together with shoestring glue & good wishes so many systems are, how opaque these pain zones are, it feels like the centralization upon existing industry standard protocols to capture events (which imo include traces) is a clear win.
(Obvious downside, these systems become mission critical, business process & monitoring both.)
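To make the idea concrete, here is a toy sketch, assuming spans are plain dicts and everything runs in one process. `SpanBus`, the span names, and the S3 path are all hypothetical, and this is not the API of any real tracing library. Every emitted span is appended to a log (the observability half) and also handed to whatever code subscribed to it (the pipeline half).

```python
from collections import defaultdict


class SpanBus:
    def __init__(self):
        self.log = []                      # append-only record of every span
        self.handlers = defaultdict(list)  # span name -> list of callbacks

    def on(self, name, handler):
        """Register code that should run whenever a span with this name is emitted."""
        self.handlers[name].append(handler)

    def emit(self, name, **attributes):
        """Record the span, then let subscribers drive the next pipeline step."""
        span = {"name": name, **attributes}
        self.log.append(span)
        for handler in self.handlers[name]:
            handler(span)
        return span


bus = SpanBus()
loaded = []

# The "pipeline" is just reactions to captured spans.
bus.on("file.landed", lambda span: bus.emit("file.parsed", path=span["path"], rows=3))
bus.on("file.parsed", lambda span: loaded.append(span["path"]))

bus.emit("file.landed", path="s3://example-bucket/logs/1.json")
```

After the last line, `bus.log` holds both spans in order, so the record of what happened and the thing that made it happen are the same data. That is also the downside mentioned above: lose the bus and you lose both.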
I resigned due to the night terrors caused by the cyber security issues I saw everywhere. The more I explored and understood the more sleep I lost.
Thank you for this phrase; I'll quote it at every opportunity.
For real, a fun and refreshing read (if also a little haunting).
I feel this comment in my bones.
Add: for people who share the feeling -- you can work in a place where velocity isn't all, managers are not assholes and you can dedicate yourself to craft.
Can someone, please, tell me this is a joke. Because I can't be certain, but it doesn't look like one.
That's why you build data platforms and name your team accordingly. This is a much easier position to defend, where you and your team have a mandate to build tools for others to be efficient with data.
If upstream provides funky logs or JSON where you expect strings, that's for your downstream to worry about. They need the data and they need to chase down the right people in the org to resolve that. Your responsibility should be only to provide unified access to that external data and ideally some governance around the access, like logging and lineage.
Tldr; Open your 'data' mandate too wide and vague and you won't survive as a team. Build data platforms instead.
If you want an answer to a specific question, we can spin up a read replica and a Metabase and write a query in an afternoon, cool. I’ll get you a chart, we’ll move on. If you want “a data analytics platform to enable blah blah blah” I’m out, I can’t do it. My eyes won’t focus, my hands stop moving.
Developers sometimes tell me stuff like “Kubernetes is too complex”, “jeez React is a pain”. I send those quotes to my friends stuck writing 195 step DAGs to transform log files from s3 into s3 so they can eventually land in s3 - ah yes but they’re parquet somewhere in between, and that matters for some reason. We laugh together, but I can see it hurts them more than I intended.
Life is too short to faff about doing nothing. Go join a company with less than 100 engineers and learn to be happy again. Let the enterprises burn, we’ll all be better for it.
Anyways this was a fantastic piece, I hope this person writes their book after all.
AU dev scene is not great. Really heavy with POs, PMs and CTOs without the background.
Ow
Brilliant
He is writing as if the engineers all knew how to fix the systems, but were just powerless to do that. But I’ve also seen projects led by engineers that only added to the overall complexity.
There is a paradox in this - the people who seem the most confident about fixing the systems usually only make things worse. Chesterton fences and stuff.
This article triggers me because everybody who reads it will always believe that they would fix the mess if only they got the power, but in practice when they get power they would only add new complexity to the whole mess.
Because there were no automated tests. If the company needs something to work, that thing needs a, preferably automated, test.
Very happy to see this here, realise it's the same person and that this is "a thing", and then to rollick in the author's backlog. A joy! Raucous real-life laughter has exploded from me on numerous occasions along with most articles. I think I've read 5 in a row there, and my brain is buzzing happily.
Thank you to the author for having the courage to write about real experiences. A breath of fresh air. I look forward to future books and articles, and reading more previous work, and cross my proverbial fingers hoping they can keep it real in the face of what will presumably be an avalanche of grifters looking to leech off the attention.
I have a fantasy of quitting my job to write books on the [modern] theory and practice of information systems engineering. Not 'how to write software', that's been done; I mean all the forms of engineering around software/information systems. In my dream, I write the books, everyone reads them, and starts doing their jobs right.
But then I remember, I, a person arrogant enough to believe he knows how to do things right, still can't get shit done right. Maybe if I were a one-man company, I could 'do everything right', and feel good about the result. But I depend on an entire company of people to do the right thing, in the right way, at the right time. That's hard even with the best people. No company is made up of the best people. It's always a mix of the best, worst, and in-between.
Strangely, a company can put out a decent product, despite the company being a tire fire. This is some comfort when you get older. You realize that everything being shit is okay, as long as the bills are paid. I have PTSD from when the thing that paid the bills was on fire, every week, for years. Lately at every job I have, I internally panic and scream at how horrible everything is. Because I'm haunted by what might happen. But it's not happening yet. So I muffle the screams, smile and nod along with the stand-up-meeting-cum-status-update.
The sad thing is, I forget that it's okay that the stand-up is shit. I forget that I'm still getting a fat paycheck just to sit in meetings that could have been an e-mail. I forget that, despite the company bleeding cloud costs [no savings plans, RIs, serverless, right-sizing, etc], we seem to be making a profit. Despite the terrible designs, bad process, ineffective leadership, absentee management, lack of security, and all the rest, the bottom line is fine. The shit is fine. Currently, and probably for the unforeseeable future.
I get craftsmanship. I'm a crappy woodworker. I enjoy making things well, and getting better at it. But our jobs are not fine woodworking. Our jobs are construction. We are banging rusted nails into shitty, twisted, racked, cupped, knotty-ass studs. If we're lucky. Yeah, this building is going to be shit. But somebody's still going to pay for it. And there'll be another job after. If we really wanted fine woodworking, we never would have taken this job, and we know it. We'd be struggling to sell a cabinet that took us two 80+ hour weeks, too tired to appreciate its beauty, too defeated by flaws only we notice.
So let's stop beating ourselves up. Let's stop beating each other up. We don't, can't, won't, find meaning in this monument to mediocrity. No comfort from the pain zone. No pride to take home. But we are paying the bills, with more left over than most have. No broken backs and long hours. No lack of health care, no abuse from customers or the public. Not even that big a worry about job security. We are the lucky ones. We are blessed with a golden shovel. So let's do like those blue collar laborers we often idolize, and get to this annoying, bloody awful work that we are blessed with.
> we're serverless, because how can you hurt yourself without a cutting-edge?
Just perfect.
^^^ Then do it... and then strangler fig the original.
The older the company is, the more likely one finds this morass.
It won't change absent powerful technical leadership.
Right there ^^^
> coated in grass which rends those who tread upon it like a legion of upraised spears,
and there's more. Get to the point, whatever it is.
I've tried to come up with some heuristic to determine whether or not a team is competent, good, or doomed. I've been exposed to all over the last... 8-10 years, and one of the key things I've noticed is the ratio of competent/skilled developers to the unskilled ones is a big ... indicator(?). Predictor?
Colleague of mine has been working with a team - dev team has ranged from 5-8 people over the last few years. Few people seem to have any grasp of programming at all. Only two people - my colleague and one other - have ever taken projects from ideas to delivery, or even taken features from requests to successful rollout of already functioning software.
The arguments that people get into there - days or weeks of people 'researching' whether or not OAuth 'really' requires 'refresh tokens' or whether it's really supposed to be a JWT. Management has some notion of 'every voice is legitimate and should be heard - we don't support bullying' and so on.
If you have a team of 10, and 1 or 2 people are simply bad at thinking somewhat abstractly, you can survive.
If that number hits, say, 4-5... the team will struggle. A lot. You can keep things going, but it will be slow. And everything becomes a battle.
If that number becomes 7 or 8, and you only really have 1-2 developers who are actually competent developers... things will continue to spiral downward.
On the other side - I worked with a team of about 8-10 people on a 6 month contract. The larger org had another 40 or so folks, handling other projects, and support. Onboarding was great - I pushed production code in the first week. Everyone on the team was competent, including the juniors. I had more development experience, but they had more company experience, and it was really a relatively enjoyable engagement overall.
It was refreshing to be able to ask anyone on the team questions, and either get a workable answer, or an "I'm not sure, let's check with XYZ" to get working answers. The "oh, yeah, it's ABC" when ABC is clearly not the answer stuff never happened. People committing code and pushing to production without ever having run it at all didn't happen either. I've experienced that before, and it's what is happening to my colleague now.
The problem with a plurality of tech-incompetent folks in a tech group is that they honestly can not determine that they aren't competent. The only examples of competence are in the minority, and tend to not be trusted (even though that minority is the only portion that turns out working/functional code).
Leaving ends up being the only option in those cases. My colleague is only at his place part time, and has hung around because they've gone through some restructuring where new folks were brought in, and... you hope that things might get better in a few months, then realize they don't.