Good programmers worry about data structures and their relationships
Good programmers prioritize data structures over code, as they enhance maintainability and reliability. Starting with data design simplifies complexity, aligning with Unix philosophy and aiding senior engineers in system documentation.
Good programmers prioritize data structures and their relationships over mere code, as emphasized by Linus Torvalds, the creator of Git and Linux. He argues that effective data structures lead to simpler, more maintainable code and enhance software reliability. By focusing on the data model during software design, developers can avoid complications later on. Torvalds illustrates this with an example where restructuring data simplified a complex function significantly, demonstrating that well-designed data structures can reduce code complexity and improve performance. He also references the Unix programming philosophy, which advocates for embedding knowledge into data to simplify program logic. The article suggests that programmers should start with data design, ensuring a clear understanding of data flow and component interactions before delving into code specifics. This approach is particularly relevant for senior engineers in tech companies, who are often required to create high-level design documents for complex systems. Overall, the emphasis is on the importance of data structures in software engineering, advocating for a shift in focus from code to data.
- Good programmers focus on data structures rather than just code.
- Well-designed data structures lead to easier maintenance and improved software reliability.
- Starting with data design can simplify code complexity and enhance performance.
- The Unix programming philosophy supports the idea of embedding knowledge into data.
- Senior engineers are expected to create high-level design documents for complex systems.
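The point about embedding knowledge into data can be illustrated with a small sketch. The shipping zones and rates below are invented for illustration and do not come from the article; the idea is only that once the rules live in a table, the logic shrinks to a lookup.

```python
# Hypothetical shipping rules, invented for illustration.
# The knowledge lives in the table; the function is a plain lookup.
RATES = {
    "domestic": {"base": 5.0, "per_kg": 1.0},
    "europe": {"base": 12.0, "per_kg": 2.5},
    "world": {"base": 20.0, "per_kg": 4.0},
}


def shipping_cost(zone: str, weight_kg: float) -> float:
    """Return the cost for a zone; an unknown zone raises KeyError."""
    rate = RATES[zone]
    return rate["base"] + rate["per_kg"] * weight_kg


if __name__ == "__main__":
    print(shipping_cost("europe", 2.0))
```

Adding a zone means adding a row to `RATES`, not another branch in the function.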
Related
Why We Build Simple Software
Simplicity in software development, likened to a Toyota Corolla's reliability, is crucial. Emphasizing straightforward tools and reducing complexity enhances reliability. Prioritizing simplicity over unnecessary features offers better value and reliability.
We Build Simple Software
Simplicity in software development, likened to a Toyota Corolla's reliability, is crucial. Emphasizing straightforward tools, Pickcode aims for user-friendly experiences. Beware of complex software's pitfalls; prioritize simplicity for better value and reliability.
My programming beliefs as of July 2024
Evan Hahn emphasizes tailored programming approaches, distinguishing "simple" from "easy," promoting testability through modularity, and advocating for ethical coding practices prioritizing societal impact and nuanced thinking in software development.
Beyond Clean Code
The article explores software optimization and "clean code," emphasizing readability versus performance. It critiques the belief that clean code equals bad code, highlighting the balance needed in software development.
How I Program in 2024
Kartik Agaram reflects on his programming journey, advocating for minimalist software design, emphasizing simplicity, context awareness, and the potential benefits of data-oriented design to improve software quality and adaptability.
https://softwareengineering.stackexchange.com/questions/1631...
Creating an elaborate type hierarchy with unnecessary abstractions is not what is meant by "worrying about data structures", and that tendency is one of the most common failure modes for otherwise smart engineers.
I don't know where our industry lost design rigor, but it happened; was it in the schools, the interviewing pipeline, lowering of the bar, or all of the above?
So my view of engineering has always been based on managing two things: functional state and data workflows
After doing software engineering professionally for a decade now I can tell you that:
1. Most “scientific” engineers back to Minsky, Shannon etc… describe the world of computing in terms of state management, data transformation and computing overhead management. All of the big figures and pioneers in software cared A LOT about data and state. Basically that’s all computing was at the beginning, and it was expected to be the pattern moving forward
2. There’s absolutely no consistency about which assumptions in engineering system design are foundational and always true, such that everyone follows them - and the ones that everyone does follow are fads at best
3. Business timelines dictate engineering priorities and structures much more than robustness, antifragility, state management etc… in the vast majority of production software
4. Professional organizations like guilds, unions, etc… are almost universally rejected by software engineers. Nobody actually takes IEEE seriously because there’s no downside if you don’t. This ensures there’s no enforcement or self-regulation of engineering practices the way there is in, e.g., civil and biomedical engineering. Even there, those mechanisms are barely utilized.
Overall the state of software development is totally divorced from its exceptionally high minded and philosophical roots, and is effectively led by corporations that are prioritizing systems that make money for people with money.
So what is “good” has very little to do with what is incentivized
-- Fred Brooks
You have some data object whose structure provides constraints on how it can be transformed. And then the program logic is all about the structure-preserving transformations.
The transformations become simpler and easier to reason about, and you're basically left with a graph where the transformations are edges and the structures are nodes. And that's generally easier to reason about than an arbitrary imperative program.
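A minimal sketch of that picture, assuming a made-up order pipeline (the type and function names are hypothetical): each dataclass is a node, each function is an edge, and the constraints a structure carries decide what the next transformation is allowed to assume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawOrder:
    sku: str
    quantity: str  # unparsed user input


@dataclass(frozen=True)
class ValidOrder:
    sku: str
    quantity: int  # positive, guaranteed by validate()


@dataclass(frozen=True)
class PricedOrder:
    order: ValidOrder
    total_cents: int


def validate(raw: RawOrder) -> ValidOrder:
    """Edge RawOrder -> ValidOrder: the only place parsing can fail."""
    quantity = int(raw.quantity)
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return ValidOrder(sku=raw.sku, quantity=quantity)


def price(order: ValidOrder, unit_cents: int) -> PricedOrder:
    """Edge ValidOrder -> PricedOrder: no re-validation needed."""
    return PricedOrder(order=order, total_cents=order.quantity * unit_cents)


priced = price(validate(RawOrder(sku="A1", quantity="3")), unit_cents=250)
```

Because `price` only accepts a `ValidOrder`, it never has to ask whether the quantity is sane; that knowledge is carried by the structure rather than re-checked in the logic.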
Language also influences how important types are, regardless of function. Haskell is strict, LISP is less so. Python, being closer to LISP in syntax but surfacing powerful C (closer to Haskell) primitives, has proven that valuing function over form can be empowering.
Premature modeling of a domain in verbose types (ex. struct vs any) can slow down rapid iteration in comprehending what is valuable from data or how users may actually use code. Someone might need not just one, but infinite cat pictures in their file upload, but the code _and the types_ treat this as a single value. Another example is using JSONB columns in their RDS initially and normalizing fields into columns when needed. A more flexible type system saves time in early iteration cycles.
This is also something which I learned far too late; my programming education focused very much on algorithmic thinking. That is important, but only helpful if you have already chosen the right data structures. Many times I have been in the situation where the code I was writing was confusing and only a small part of it had to do with solving the actual problem. If this ever happens to you, you should rethink your data structures and consider whether they were chosen correctly.
Also, when reading code for the first time you should be looking at the data structures before anything else.
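A small sketch of that symptom, using invented (customer, amount) records: both functions compute per-customer totals, but in the first one most lines are bookkeeping forced by the flat list, while in the second the structure does that work.

```python
from collections import defaultdict

# Hypothetical (customer, amount) records, invented for illustration.
orders = [("ann", 30), ("bob", 15), ("ann", 20), ("cat", 5), ("bob", 10)]


def totals_from_flat_list(records):
    """Keeps totals in a list of pairs: mostly search-and-update bookkeeping."""
    result = []
    for customer, amount in records:
        for i, (seen, total) in enumerate(result):
            if seen == customer:
                result[i] = (seen, total + amount)
                break
        else:
            # No existing entry was found for this customer.
            result.append((customer, amount))
    return dict(result)


def totals_from_index(records):
    """Same answer once the totals are keyed by customer."""
    totals = defaultdict(int)
    for customer, amount in records:
        totals[customer] += amount
    return dict(totals)
```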
To be clear: Good programmers worry about the organization and cleanliness of their code. They worry that their code is reduced to the smallest of forms, consistent in expression, and exceptional in measure.
The limitation here is personality and not intelligence and there is a lot of data on this.
The personality metric of concern is conscientiousness, which is how a person perceives the world outside themselves. This one thing is responsible for self-discipline, concepts of organization, initiative, half of empathy, and much more. People at the extreme high end of this lean more towards things like authoritarianism, obligation, duty, healthy living, and social alignment. These people find joy in putting things into order and discerning relational structures.
People on the low end tend to be free spirits, are more likely to experiment with drug use, and can't clean their rooms or pick up trash even if you put a gun to their heads. Concepts of work effort and self-reliance are almost entirely unimaginable. These people cannot organize anything and they require absurd rewards to accomplish the smallest tasks, and even still the output of their efforts is fleeting and temporary. They simply cannot see abstract relational concepts and cannot be compelled to.
Strangely, low scoring people struggle to discern value from a thing as they cannot perceive separations of vanity from functionality. Yet, they have no problem selling things in full awareness that if they cannot perceive value then neither can most other people. High scoring people don't do this and thus tend to make less effective merchandisers.
High scoring people tend to perceive low scoring people as slobs, sloths, and an anchor on social progress. Low scoring people tend to perceive high scoring people as perfectionists, prudes, and unnecessarily distracted on trivialities far outside their imagination.
The common assumption is that people who are brilliant at abstract organization and industriousness must be more intelligent. This makes sense because these people tend to be more successful in all aspects of life other than careers in entertainment. That assumption is completely wrong, though. Conscientiousness is negatively correlated to intelligence at -0.27, according to various studies.
I don't even attempt to do types at this point. It's really just about how the structure is going to look.
This is really a preference, then. I encountered almost this exact sort of problem in my last project. I wanted a simpler database design and more complex querying/code, they wanted a significantly more complex database design that was harder to understand (for everyone but the guy who spent all of one weekend designing it) but simpler querying/code (that was also more plentiful as a result). The question really is, where do you prefer your complexity to go? Do you want to lean on the database, or your code?
Simple example: you have a portfolio of stock that constantly changes in composition and value over time. Do you:
1) only store the current model of the portfolio in a "portfolios" table and the current prices of stocks in a "stock_prices" table, and use a separate history table for both (with stored procedure triggers to automatically copy all changes to it) to store all previous versions that can then be queried separately if needed, OR
2) store each change in both quantity and price across multiple tables, with no separation of what is "current" vs. what is "historical" other than the relationships that are (properly, hypothetically) set up via an "intent_versions" table at the top level, requiring a bunch of joins to actually determine the state of the portfolio both now and at any point in the past?
I opted for the former because I have no fear of complex queries; the center of thought-mass of the team leaned towards the latter. WWYD?
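For concreteness, here is a rough sketch of option 1 using Python's built-in sqlite3 module. SQLite, the column names, and the "portfolios_history" table name are assumptions made for illustration, not the schema from that project; only the "portfolios" table is shown, and "stock_prices" would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE portfolios (
        portfolio_id INTEGER NOT NULL,
        symbol       TEXT    NOT NULL,
        quantity     INTEGER NOT NULL,
        PRIMARY KEY (portfolio_id, symbol)
    );
    CREATE TABLE portfolios_history (
        portfolio_id INTEGER NOT NULL,
        symbol       TEXT    NOT NULL,
        quantity     INTEGER NOT NULL,
        replaced_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Copy the outgoing row into the history table on every update.
    CREATE TRIGGER portfolios_keep_history
    AFTER UPDATE ON portfolios
    BEGIN
        INSERT INTO portfolios_history (portfolio_id, symbol, quantity)
        VALUES (OLD.portfolio_id, OLD.symbol, OLD.quantity);
    END;
    """
)
conn.execute("INSERT INTO portfolios VALUES (1, 'ACME', 10)")
conn.execute(
    "UPDATE portfolios SET quantity = 25 WHERE portfolio_id = 1 AND symbol = 'ACME'"
)
conn.execute(
    "UPDATE portfolios SET quantity = 40 WHERE portfolio_id = 1 AND symbol = 'ACME'"
)

# The current state is a plain read; past versions live in the history table.
current = conn.execute("SELECT symbol, quantity FROM portfolios").fetchall()
history = conn.execute(
    "SELECT quantity FROM portfolios_history ORDER BY rowid"
).fetchall()
```

Under this layout the "what is it now" query stays trivial, and the cost moves to "what was it at time T" queries against the history table. Option 2 makes the opposite trade.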
That is, it is easy to see many junior efforts stall out during schema design, on the belief that a fancy method of storing the data can solve all issues. What is important about your data is not so much the schema as where different updates to it are first known and what other data they will need to go with them.
While Git might be particularly about data structures at its core, might I suggest you don't try to model your next complex payroll, insurance quotation, supply-chain or billing system in code as a composable set of lists, stacks, queues and trees, modified by code that grows over time to look increasingly like a big ball of mud.
What your team knows matters more than either of these.
I worked on an e-commerce "platform" that used EAV and I always struggled to write queries to find anything I needed.
https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80...
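To make the querying pain concrete, here is a small sketch with an invented three-product catalogue stored EAV-style in SQLite (the table and attribute names are hypothetical, not taken from that platform). Even "red products under 10" needs one self-join per attribute, plus a cast because every value is stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_eav (entity_id INTEGER, attribute TEXT, value TEXT);
    INSERT INTO product_eav VALUES
        (1, 'name', 'mug'),   (1, 'color', 'red'),  (1, 'price', '8'),
        (2, 'name', 'plate'), (2, 'color', 'blue'), (2, 'price', '12'),
        (3, 'name', 'bowl'),  (3, 'color', 'red'),  (3, 'price', '15');
    """
)

# One join per attribute the question touches.
eav_query = """
    SELECT n.value
    FROM product_eav AS n
    JOIN product_eav AS c ON c.entity_id = n.entity_id AND c.attribute = 'color'
    JOIN product_eav AS p ON p.entity_id = n.entity_id AND p.attribute = 'price'
    WHERE n.attribute = 'name'
      AND c.value = 'red'
      AND CAST(p.value AS INTEGER) < 10
"""
red_and_cheap = [row[0] for row in conn.execute(eav_query)]
```

With one column per attribute, the same question would be a single-table filter on `color` and `price`.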
I would say the "engineering" part of the design is also optional, as product design is another lever with more influence than code optimization.
Or are they only talking about tables in databases and such?
but also somehow we're supposed to write everything to read and write flat text...
Thanks UNIX!
This is why I love TS over JS. At first it feels like more work up front, more hurdles to jump through. But over time it changed how I approached code: define the data (& their types) first, then write the logic. Type Driven Development!
Coming into TS from JS, it might feel like an unnecessary burden. But years into the codebase, it's so nice to have clear structures being passed around, instead of mystery objects mutated with random props through long processing chains.
Once the mindset changes, to seeing data definition as a new first step, the pains of getting-started friction are replaced by the joys of easy future additions and refactors.