There's always an events table (2022)
The article highlights the challenges of maintaining large event tables in SaaS products, emphasizing the need for effective cleanup strategies and proactive data management to enhance performance and prevent issues.
Read original articleThe article discusses the common practice of maintaining an events table, or audit log, in SaaS products, highlighting its implications and challenges. The author reflects on experiences with large event tables at companies like Retool and Heroku, noting that these tables often become the largest in the database, leading to operational inefficiencies. For instance, Heroku's event log grew to 1 TB, causing issues with database performance and backup times. The author emphasizes the importance of implementing a robust cleanup strategy to manage event data effectively. They share their own experience of creating an event log that includes a user-visible audit feature while ensuring efficient data management through practices like using appropriate data types, limiting data retention to 90 days, and deleting events in batches. The article concludes by stressing the need for a proactive approach to data management to avoid costly mistakes in the future.
- Events tables are common in SaaS products but can lead to operational challenges if not managed properly.
- Large event tables can significantly impact database performance and backup times.
- Implementing a cleanup strategy is crucial for efficient data management.
- Deleting events in batches and using appropriate data types can enhance performance.
- Proactive data management practices can prevent costly mistakes in the long run.
Related
Structured logs are the way to start
Structured logs are crucial for system insight, aiding search and aggregation. Despite storage challenges, prioritizing indexing and retention strategies is key. Valuable lessons can be gleaned from email for software processes.
Bad habits that stop engineering teams from high-performance
Engineering teams face hindering bad habits affecting performance. Importance of observability in software development stressed, including Elastic's OpenTelemetry role. CI/CD practices, cloud-native tech updates, data management solutions, mobile testing advancements, API tools, DevSecOps, and team culture discussed.
Why Your Data Stack Won't Last – and How to Build Data Infrastructure That Will
The article highlights challenges in data infrastructure, emphasizing poor design, technical debt, and key person dependency. It advocates for thorough documentation, cross-training, and stakeholder engagement to ensure sustainable systems.
Are you considering Event Sourcing? Think again
Eduardo Bellani's blog post argues that traditional SQL RDBMS with temporal tables can provide similar benefits to event sourcing, encouraging developers to evaluate existing tools for cost-effective data management.
Does anyone integrate with their customers' DB directly?
The author discusses integrating third-party SaaS vendors with customer databases, reflecting on their fintech experience, emphasizing information security, and exploring a tool to standardize this integration while seeking feedback.
Nah, I am quite sure
for item in conn.query('SELECT * FROM event'):
if item.created_at < created_at_horizon:
conn.exec('DELETE FROM event WHERE ' + ' AND '.join(f'{k} = :{k}' for k in item.__dict__.keys), item.__dict__)
is fine, so I won't even bother measuring it. It's also secure, don't worry: as you can see I used prepared statements in it.(Facts are immutable, values change with time: https://www.youtube.com/watch?v=D6nYfttnVco )
I would much rather a solution where I can offload old events into some other storage system, so that it doesn't clog my database but the information is still retrievable.
Related
Structured logs are the way to start
Structured logs are crucial for system insight, aiding search and aggregation. Despite storage challenges, prioritizing indexing and retention strategies is key. Valuable lessons can be gleaned from email for software processes.
Bad habits that stop engineering teams from high-performance
Engineering teams face hindering bad habits affecting performance. Importance of observability in software development stressed, including Elastic's OpenTelemetry role. CI/CD practices, cloud-native tech updates, data management solutions, mobile testing advancements, API tools, DevSecOps, and team culture discussed.
Why Your Data Stack Won't Last – and How to Build Data Infrastructure That Will
The article highlights challenges in data infrastructure, emphasizing poor design, technical debt, and key person dependency. It advocates for thorough documentation, cross-training, and stakeholder engagement to ensure sustainable systems.
Are you considering Event Sourcing? Think again
Eduardo Bellani's blog post argues that traditional SQL RDBMS with temporal tables can provide similar benefits to event sourcing, encouraging developers to evaluate existing tools for cost-effective data management.
Does anyone integrate with their customers' DB directly?
The author discusses integrating third-party SaaS vendors with customer databases, reflecting on their fintech experience, emphasizing information security, and exploring a tool to standardize this integration while seeking feedback.