December 23rd, 2024

A Practitioner's Guide to Wide Events

Jeremy Morrell discusses Wide Event-style instrumentation in software engineering, highlighting its benefits for debugging, the importance of emitting single events, and practical guidance on tools and techniques for effective data analysis.


Jeremy Morrell discusses the implementation of Wide Event-style instrumentation in software engineering, emphasizing its benefits in enhancing feedback loops and simplifying debugging processes. He highlights the importance of emitting a single event for each unit of work, which can be referred to as a log line or span. Morrell notes that while the concept of Wide Events is gaining traction, it is not new, with previous discussions by industry figures like Charity Majors and Brandur Leach. He provides practical guidance on how to adopt this approach, focusing on selecting appropriate tools for instrumentation and data visualization, such as Honeycomb and OpenTelemetry. Key techniques for effective data analysis include visualizing data, grouping by various dimensions, and filtering to narrow down specific segments. Morrell also suggests creating middleware to manage spans effectively and emphasizes the need for extensive attributes in wide events to improve debugging and system understanding. He concludes by stressing the importance of including service metadata, instance details, and build information in telemetry to facilitate incident response and system monitoring.

- Wide Event-style instrumentation improves debugging and system management.

- Emitting a single event per unit of work is crucial for effective observability.

- Selecting the right tools, like Honeycomb and OpenTelemetry, enhances data analysis capabilities.

- Key techniques include data visualization, grouping, and filtering for focused insights.

- Comprehensive attributes in wide events aid in incident response and system understanding.
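
As a rough illustration of the "one event per unit of work" idea summarized above, here is a minimal Python sketch. It is not taken from Morrell's article: the `wide_event` helper, the `handle_request` handler, and every attribute name and value in `SERVICE_METADATA` are hypothetical, and a real setup would more likely attach attributes to an OpenTelemetry span than write JSON lines by hand.

```python
import io
import json
import time
from contextlib import contextmanager

# Hypothetical static metadata attached to every event (illustrative values only).
SERVICE_METADATA = {
    "service.name": "checkout",
    "service.version": "1.4.2",
    "build.commit": "abc1234",
    "instance.id": "host-01",
}


@contextmanager
def wide_event(name, sink):
    """Collect attributes for one unit of work and emit them as one JSON line."""
    event = {"name": name, **SERVICE_METADATA}
    start = time.monotonic()
    try:
        yield event  # the handler adds attributes by mutating this dict
    except Exception as exc:
        # Record the failure on the same event instead of logging it separately.
        event["error"] = True
        event["exception.type"] = type(exc).__name__
        event["exception.message"] = str(exc)
        raise
    finally:
        # Exactly one line is written per unit of work, success or failure.
        event["duration_ms"] = round((time.monotonic() - start) * 1000, 3)
        sink.write(json.dumps(event) + "\n")


def handle_request(sink, user_id, cart_items):
    """Hypothetical request handler that enriches the event as it works."""
    with wide_event("POST /checkout", sink) as event:
        event["user.id"] = user_id
        event["cart.item_count"] = len(cart_items)
        if not cart_items:
            raise ValueError("empty cart")
        event["cart.total_cents"] = sum(cart_items)
        return event["cart.total_cents"]


if __name__ == "__main__":
    out = io.StringIO()
    handle_request(out, "user-42", [1200, 350])
    print(out.getvalue(), end="")
```

Because each request produces a single record carrying the service, build, and instance fields next to the request-specific ones, grouping or filtering by any of them later needs no joins across log lines.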

Bad habits that stop engineering teams from high-performance

The article examines bad habits that hinder engineering team performance. It stresses the importance of observability in software development, including Elastic's role in OpenTelemetry, and discusses CI/CD practices, cloud-native technology updates, data management solutions, mobile testing advancements, API tools, DevSecOps, and team culture.

Is it time to version observability?

The article outlines the transition from Observability 1.0 to 2.0, highlighting structured logs for better data analysis, improved debugging, and enhanced software development, likening its impact to virtualization.

OpenTelemetry Tracing in < 200 lines of code

OpenTelemetry tracing simplifies to structured logging and context propagation, using spans for metadata. It supports distributed tracing via HTTP headers and requires instrumentation to capture trace data for monitoring.

There's always an events table (2022)

The article highlights the challenges of maintaining large event tables in SaaS products, emphasizing the need for effective cleanup strategies and proactive data management to enhance performance and prevent issues.

Logging Best Practices: An Engineer's Checklist

Effective logging practices are vital for system integrity, enhancing troubleshooting and performance monitoring. Key strategies include structured logs, centralized management, and tools like Honeycomb for improved analysis and observability.

6 comments

By @thom - 4 months

I’m quite looking forward to a future where we’ve finally accepted that all this stuff is just part of the domain and shouldn’t be treated like an ugly stepchild, and we’ve merged OLTP and OLAP with great performance for both, and the wolf also shall dwell with the lamb, and we’ll all get lots of work done.

By @coolguy4 - 4 months

Wide events are good, but watch out they don't become "god events": the event that every service needs to ingest, so that if there's new data a service needs, we just add it onto the god event, because, conveniently, it's already being ingested. Before too long, the query that generates the wide event is getting so complex it's setting the db on fire. Like anything, there are trade-offs; there are practical limits to how wide an event should reasonably become.

By @marmaduke - 4 months

i wonder if there are any semi automated approaches to finding outliers or “things worth investigating” in these traces, or is it just eyeballs all the way down?

By @valyala - 4 months

Wide events are a great concept for the observability space! They are a superset of structured logs and traces. Wide events are basically structured logs, where every log entry contains hundreds of fields with various properties of the log entry. This allows slicing and dicing the collected events by arbitrary subsets of their fields, which opens up endless possibilities for obtaining useful analytics from the collected events.

Wide events can be stored in traditional databases. But this approach has a few drawbacks:

- Every wide event can have different sets of fields. Such fields cannot be mapped to the classical relational table columns, since the full set of potential fields, which can be seen in wide events, isn't known beforehand.

- The number of fields in wide events is usually quite big - from tens to a few hundred. If we are going to store them in a traditional relational table, this table will end up with hundreds of columns. Such tables aren't processed efficiently by traditional databases.

- Typical queries over wide events usually refer to only a few fields out of the hundreds available. Traditional databases usually store every row in a table as a contiguous chunk of data with all the values for all the fields of the row (aka row-based storage). Such a scheme is very inefficient when the query needs to process only a few fields out of hundreds, since the database needs to read all of the hundreds of fields for each row and then extract the few that are needed.

It is much better to use analytical databases such as ClickHouse for storing and processing big volumes of wide events. Such databases usually store the values of every field in contiguous data chunks (aka column-oriented storage). This allows reading and processing only the few fields mentioned in the query, while skipping the remaining hundreds of fields. It also allows field values to be compressed efficiently, which reduces storage space usage and improves performance for queries limited by disk read speed.

Analytical databases don't resolve the first issue mentioned above, since they usually require creating a table with predefined columns before wide events can be stored in it. This means that you cannot store wide events with arbitrary sets of fields that are unknown when the table is created.

I'm working on a specialized open-source database for wide events, which resolves all the issues mentioned above. It doesn't require creating any table schemas before it starts ingesting wide events with arbitrary sets of fields (i.e. it is schemaless). It automatically creates the needed columns for all the fields it sees during data ingestion. It uses column-oriented storage, so it provides query performance comparable to analytical databases. The name of this database is VictoriaLogs. Strange name for a database specialized for efficient processing of wide events :) This is because initially it was designed for storing logs - both plaintext and structured. Later it turned out that its architecture is an ideal fit for wide events. Check it out - https://docs.victoriametrics.com/victorialogs/

By @patrulek - 4 months

Tldr; just use slog package (structured logs) to log everything and then visualize.

By @zahlman - 4 months

Practitioner of what? What is a "wide event"? In what context is this concept relevant? It took several sentences before I was even confident that this is something to do with programming.
