September 15th, 2024

OpenTelemetry and vendor neutrality: how to build an observability strategy

OpenTelemetry provides a vendor-neutral observability framework with three layers: source, collector, and backend, promoting flexibility and interoperability while preventing proprietary lock-in through open standards.

OpenTelemetry offers a vendor-neutral approach to observability, which is particularly beneficial for organizations that have faced challenges with vendor lock-in from proprietary monitoring solutions. This neutrality is a core principle at Grafana Labs, emphasizing the importance of flexibility in telemetry strategies. The OpenTelemetry framework consists of three key layers: the applications and infrastructure that generate telemetry data, the telemetry collector that processes and forwards this data, and the telemetry backend where data is stored and analyzed. The framework promotes loose coupling between these layers, allowing users to select different components without being tied to a single vendor. Open standards play a crucial role in this flexibility, as they prevent proprietary lock-in by ensuring that telemetry collectors can communicate using agreed-upon formats. The OpenTelemetry Collector serves as a primary example of this approach. However, once telemetry data is stored in a backend database, the choice of database schema and query language becomes critical, as it can influence the ease of data utilization. Overall, OpenTelemetry aims to provide a level playing field for observability solutions, fostering innovation and interoperability within the ecosystem.

- OpenTelemetry promotes vendor neutrality to avoid lock-in from proprietary solutions.

- The framework consists of three layers: source, collector, and backend.

- Loose coupling between components enhances flexibility in observability strategies.

- Open standards prevent proprietary lock-in and facilitate interoperability.

- The choice of database schema and query language is crucial for data utilization.
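
As a rough illustration of the loose coupling between those layers, here is a minimal Python sketch (service name and endpoint are hypothetical) of an application emitting spans over OTLP to a local collector; the backend sitting behind the collector can be swapped without touching this code:

  # Minimal sketch: the application (source layer) exports spans via OTLP
  # to a collector; whatever backend the collector forwards to is invisible here.
  # Assumes the opentelemetry-sdk and opentelemetry-exporter-otlp packages.
  from opentelemetry import trace
  from opentelemetry.sdk.resources import Resource
  from opentelemetry.sdk.trace import TracerProvider
  from opentelemetry.sdk.trace.export import BatchSpanProcessor
  from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

  provider = TracerProvider(resource=Resource.create({"service.name": "checkout"}))
  # The only wiring to the outside world is the collector's OTLP endpoint.
  provider.add_span_processor(
      BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
  )
  trace.set_tracer_provider(provider)

  tracer = trace.get_tracer("checkout")
  with tracer.start_as_current_span("place-order"):
      pass  # application work happens here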

AI: What people are saying
The comments reflect a range of experiences and opinions regarding OpenTelemetry's observability framework.
  • Many users appreciate OpenTelemetry for its flexibility and ability to reduce vendor lock-in, allowing easy switching between backends.
  • Some users highlight the efficiency of the auto-instrumentation feature, although there are concerns about its reliability and debugging challenges.
  • There are mixed feelings about the performance of metrics and logs, with some users finding them less efficient compared to tracing.
  • Several commenters emphasize the importance of community support and active maintenance of OpenTelemetry libraries across various platforms.
  • Concerns about corporate resistance to adopting OpenTelemetry due to negative perceptions from other vendors were also noted.
14 comments
By @demurgos - 5 months
OpenTelemetry is nice overall as there are libraries for multiple platforms. I introduced it this year for a web game platform with servers in Node, Java, PHP and Rust, and it all worked roughly the same way, which made it good for consistency.

I like how OpenTelemetry decouples the signal sink from the context, compared to other structured logging libs where you wrap your sink in layers. The main thing that I dislike is the auto-instrumentation of third-party libraries: it works great most of the time, but when it doesn't it's hard to debug. Maintainers of the different OpenTelemetry repos are fairly active and respond quickly.

It's still relatively recent, but I would recommend OpenTelemetry if you're looking for an observability framework.

By @pranay01 - 5 months
I think the biggest value I see with OpenTelemetry is the ability to instrument your code and telemetry pipeline once (using the OTel Collector) and then choose a backend and visualisation framework which meets your requirements.

For example at SigNoz [1], we support the OpenTelemetry format natively from Day 1 and use ClickHouse as the datastore layer, which makes it very performant for aggregation and analytics queries.

There are alternative approaches, like what Loki and Tempo do with a blob-storage-based design.

If your data is instrumented with Otel, you can easily switch between open source projects like SigNoz or Loki/Tempo/Grafana which IMO is very powerful.

We have seen users switch from another backend to SigNoz within a matter of hours when they are instrumented with Otel. This makes testing and evaluating new products super efficient.

Otherwise, just the effort required to switch instrumentation to another vendor would have been enough to never even think about evaluating another product.
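
As a hedged sketch of what that switch looks like in practice: if no endpoint is hard-coded, the SDK's OTLP exporter honours the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable, so re-pointing telemetry at a different backend becomes a deployment-config change rather than a code change (the endpoint value below is hypothetical):

  # Sketch: leave the endpoint unset in code; set, e.g.,
  #   OTEL_EXPORTER_OTLP_ENDPOINT=http://signoz-otel-collector:4317
  # in the deployment to choose (or change) the backend.
  from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

  exporter = OTLPSpanExporter()  # endpoint resolved from the environment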

(Full Disclosure : I am one of the maintainers at SigNoz)

[1] https://github.com/signoz/signoz

By @wise0wl - 5 months
From the very beginning of my tenure at my current "start-up" I wrote a bespoke Rust implementation using the base OpenTelemetry library, with lots of opinionated defaults and company specifics. We integrated this early on in our microservice development, and it's been an absolute game changer. All of our services include the library and use a simple boilerplate macro to add metrics and tracing to our Actix and Tonic servers, Tonic clients, etc. Logs are slurped off Kubernetes pods using Promtail.

It was easy enough that I, as a single SRE (at the time), could write and implement it across dozens of services in a few months of part-time work while handling all my other normal duties. OpenTelemetry has proved to be worth the investment, and we have stayed within the Grafana ecosystem, now paying for Grafana Cloud (to save time on maintaining the stack in our Kubernetes clusters).

I would absolutely recommend it. I would recommend it and hopefully use it at any new future positions.

By @nicholasjarnold - 5 months
I can confirm that this is a pretty good way to go. Building out a basic distributed tracing solution with OTel, Jaeger and the relevant Spring Boot configuration and dependencies was quite a pleasant experience once you figure out the set of dependencies relevant for your use cases. It's one of those nice things that Just Works™, at least for Java 17 and 21 and Spring Boot 3.2 (iirc) or greater.

There appeared to be a wide array of library and framework support across various stacks, but I can only attest personally to the quality of the above setup (Java, Boot, etc).

By @pards - 5 months
I tried to introduce OTel in a greenfield system at a large Canadian bank. Corporate IT pushed back hard because they'd heard a lot of "negative feedback" about it at a Dynatrace conference. No surprises there.

Corporate IT were not interested in reducing vendor lock-in; in fact, they asked us to ditch the OTel Collector in favour of Dynatrace OneAgent even though we could have integrated with Dynatrace without it.

By @aramattamara - 5 months
The problem with OpenTelemetry is that it's really only good for tracing. Metrics and logs were kinda bungee-strapped on later: very inefficient and clunky to use.

PS: And the devs (Lightstep?) seem to really like the "Open" prefix: OpenTracing + OpenCensus = OpenTelemetry.

By @jcgrillo - 5 months
Why do all these things use such damnably inefficient wire formats?

For metrics, we're shipping a bunch of numbers over the wire, with some string tags. So why not something like:

  // Proposed compact payload: one base timestamp plus packed deltas and values.
  message Measurements {
    uint32 metric_id = 1;       // interned identifier for the metric
    uint64 t0_seconds = 2;      // base timestamp, seconds
    uint32 t0_nanoseconds = 3;  // base timestamp, nanosecond remainder
    repeated uint64 delta_nanoseconds = 4 [packed = true];  // offsets from t0
    repeated int64 values = 5 [packed = true];              // one value per delta
  }
Where delta_nanoseconds represents a series of deltas from timestamp t0 and values has the same length as delta_nanoseconds. Tags could be sent separately:

  message Tags {
    uint32 metric_id = 1;
    repeated string tags = 2;
  }
That way you only have to send the tags if they change and the values are encoded efficiently. I bet you could have really nice granular monitoring e.g. sub ms precision quite cheaply this way.

Obviously there are further optimizations we can make if e.g. we know the values will respond nicely to delta encoding.
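
For what it's worth, here is a small Python sketch of the packing the comment proposes (the function name and dict shape are hypothetical, just mirroring the message above):

  # Sketch: pack (timestamp_ns, value) samples into the t0 + packed-deltas
  # shape sketched above; tags would travel separately and only when they change.
  def pack_measurements(metric_id, samples):
      """samples: list of (timestamp_ns, value) tuples, sorted by time."""
      t0_ns = samples[0][0]
      return {
          "metric_id": metric_id,
          "t0_seconds": t0_ns // 1_000_000_000,
          "t0_nanoseconds": t0_ns % 1_000_000_000,
          "delta_nanoseconds": [t - t0_ns for t, _ in samples],
          "values": [v for _, v in samples],
      }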

By @kbouck - 5 months
For anyone that has built more complex collector pipelines, I'm curious to know the tech stack:

  - otel collector?
  - kafka (or other mq)?
  - cribl?
  - vector?
  - other?
By @phillipcarter - 5 months
As an aside, can all of y'all competing observability vendors cool it with the pitches in the comments? Literally every time someone posts about observability there's a half dozen or more heavy-handed pitch comments to wade through.
By @candiddevmike - 5 months
This is hypocritical content marketing from a company that doesn't want you to be vendor neutral, as seen in the laughable use of hyperlinks to their own products but no links when mentioning Prometheus or Elasticsearch.

OTEL is great; I just wish the CNCF had better alternatives to Grafana Labs.

By @PeterZaitsev - 5 months
Looking for a target for your OTEL data? Check out Coroot too - https://coroot.com/ In addition to OTEL visualization, it can use eBPF to generate traces for applications where OpenTelemetry installation can't be done.
By @exabrial - 5 months
JMX -> Jolokia -> Telegraf -> the-older-TICK-stack-before-influxdb-rewrote-it-for-the-3rd-time-progressively-making-it-worse-each-time
By @prabhatsharma - 5 months
OpenTelemetry is definitely a good thing that will help reduce vendor lock-in and the exploitative practices some vendors adopt when they see that a customer is locked in due to proprietary code instrumentation. In addition, OpenTelemetry autoinstrumentation is fantastic and allows one to get started with zero-code instrumentation.

Going back to the basics - 12-factor app principles must also be adhered to in scenarios where OpenTelemetry might not be an option for observability. For example, logging is not yet very mature in OpenTelemetry for all languages. Sending logs to stdout provides a good way to allow the infrastructure to capture logs in a vendor-neutral way using standard log forwarders of your choice like fluentbit and otel-collector. Refer - https://12factor.net/logs
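
A minimal sketch of that pattern (nothing vendor-specific in the application; the logger name and line format are just assumptions):

  # Sketch: write logs to stdout and let a node-level forwarder
  # (fluentbit, an otel-collector receiver, etc.) ship them onward.
  import logging
  import sys

  logging.basicConfig(
      stream=sys.stdout,
      level=logging.INFO,
      format="%(asctime)s %(levelname)s %(name)s %(message)s",
  )
  logging.getLogger("checkout").info("order placed")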

OTLP is a great leveler in terms of choice that allows people to switch backends seamlessly and will force vendors to be nice to customers and ensure that enough value is provided for the price.

For those who are using Kubernetes, you should check out the OpenTelemetry Operator, which allows you to autoinstrument your applications written in Java, NodeJS, Python, PHP and Go by adding a single annotation to your manifest file. Here is an example of autoinstrumentation -

                                                  /-> review (python)
                                                 /
frontend (go) -> shop (nodejs) -> product (java)
                                                 \
                                                  \-> price (dotnet)

Check the complete code here - https://github.com/openobserve/hotcommerce

p.s. An OpenObserve maintainer here.

By @mmanciop - 5 months
I can 100% confirm that OpenTelemetry is a fantastic project to get rid of most observability lock-in.

For context: I am the Head of Product at Dash0, a recently launched observability product 100% based on OpenTelemetry. (And Dash0 is not even the first OpenTelemetry-based observability product I have worked on.)

OTLP as a wire protocol goes a long way in ensuring that your telemetry can be ingested by a variety of vendors, and software like the OpenTelemetry Collector enables you to forward the same data to multiple backends at the same time.
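
At the SDK level the same fan-out can be sketched by attaching multiple span processors, each with its own OTLP exporter (the endpoints below are hypothetical); in practice the Collector is usually the cleaner place to do this:

  # Sketch: ship identical spans to two OTLP backends at once.
  from opentelemetry.sdk.trace import TracerProvider
  from opentelemetry.sdk.trace.export import BatchSpanProcessor
  from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

  provider = TracerProvider()
  for endpoint in ("http://backend-a:4317", "http://backend-b:4317"):
      provider.add_span_processor(
          BatchSpanProcessor(OTLPSpanExporter(endpoint=endpoint, insecure=True))
      )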

Semantic conventions, when implemented correctly by the instrumentations, put the burden of "making telemetry look right" on the side of the vendor, and that is a fantastic development for the practice of observability.

However, of course, there is more to vendor lock-in than "can it ingest the same data". The two other biggest sources of lock-in are:

1) Query languages: Vendors that use proprietary query languages lock your alerting rules and dashboards (and institutional knowledge!) behind them. There is no "official" OpenTelemetry query language, but at Dash0 we found that PromQL suffices to do all types of alerting and dashboards. (Yes, even for logs and traces!)

2) Integrations with your company processes, e.g., reporting or on-call.