Understanding OpenTelemetry
When building an observability platform, you face a fundamental choice: what instrumentation standard should you use?
You could use vendor-specific SDKs (Datadog's libraries, New Relic's agents), but that creates lock-in. You could use multiple specialized tools (Jaeger for traces, Prometheus client for metrics, Fluentd for logs), but that means maintaining multiple instrumentation systems.
OpenTelemetry solves both problems.
What OpenTelemetry Actually Is
OpenTelemetry (OTel) is three things:
- A specification that defines how telemetry data should be structured
- SDKs for every major language that implement this specification
- The OpenTelemetry Collector, a vendor-agnostic data pipeline
The key insight is separation of concerns. Your application code instruments itself using the OpenTelemetry SDK, speaking a standard protocol (OTLP). Where that data goes—Jaeger, Datadog, Honeycomb, your own backends—is a deployment-time decision, not a code-time decision.
```
┌──────────────────────────────────────────────────────────┐
│                  YOUR APPLICATION CODE                   │
│  ┌────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐  │
│  │ Go SDK │  │ Java SDK │  │ .NET SDK │  │ Python SDK │  │
│  └───┬────┘  └────┬─────┘  └────┬─────┘  └─────┬──────┘  │
│      └────────────┴──────┬──────┴──────────────┘         │
│                          │                               │
│                    OTLP Protocol                         │
│                   (Open Standard)                        │
└──────────────────────────┬───────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────┐
│                 OPENTELEMETRY COLLECTOR                  │
│         Receive → Process → Export (your choice)         │
└──────────────────────────┬───────────────────────────────┘
                           │
          ┌────────────────┼────────────────┐
          │                │                │
          ▼                ▼                ▼
     ┌─────────┐      ┌─────────┐      ┌─────────┐
     │ Jaeger  │      │  Your   │      │ Datadog │
     │ (Self-  │      │ Backend │      │ (If you │
     │ hosted) │      │  Here   │      │  want)  │
     └─────────┘      └─────────┘      └─────────┘
```
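To make the deployment-time point concrete, here is a minimal Go sketch. It assumes the official Go SDK packages; the key detail is that no backend is named in code, because the OTLP exporter reads the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable (defaulting to a local Collector on port 4317).

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// No backend named in code: the exporter targets whatever
	// OTEL_EXPORTER_OTLP_ENDPOINT points at (default localhost:4317).
	exporter, err := otlptracegrpc.New(ctx)
	if err != nil {
		log.Fatal(err)
	}

	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	otel.SetTracerProvider(tp)
	defer tp.Shutdown(ctx)

	// Application code only ever talks to the OpenTelemetry API.
	_, span := otel.Tracer("demo").Start(ctx, "startup")
	span.End()
}
```

Swapping Jaeger for Datadog later means changing that environment variable or the Collector config, not this code.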
Why OpenTelemetry Matters
No Vendor Lock-In
If you start with self-hosted backends and later decide you want a managed service, you change your Collector configuration—not your application code. Your investment in instrumentation is protected.
One SDK to Learn
Instead of teaching your team Jaeger's SDK for traces, Prometheus client for metrics, and some logging framework for logs, everyone learns OpenTelemetry. One set of concepts, one set of APIs.
Industry Momentum
OpenTelemetry is a CNCF project with contributions from Google, Microsoft, Amazon, Splunk, Datadog, and most other major players. It's rapidly becoming the standard way to instrument applications.
Rich Ecosystem
Auto-instrumentation libraries exist for most common frameworks. In many cases, you can add observability to an existing application with minimal code changes.
Core Concepts
Signals
OpenTelemetry supports three telemetry signals:
| Signal | Description | Status |
|---|---|---|
| Traces | Distributed traces with spans | Stable |
| Metrics | Counters, gauges, histograms | Stable |
| Logs | Structured log records | Stable |
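As a rough sketch of what the traces and metrics signals look like at the API level in Go (names here are illustrative; logs are usually emitted through a bridge for an existing logging library rather than called directly):

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric"
)

func main() {
	ctx := context.Background()

	// Traces: start and end a span (no-op until a provider is registered).
	tracer := otel.Tracer("example/signals")
	ctx, span := tracer.Start(ctx, "do-work")
	defer span.End()

	// Metrics: create a counter instrument and record to it.
	meter := otel.Meter("example/signals")
	requests, err := meter.Int64Counter("requests.total",
		metric.WithDescription("Total requests handled"))
	if err != nil {
		log.Fatal(err)
	}
	requests.Add(ctx, 1)
}
```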
The OTLP Protocol
OTLP (OpenTelemetry Protocol) is the native protocol for transmitting telemetry. It supports two transports:
| Transport | Port | Use Case |
|---|---|---|
| gRPC | 4317 | High-throughput, production workloads |
| HTTP | 4318 | Environments where gRPC is problematic |
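In the Go SDK, the transport choice is just a different exporter package. A hedged sketch, assuming a local Collector listening on the conventional ports:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
)

func newExporters(ctx context.Context) error {
	// gRPC transport: Collector's OTLP receiver on port 4317.
	_, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		return err
	}

	// HTTP transport: port 4318, for proxies or runtimes where gRPC is awkward.
	_, err = otlptracehttp.New(ctx,
		otlptracehttp.WithEndpoint("localhost:4318"),
		otlptracehttp.WithInsecure(),
	)
	return err
}
```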
Context Propagation
Context propagation is how trace context flows between services. When Service A calls Service B, the trace ID must be passed along so spans can be correlated.
OpenTelemetry supports multiple propagation formats:
| Format | Description |
|---|---|
| W3C Trace Context | Standard format, recommended |
| B3 | Zipkin format, legacy compatibility |
| Jaeger | Jaeger native format |
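A minimal Go sketch of propagation in action, assuming the W3C Trace Context propagator: the caller injects the `traceparent` header into outgoing requests, and the callee extracts it so its spans join the same trace.

```go
package main

import (
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

func init() {
	// Pick the wire format once: W3C traceparent/tracestate headers.
	otel.SetTextMapPropagator(propagation.TraceContext{})
}

// Service A: copy the active trace context into the outgoing request.
func inject(req *http.Request) {
	otel.GetTextMapPropagator().Inject(req.Context(), propagation.HeaderCarrier(req.Header))
}

// Service B: recover the caller's trace context from incoming headers.
func handler(w http.ResponseWriter, r *http.Request) {
	ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))
	_ = ctx // start server-side spans from ctx so they share the trace ID
}
```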
The OpenTelemetry Collector
The Collector is the component that gives you flexibility. It's a standalone service that receives telemetry from anywhere, transforms it however you need, and sends it wherever you want.
Collector Architecture
```
┌──────────────────────────────────────────────────────────────────────────┐
│                              OTEL COLLECTOR                              │
│                                                                          │
│  ┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐  │
│  │    Receivers     │  →  │    Processors    │  →  │    Exporters     │  │
│  │                  │     │                  │     │                  │  │
│  │ • otlp           │     │ • batch          │     │ • otlp           │  │
│  │ • jaeger         │     │ • memory_limiter │     │ • jaeger         │  │
│  │ • prometheus     │     │ • filter         │     │ • prometheus     │  │
│  │ • zipkin         │     │ • sampling       │     │ • loki           │  │
│  │ • syslog         │     │ • transform      │     │ • datadog        │  │
│  └──────────────────┘     └──────────────────┘     └──────────────────┘  │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```
Receivers: Accepting Data
Receivers accept telemetry data from various sources:
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'my-service'
          static_configs:
            - targets: ['localhost:8080']
```
Processors: Transforming Data
Processors modify or filter telemetry before export:
```yaml
processors:
  # Batch data for efficient export
  batch:
    timeout: 1s
    send_batch_size: 1024

  # Prevent OOM crashes
  memory_limiter:
    check_interval: 1s
    limit_mib: 1500
    spike_limit_mib: 500

  # Add metadata
  resource:
    attributes:
      - key: environment
        value: production
        action: insert

  # Filter out noisy data
  filter:
    traces:
      span:
        - 'attributes["http.target"] == "/health"'
```
Exporters: Sending Data
Exporters send processed telemetry to backends:
```yaml
exporters:
  # Traces to Jaeger (which ingests OTLP natively)
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  # Metrics endpoint for Prometheus to scrape
  prometheus:
    endpoint: "0.0.0.0:8889"

  # Logs to Loki
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
```
Pipelines: Connecting Components
Pipelines wire receivers, processors, and exporters together:
```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loki]
```
Auto-Instrumentation
One of OpenTelemetry's most powerful features is auto-instrumentation—adding observability to applications with minimal code changes.
How It Works
Auto-instrumentation libraries wrap common frameworks and libraries, automatically creating spans for:
- HTTP requests (incoming and outgoing)
- Database queries
- Message queue operations
- gRPC calls
- And many more (a short Go sketch follows this list)
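As an illustration, here is a hedged Go sketch using the contrib `otelhttp` package: one wrapper around an `http.Handler` produces a server span for every request, without modifying the handlers themselves (the route is illustrative).

```go
package main

import (
	"log"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/orders", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	// One wrapper: every request through mux now gets a server span,
	// with method, route, and status recorded as attributes.
	log.Fatal(http.ListenAndServe(":8080", otelhttp.NewHandler(mux, "http.server")))
}
```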
Language Examples
Go: Explicit but clean
```go
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// initTracer wires an OTLP/gRPC exporter into a tracer provider and
// returns a shutdown function that flushes remaining spans on exit.
func initTracer(ctx context.Context) (func(context.Context) error, error) {
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		return nil, err
	}
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	otel.SetTracerProvider(tp)
	return tp.Shutdown, nil
}
```
Python: Zero-code option available
```bash
# Install the distro and exporter, then detect and install
# instrumentation for the libraries the app already uses
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install

# Run with auto-instrumentation (no code changes)
opentelemetry-instrument \
    --service_name my-service \
    --exporter_otlp_endpoint http://localhost:4317 \
    python app.py
```
Node.js: Minimal code required
```javascript
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4317' }),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
```
.NET: Built into the runtime
```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(t => t
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter(o => o.Endpoint = new Uri("http://localhost:4317")))
    .WithMetrics(m => m
        .AddAspNetCoreInstrumentation()
        .AddOtlpExporter(o => o.Endpoint = new Uri("http://localhost:4317")));
```
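Auto-instrumentation tells you that a request was slow; custom spans tell you why. A minimal Go sketch (function and attribute names are illustrative) of wrapping a business operation in its own span:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

func calculateCartTotal(ctx context.Context, itemCount int) {
	// One span around the business operation, with domain attributes.
	ctx, span := otel.Tracer("shop/checkout").Start(ctx, "calculate-cart-total")
	defer span.End()

	span.SetAttributes(attribute.Int("cart.items", itemCount))
	_ = ctx // pass ctx to downstream calls so child spans nest correctly
}
```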
Deployment Patterns
Pattern 1: Sidecar
Each application has its own collector:
```
┌─────────────────────┐    ┌─────────────────────┐
│        Pod A        │    │        Pod B        │
│ ┌──────┐   ┌──────┐ │    │ ┌──────┐   ┌──────┐ │
│ │ App  │ → │ OTel │ │    │ │ App  │ → │ OTel │ │
│ └──────┘   └──────┘ │    │ └──────┘   └──────┘ │
└──────────┬──────────┘    └──────────┬──────────┘
           │                          │
           └────────────┬─────────────┘
                        ▼
                 ┌─────────────┐
                 │   Backend   │
                 └─────────────┘
```
Pros: Isolation, per-app configuration
Cons: Resource overhead, many collectors to manage
Pattern 2: Gateway
Centralized collector(s) receive from all applications:
```
┌───────┐   ┌───────┐   ┌───────┐
│ App A │   │ App B │   │ App C │
└───┬───┘   └───┬───┘   └───┬───┘
    │           │           │
    └───────────┼───────────┘
                │
                ▼
        ┌───────────────┐
        │ OTel Gateway  │  ← Load balanced
        │   Collector   │
        └───────┬───────┘
                │
                ▼
           ┌─────────┐
           │ Backend │
           └─────────┘
```
Pros: Centralized management, efficient resource use
Cons: Single point of failure (unless HA)
Pattern 3: Agent + Gateway (Recommended for Production)
Best of both worlds: a lightweight agent on every node buffers locally and forwards to a central gateway tier:
```
┌─────────────────────────┐    ┌─────────────────────────┐
│       Node/Host A       │    │       Node/Host B       │
│  ┌───┐  ┌───┐  ┌───┐    │    │  ┌───┐  ┌───┐  ┌───┐    │
│  │App│  │App│  │App│    │    │  │App│  │App│  │App│    │
│  └─┬─┘  └─┬─┘  └─┬─┘    │    │  └─┬─┘  └─┬─┘  └─┬─┘    │
│    └──────┼──────┘      │    │    └──────┼──────┘      │
│           ▼             │    │           ▼             │
│       ┌───────┐         │    │       ┌───────┐         │
│       │ Agent │         │    │       │ Agent │         │
│       └───┬───┘         │    │       └───┬───┘         │
└───────────┼─────────────┘    └───────────┼─────────────┘
            │                              │
            └──────────────┬───────────────┘
                           ▼
                   ┌───────────────┐
                   │ OTel Gateway  │
                   │    Cluster    │
                   └───────┬───────┘
                           ▼
                      ┌─────────┐
                      │ Backend │
                      └─────────┘
```
Pros: Local buffering, centralized processing, scalable
Cons: More complex setup
Key Takeaways
- OpenTelemetry is the standard — invest in learning it
- Separate instrumentation from destination — your code shouldn't know where telemetry goes
- The Collector is your Swiss Army knife — use it for receiving, processing, and routing
- Auto-instrumentation gets you 80% there — add custom spans for business logic
- Start simple — sidecar pattern first, evolve to gateway as you scale
Next: Observability Glossary →