Tracing & OpenTelemetry
Spans, Traces and OpenTelemetry!
I recently started working at Lightstep and as part of my onboarding, I needed to learn more about OpenTelemetry, the open source project that combined the OpenTracing and OpenCensus projects back in 2019. In short, OpenTelemetry helps generate and collect telemetry data.
In this post I'll walk through three examples which helped me understand how tracing works. (Note: much of this is based on the excellent Getting Started docs from the OpenTelemetry website. I learn best by running things and writing them out for myself which is what I've done here...even if it duplicates some of what you might find on the OTel website)
The three examples start by emitting trace/span data locally, then gradually become more complex.
- LOCAL. The first example will generate a trace that gets printed onto your local console.
- SINGLE SERVICE. The second example is identical to the first except that the trace data is sent to an external endpoint (i.e. a dev account on Lightstep).
- AUTO-INSTRUMENTATION. The third example highlights how to auto-instrument a simple Flask application.
Local: Emit trace in console
In this example, we have a simple Python script which is importing some OTel libraries and then printing a Hello World
as well as a number of spans that compose the trace.
# tracing.py
from opentelemetry import trace # core tracing api
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import(
BatchSpanProcessor,
ConsoleSpanExporter,
)
provider = TracerProvider()
processor = BatchSpanProcessor(ConsoleSpanExporter()) # how we send spans
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__) #
with tracer.start_as_current_span("foo"):
with tracer.start_as_current_span("bar"):
with tracer.start_as_current_span("baz"):
with tracer.start_as_current_span("test"):
print("Hello world from OTel Python")
From the output we can see that we have 4 different spans which are named test
, baz
, bar
, and foo
. There are a couple things to note from this output.
- The spans show up in the console in reverse order so the first span,
foo
, shows up at the bottom - Each of the spans that were generated have the same
trace_id
but have a uniquespan_id
- Spans have a parent-child relationship and we can see this by mapping the
parent_id
in one span to thespan_id
of the parent span. Here’s a visual example of that that looks like in Miro. In short, we expect that the root span will befoo
which will be the parent of its child spanbar
which is the parent of its child spanbaz
.
at ~ % python3 tracing.py
Hello world from OTel Python
{
"name": "test",
"context": {
"trace_id": "0x3eff1c2c8beb53fd7773d6e414ac2d6a",
"span_id": "0x53a9be0bb1938f4b",
"trace_state": "[]"
},
"kind": "SpanKind.INTERNAL",
"parent_id": "0x0748eaa2ac417b0c",
"start_time": "2021-11-18T16:21:12.054505Z",
"end_time": "2021-11-18T16:21:12.054557Z",
"status": {
"status_code": "UNSET"
},
"attributes": {},
"events": [],
"links": [],
"resource": {
"telemetry.sdk.language": "python",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.3.0",
"service.name": "unknown_service"
}
}
{
"name": "baz",
"context": {
"trace_id": "0x3eff1c2c8beb53fd7773d6e414ac2d6a",
"span_id": "0x0748eaa2ac417b0c",
"trace_state": "[]"
},
"kind": "SpanKind.INTERNAL",
"parent_id": "0x433498f43bb730f3",
"start_time": "2021-11-18T16:21:12.054471Z",
"end_time": "2021-11-18T16:21:12.054582Z",
"status": {
"status_code": "UNSET"
},
"attributes": {},
"events": [],
"links": [],
"resource": {
"telemetry.sdk.language": "python",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.3.0",
"service.name": "unknown_service"
}
}
{
"name": "bar",
"context": {
"trace_id": "0x3eff1c2c8beb53fd7773d6e414ac2d6a",
"span_id": "0x433498f43bb730f3",
"trace_state": "[]"
},
"kind": "SpanKind.INTERNAL",
"parent_id": "0xc8cc73ee2969c0d5",
"start_time": "2021-11-18T16:21:12.054432Z",
"end_time": "2021-11-18T16:21:12.054591Z",
"status": {
"status_code": "UNSET"
},
"attributes": {},
"events": [],
"links": [],
"resource": {
"telemetry.sdk.language": "python",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.3.0",
"service.name": "unknown_service"
}
}
{
"name": "foo",
"context": {
"trace_id": "0x3eff1c2c8beb53fd7773d6e414ac2d6a",
"span_id": "0xc8cc73ee2969c0d5",
"trace_state": "[]"
},
"kind": "SpanKind.INTERNAL",
"parent_id": null,
"start_time": "2021-11-18T16:21:12.054338Z",
"end_time": "2021-11-18T16:21:12.054599Z",
"status": {
"status_code": "UNSET"
},
"attributes": {},
"events": [],
"links": [],
"resource": {
"telemetry.sdk.language": "python",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.3.0",
"service.name": "unknown_service"
}
}
Traces are composed of spans and each span has optional events that can be added. If you wanted to add events to the span data you would need to invoke the trace, get the current span, and add an event. Here’s what that code would look like:
with tracer.start_as_current_span("foo"):
trace.get_current_span().add_event("foo-event")
with tracer.start_as_current_span("bar"):
trace.get_current_span().add_event("bar-event")
with tracer.start_as_current_span("baz"):
trace.get_current_span().add_event("baz-event")
print("Hello world from OTel Python")
Single Service: Send data to an external endpoint
In general, we are going to want to actually do something with our trace data rather than just printing it into a console. As they point out in the OTel Getting Started documentation:
The previous example does emit information about all spans, but the output is a bit hard to read. In most cases, you can instead export this data to an application performance monitoring backend to be visualized and queried. It’s also common to aggregate span and trace information from multiple services into a single database, so that actions requiring multiple services can still all be visualized together. This concept of aggregating span and trace information is known as distributed tracing.
In this example, we have what is effectively a monorepo (no services talking to each other) and we are then going to send this data to a tracing tool (in this case, Lightstep). Before getting started, you’ll need a Lightstep account so you can get an Access Token. We’ll be running the following script whose biggest difference is the configure_opentelemetry
statement. You’ll also notice that we have swapped out the ConsoleSpanExporter
from the first example in favor of the OTLPSpanExporter
which lets us export to one of these different tools.
# otcollector.py
import ssl
from opentelemetry import trace
from opentelemetry.launcher import configure_opentelemetry
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
OTLPSpanExporter,
)
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
configure_opentelemetry(
service_name="service-123",
access_token="<INSERT LIGHTSTEP ACCESS TOKEN HERE>",
service_version="1.2.3",
span_exporter_endpoint="https://ingest.lightstep.com:443",
span_exporter_insecure=False,
)
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("foo"):
with tracer.start_as_current_span("bar"):
with tracer.start_as_current_span("baz"):
print("Hello world from OpenTelemetry Python!")
The output of this script will simply be the print
statement "Hello world from OpenTelemetry Python!"
. However, unlike our previous example, the span data that we created got sent to Lightstep since we specified that we wanted the spans to be exported to an endpoint at "https://ingest.lightstep.com:443"
.
On the right hand side, we can see that there were no logs associated with the spans that were sent to Lightstep. Similar to above, if we wanted to do that, we would need to get the current span get_current_span()
and add event data to those spans via the add_event()
method.
with tracer.start_as_current_span("foo"):
trace.get_current_span().add_event("foo-event-1")
trace.get_current_span().add_event("foo-event-2")
with tracer.start_as_current_span("bar"):
trace.get_current_span().add_event("bar-event")
with tracer.start_as_current_span("baz"):
trace.get_current_span().add_event("baz-event")
print("Hello world from OTel Python")
In the screenshot below, we can now see that logs are getting added to our span data and can be seen in the Lightstep UI.
Auto-Instrumentation: Flask Example
This next example comes directly from the OpenTelemetry Getting Started guide. It is slightly more complex and highlights how we can use auto-instrumentation in order to track common actions like HTTP responses and requests rather than having to explicitly invoke the beginning of spans via the tracer.start_as_current_span()
method.
To get started, follow the directions listed in the Getting Started guide. What I’ll do here is [try to] explain what’s happening under the hood.
In short, we are creating a Flask application which will return hello
when we issue an HTTP request. On top of this simple application, we are also adding some auto-instrumentation that is written specifically for Flask.
# flask_example.py
# Import the Flask class
import flask
# Library to grab http requests
# Link: https://docs.python-requests.org/en/latest/user/quickstart/
import requests
# Core Tracing API
from opentelemetry import trace
# Track web requests in Flask applicaations
# Link: https://opentelemetry-python-contrib.readthedocs.io/en/latest/instrumentation/flask/flask.html
from opentelemetry.instrumentation.flask import FlaskInstrumentor
# Allows tracing HTTP requests made by requests library
# Link: https://opentelemetry-python-kinvolk.readthedocs.io/en/latest/instrumentation/requests/requests.html
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.sdk.trace import TracerProvider
# Span Processor
from opentelemetry.sdk.trace.export import (
BatchSpanProcessor,
ConsoleSpanExporter,
)
# New tracer instance
trace.set_tracer_provider(TracerProvider())
# BatchSpanProcessor is an implementation of SpanProcessor that batches
# ended spans and pushes them to the configured SpanExporter
trace.get_tracer_provider().add_span_processor(
BatchSpanProcessor(ConsoleSpanExporter())
)
# Create an instance of the Flask class which becomes the WSGI applicaiton
app = flask.Flask(__name__)
FlaskInstrumentor().instrument_app(app)
# Enable instrumentation in the requests library
RequestsInstrumentor().instrument()
# Instantiate a new tracer which will support the creation of spans
tracer = trace.get_tracer(__name__)
# Use the route() decorator to tell Flask what URL will trigger the function
# The hello() function will return "hello" as an HTML string in the browser
@app.route("/")
*def* hello():
# Create a root span, set it as the current span in context
with tracer.start_as_current_span("example-request"):
requests.get("http://www.example.com")
return "hello"
app.run(debug=True, port=5000)
What's interesting about this is that with minimal manual instrumentation, we are able to start generating span data specifically designed for this Flask app.
127.0.0.1 - - [22/Nov/2021 13:27:21] "GET /favicon.ico HTTP/1.1" 404 -
{
"name": "HTTP GET",
"context": {
"trace_id": "0x93ce9ebdc8756d51df26b60b11134885",
"span_id": "0xfc5a8b18749d6f9c",
"trace_state": "[]"
},
"kind": "SpanKind.CLIENT",
"parent_id": "0xbd4e400495e3a1aa",
"start_time": "2021-11-22T19:27:20.824127Z",
"end_time": "2021-11-22T19:27:20.880317Z",
"status": {
"status_code": "UNSET"
},
"attributes": {
"http.method": "GET",
"http.url": "http://www.example.com",
"http.status_code": 200
},
"events": [],
"links": [],
"resource": {
"telemetry.sdk.language": "python",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.7.1",
"service.name": "unknown_service"
}
}
{
"name": "example-request",
"context": {
"trace_id": "0x93ce9ebdc8756d51df26b60b11134885",
"span_id": "0xbd4e400495e3a1aa",
"trace_state": "[]"
},
"kind": "SpanKind.INTERNAL",
"parent_id": "0x13a28e0306bfd292",
"start_time": "2021-11-22T19:27:20.823659Z",
"end_time": "2021-11-22T19:27:20.880488Z",
"status": {
"status_code": "UNSET"
},
"attributes": {},
"events": [],
"links": [],
"resource": {
"telemetry.sdk.language": "python",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.7.1",
"service.name": "unknown_service"
}
}
{
"name": "/",
"context": {
"trace_id": "0x93ce9ebdc8756d51df26b60b11134885",
"span_id": "0x13a28e0306bfd292",
"trace_state": "[]"
},
"kind": "SpanKind.SERVER",
"parent_id": null,
"start_time": "2021-11-22T19:27:20.820676Z",
"end_time": "2021-11-22T19:27:20.880787Z",
"status": {
"status_code": "UNSET"
},
"attributes": {
"http.method": "GET",
"http.server_name": "127.0.0.1",
"http.scheme": "http",
"net.host.port": 5000,
"http.host": "localhost:5000",
"http.target": "/",
"net.peer.ip": "127.0.0.1",
"http.user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36",
"net.peer.port": 55503,
"http.flavor": "1.1",
"http.route": "/",
"http.status_code": 200
},
"events": [],
"links": [],
"resource": {
"telemetry.sdk.language": "python",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.7.1",
"service.name": "unknown_service"
}
}
{
"name": "HTTP GET",
"context": {
"trace_id": "0xeb0773f6717233cfefe3107c9d5667cf",
"span_id": "0xef6c8e3e4268b610",
"trace_state": "[]"
},
"kind": "SpanKind.SERVER",
"parent_id": null,
"start_time": "2021-11-22T19:27:20.997505Z",
"end_time": "2021-11-22T19:27:21.002324Z",
"status": {
"status_code": "UNSET"
},
"attributes": {
"http.method": "GET",
"http.server_name": "127.0.0.1",
"http.scheme": "http",
"net.host.port": 5000,
"http.host": "localhost:5000",
"http.target": "/favicon.ico",
"net.peer.ip": "127.0.0.1",
"http.user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36",
"net.peer.port": 55505,
"http.flavor": "1.1",
"http.status_code": 404
},
"events": [],
"links": [],
"resource": {
"telemetry.sdk.language": "python",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.7.1",
"service.name": "unknown_service"
}
}
So there you have it! We worked our way from the simple to the complex and in doing so were able to see both manual and auto-instrumentation for OpenTelemetry.