Manual instrumentation guide
This page is a reference manual for manually instrumenting your agent with OTel spans and logs. If you are looking for a quick way to get LLM call traces flowing into Parseable, refer to the Quickstart guide, which uses auto-instrumentation or two-line SDK initialization.
Instrumentation overview
For each agent run, you want to capture:
| What | Where it goes | Why |
|---|---|---|
| Agent run — which agent ran, total tokens, total cost, exit status | OTel invoke_agent span | End-to-end agent observability, run-level dashboards |
| LLM call metadata — model name, token counts, latency, temperature, finish reason, errors | OTel chat span | Per-call dashboards, aggregation, alerting, cost tracking |
| Full conversation content — system prompts, user messages, assistant responses, tool results | OTel log records (correlated to chat spans) | Conversation reconstruction, debugging, quality analysis |
| Tool calls — which tools the LLM called, with full arguments | OTel log records (correlated to chat spans) + execute_tool spans | Tool usage analytics, debugging |
| Thinking/reasoning — Claude's chain-of-thought reasoning blocks | OTel log records (correlated to chat spans) | Reasoning analysis, debugging |
The OpenTelemetry GenAI semantic conventions define how to structure this data. This guide shows exactly how to implement it.
Architecture
┌─────────────────────────────────────────────────────────┐
│ Your Agent Application (Python) │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ opentelemetry-instrument CLI wrapper │ │
│ │ (auto-configures TracerProvider, LoggerProvider, │ │
│ │ sets up OTLP exporters) │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Your code (manual instrumentation): │ │
│ │ - _tracer.start_as_current_span(...) → spans │ │
│ │ - _otel_logger.emit(...) → logs │ │
│ └─────────────────────────────────────────────────┘ │
│ │ │
│ │ OTLP (protobuf) │
│ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ BatchSpanProcessor + BatchLogRecordProcessor │ │
│ │ (buffers, batches, exports periodically) │ │
│ └─────────────────────────────────────────────────┘ │
└────────────────────│────────────────────────────────────┘
│ OTLP HTTP (:4318) or gRPC (:4317)
▼
┌─────────────────────────────────────────────────────────┐
│ OTel Collector │
│ - Receives traces + logs via OTLP │
│ - Batches and exports to backend(s) │
│ - Routes traces and logs to separate streams │
└────────────────────│────────────────────────────────────┘
│ OTLP/HTTP (JSON)
▼
┌─────────────────────────────────────────────────────────┐
│ Parseable │
│ - genai-traces stream (flattened spans) │
│ - genai-logs stream (flattened log records) │
│ - SQL-queryable, with server-side cost enrichment │
└─────────────────────────────────────────────────────────┘

What each layer does
| Layer | Responsibility |
|---|---|
| opentelemetry-instrument CLI | Bootstraps the SDK — creates TracerProvider, LoggerProvider, configures OTLP exporters. You do not call set_tracer_provider() or set_logger_provider() manually. |
| Your code (manual instrumentation) | Creates spans (_tracer.start_as_current_span(...)) and emits log records (_otel_logger.emit(...)). This is where all GenAI-specific attributes and content are set. |
| BatchSpanProcessor / BatchLogRecordProcessor | Accumulates spans and logs in memory, exports them in batches via OTLP. Configured automatically by the CLI. |
| OTel Collector | Receives OTLP data, applies processors (batching, filtering), and exports to one or more backends. Converts protobuf to JSON for Parseable. |
| Parseable | Stores traces and logs as flattened, SQL-queryable records. Enriches with computed columns (p_genai_cost_usd, p_genai_tokens_total, etc.). |
The opentelemetry-instrument CLI
The opentelemetry-instrument CLI is the simplest way to bootstrap the OTel SDK. It is installed as part of the opentelemetry-distro package and is the recommended approach for Python applications.
What It Does
When you run opentelemetry-instrument python my_agent.py, the CLI:
- Creates a `TracerProvider` with a `BatchSpanProcessor` and OTLP exporter
- Creates a `LoggerProvider` with a `BatchLogRecordProcessor` and OTLP exporter
- Sets both as the global providers (so `trace.get_tracer()` and `get_logger_provider()` return them)
- Optionally loads auto-instrumentors for installed libraries (OpenAI, httpx, etc.)
- Runs your application
What You Do NOT Do
Because the CLI handles provider setup:
- Do NOT call `set_tracer_provider()` — it's already set
- Do NOT call `set_logger_provider()` — it's already set
- Do NOT create `BatchSpanProcessor` or `OTLPSpanExporter` — already configured
- Do NOT create `BatchLogRecordProcessor` or `OTLPLogExporter` — already configured
- Just call `trace.get_tracer(...)` and `get_logger_provider().get_logger(...)` — they return the pre-configured providers
Manual Provider Setup (Without CLI)
If you cannot use the CLI (e.g., embedded in a larger application), you can set up providers manually:
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk._logs import LoggerProvider
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.exporter.otlp.proto.http._log_exporter import OTLPLogExporter
from opentelemetry._logs import set_logger_provider

# Traces
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(tracer_provider)

# Logs
logger_provider = LoggerProvider()
logger_provider.add_log_record_processor(BatchLogRecordProcessor(OTLPLogExporter()))
set_logger_provider(logger_provider)
```

This is the manual equivalent of what the CLI does. Use the CLI when possible.
Setup: Packages, Environment, Launch
Packages
```shell
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
```

| Package | Purpose |
|---|---|
| opentelemetry-distro | Provides opentelemetry-instrument CLI and auto-discovery |
| opentelemetry-exporter-otlp | OTLP exporter (HTTP/protobuf and gRPC) |
| opentelemetry-bootstrap -a install | Installs auto-instrumentors for detected libraries (e.g., opentelemetry-instrumentation-openai-v2 if OpenAI SDK is installed) |
Environment Variables
```shell
# Required: where the OTel Collector listens
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Required: OTLP protocol
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

# Required: identifies your application in traces
export OTEL_SERVICE_NAME=my-genai-agent

# Required: disable auto-instrumentor to prevent duplicates
export OTEL_PYTHON_DISABLED_INSTRUMENTATIONS=openai_v2
```

| Variable | Purpose | Required |
|---|---|---|
| OTEL_EXPORTER_OTLP_ENDPOINT | Collector address. 4318 for HTTP, 4317 for gRPC. | Yes |
| OTEL_EXPORTER_OTLP_PROTOCOL | http/protobuf (HTTP) or grpc | Yes |
| OTEL_SERVICE_NAME | Service name that appears on every span and log record as a resource attribute | Yes |
| OTEL_PYTHON_DISABLED_INSTRUMENTATIONS | Comma-separated list of auto-instrumentors to skip. Set to openai_v2 to prevent the OpenAI auto-instrumentor from loading. | Yes (if opentelemetry-instrumentation-openai-v2 is installed) |
| OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT | Controls whether auto-instrumentors capture message content. Irrelevant when auto is disabled, but harmless to set. | No |
Launch Command
```shell
opentelemetry-instrument \
  --traces_exporter otlp \
  --logs_exporter otlp \
  --metrics_exporter none \
  python my_agent.py
```

| Flag | Value | Purpose |
|---|---|---|
| --traces_exporter | otlp | Export spans via OTLP |
| --logs_exporter | otlp | Export log records via OTLP |
| --metrics_exporter | none | Disable metrics (optional — set to otlp if you want metrics) |
The CLI reads OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_PROTOCOL to configure the exporters.
Instrumentation Code
This is a complete, agent-agnostic reference for instrumenting any Python application that calls LLM APIs. The instrumentation uses three span types per the OTel GenAI Agent Spans spec.
Module-Level Setup
You need a tracer and logger in the agent orchestration module (for invoke_agent and execute_tool spans + logs), and a tracer and logger in the LLM call module (for chat spans and log records):
Agent orchestration module (e.g., agents.py):
```python
import json

from opentelemetry import trace
from opentelemetry._logs import get_logger_provider, SeverityNumber

_tracer = trace.get_tracer("my-agent", "1.1.0")
_otel_logger = get_logger_provider().get_logger("my-agent", "1.1.0")
```

LLM call module (e.g., models.py):
```python
import json

from opentelemetry import trace
from opentelemetry._logs import get_logger_provider, SeverityNumber

_tracer = trace.get_tracer("my-agent.llm", "1.1.0")
_otel_logger = get_logger_provider().get_logger("my-agent.llm", "1.1.0")
```

- The first argument is the instrumentation scope name. Use a dotted name that identifies which part of your application is emitting telemetry. Both spans and logs will carry this as `scope_name`.
- The second argument is the instrumentation scope version. Bump this when you change what attributes/events you emit.
- `get_logger_provider()` returns the provider already configured by the CLI. Do NOT create your own.
Instrumenting the Agent Run (invoke_agent)
Wrap the entire agent loop in an invoke_agent span. All chat and execute_tool spans created inside this context automatically become children via OTel context propagation.
```python
def run_agent(model_name: str, provider: str, problem: str):
    """Example: wrap an agent run with an invoke_agent span."""
    span_name = "invoke_agent my-agent"
    with _tracer.start_as_current_span(span_name, kind=trace.SpanKind.CLIENT) as span:
        span.set_attribute("gen_ai.operation.name", "invoke_agent")
        span.set_attribute("gen_ai.agent.name", "my-agent")
        span.set_attribute("gen_ai.provider.name", provider)
        span.set_attribute("gen_ai.request.model", model_name)

        # ── Emit the problem statement that triggered this agent run ──
        _otel_logger.emit(
            body=problem,
            severity_number=SeverityNumber.INFO,
            event_name="gen_ai.user.message",
            attributes={
                "gen_ai.operation.name": "invoke_agent",
                "gen_ai.provider.name": provider,
                "gen_ai.request.model": model_name,
                "gen_ai.agent.name": "my-agent",
                "gen_ai.event.name": "gen_ai.user.message",
                "role": "user",
            },
        )

        # ── Your agent loop ──
        messages = [{"role": "user", "content": problem}]  # seed the conversation
        total_input_tokens = total_output_tokens = 0       # accumulate from each LLM response
        done = False
        while not done:
            llm_response = call_llm(model_name, messages)    # creates a child "chat" span
            tool_result = execute_tool(llm_response.action)  # creates a child "execute_tool" span
            done = llm_response.is_done

        # ── After loop: set aggregate response attributes ──
        span.set_attribute("gen_ai.usage.input_tokens", total_input_tokens)
        span.set_attribute("gen_ai.usage.output_tokens", total_output_tokens)
        span.set_attribute("gen_ai.response.finish_reasons", json.dumps(["exit_command"]))

        # ── Emit agent completion summary ──
        _otel_logger.emit(
            body=json.dumps({"exit_status": "exit_command",
                             "total_input_tokens": total_input_tokens,
                             "total_output_tokens": total_output_tokens}),
            severity_number=SeverityNumber.INFO,
            event_name="gen_ai.agent.finish",
            attributes={
                "gen_ai.operation.name": "invoke_agent",
                "gen_ai.agent.name": "my-agent",
                "gen_ai.provider.name": provider,
                "gen_ai.request.model": model_name,
                "gen_ai.event.name": "gen_ai.agent.finish",
                "gen_ai.usage.input_tokens": total_input_tokens,
                "gen_ai.usage.output_tokens": total_output_tokens,
            },
        )
```

Key points:
- `kind=trace.SpanKind.CLIENT` — the agent is a client of the LLM service
- Token counts on `invoke_agent` are totals across all LLM calls in the run
- `gen_ai.agent.name` lives here, NOT on individual `chat` spans
- `gen_ai.provider.name` replaces the older `gen_ai.system` attribute per current spec
- The `gen_ai.user.message` log emits the problem statement so it appears in the trace waterfall
- The `gen_ai.agent.finish` log emits a completion summary with exit status and total token counts
Instrumenting Tool Execution (execute_tool)
Wrap each tool/command execution in an execute_tool span:
```python
def execute_tool(action: str, tool_call_id: str | None = None):
    """Example: wrap a tool execution with an execute_tool span."""
    tool_name = action.strip().split()[0] if action.strip() else "unknown"
    span_name = f"execute_tool {tool_name}"
    with _tracer.start_as_current_span(span_name, kind=trace.SpanKind.INTERNAL) as span:
        span.set_attribute("gen_ai.operation.name", "execute_tool")
        span.set_attribute("gen_ai.tool.name", tool_name)
        span.set_attribute("gen_ai.tool.type", "function")
        if tool_call_id:
            span.set_attribute("gen_ai.tool.call.id", tool_call_id)

        # Emit tool input log (the command being run)
        _otel_logger.emit(
            body=action,
            severity_number=SeverityNumber.INFO,
            event_name="gen_ai.tool.input",
            attributes={
                "gen_ai.operation.name": "execute_tool",
                "gen_ai.tool.name": tool_name,
                "gen_ai.tool.type": "function",
                "gen_ai.tool.call.id": tool_call_id or "",
                "gen_ai.event.name": "gen_ai.tool.input",
            },
        )

        try:
            result = env.communicate(action)  # `env` is your tool-execution environment
        except TimeoutError as e:
            span.set_attribute("error.type", type(e).__name__)
            span.set_status(trace.StatusCode.ERROR, "Command timed out")
            raise

        # Emit tool output log (the observation)
        _otel_logger.emit(
            body=result or "",
            severity_number=SeverityNumber.INFO,
            event_name="gen_ai.tool.output",
            attributes={
                "gen_ai.operation.name": "execute_tool",
                "gen_ai.tool.name": tool_name,
                "gen_ai.tool.type": "function",
                "gen_ai.tool.call.id": tool_call_id or "",
                "gen_ai.event.name": "gen_ai.tool.output",
            },
        )
        return result
```

Key points:
- `kind=trace.SpanKind.INTERNAL` — tool execution is an internal operation
- `gen_ai.tool.call.id` links this execution back to the LLM's tool call request (from function calling mode)
- Error status is set on timeout or failure, making failed tool executions queryable
- `gen_ai.tool.input` log captures the command sent to the tool
- `gen_ai.tool.output` log captures the tool's observation/result
Instrumenting an LLM Call (chat)
Wrap your LLM call in a span. Emit log records inside the span context.
```python
def call_llm(model: str, messages: list[dict], temperature: float = 0.0, **kwargs):
    """Example: instrument any LLM call with OTel traces + logs."""
    with _tracer.start_as_current_span(f"chat {model}", kind=trace.SpanKind.CLIENT) as span:
        # ── Step 1: Set request attributes on span ──
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.request.model", model)
        span.set_attribute("gen_ai.provider.name", "openai")  # or "anthropic", etc.
        if temperature is not None:
            span.set_attribute("gen_ai.request.temperature", temperature)

        # ── Step 2: Emit input message log records ──
        for msg in messages:
            role = msg.get("role", "user")
            content = msg.get("content", "")
            body = content if isinstance(content, str) else json.dumps(content)
            _otel_logger.emit(
                body=body,
                severity_number=SeverityNumber.INFO,
                event_name=f"gen_ai.{role}.message",
                attributes={
                    "gen_ai.provider.name": "openai",
                    "gen_ai.request.model": model,
                    "gen_ai.event.name": f"gen_ai.{role}.message",
                    "role": role,
                },
            )

        # ── Step 3: Call the LLM ──
        try:
            response = your_llm_client.chat.completions.create(  # your provider SDK client
                model=model,
                messages=messages,
                temperature=temperature,
                **kwargs,
            )
        except Exception as e:
            span.set_status(trace.StatusCode.ERROR, str(e))
            span.set_attribute("error.type", type(e).__name__)
            raise

        # ── Step 4: Set response attributes on span ──
        if response.model:
            span.set_attribute("gen_ai.response.model", response.model)
        if response.id:
            span.set_attribute("gen_ai.response.id", response.id)
        if response.usage:
            span.set_attribute("gen_ai.usage.input_tokens", response.usage.prompt_tokens)
            span.set_attribute("gen_ai.usage.output_tokens", response.usage.completion_tokens)

        # ── Step 5: Emit response log records ──
        finish_reasons = []
        for i, choice in enumerate(response.choices):
            # Choice content
            _otel_logger.emit(
                body=choice.message.content or "",
                severity_number=SeverityNumber.INFO,
                event_name="gen_ai.choice",
                attributes={
                    "gen_ai.provider.name": "openai",
                    "gen_ai.request.model": model,
                    "gen_ai.event.name": "gen_ai.choice",
                    "index": i,
                    "finish_reason": choice.finish_reason or "",
                },
            )
            if choice.finish_reason:
                finish_reasons.append(choice.finish_reason)

            # Tool calls (if present)
            if choice.message.tool_calls:
                tool_call_ids = []
                for tc in choice.message.tool_calls:
                    tc_dict = tc.model_dump() if hasattr(tc, "model_dump") else tc
                    if tc_dict.get("id"):
                        tool_call_ids.append(tc_dict["id"])
                    _otel_logger.emit(
                        body=json.dumps(tc_dict.get("function", tc_dict)),
                        severity_number=SeverityNumber.INFO,
                        event_name="gen_ai.tool.call",
                        attributes={
                            "gen_ai.provider.name": "openai",
                            "gen_ai.request.model": model,
                            "gen_ai.event.name": "gen_ai.tool.call",
                            "gen_ai.tool.name": tc_dict.get("function", {}).get("name", ""),
                            "gen_ai.tool.call.id": tc_dict.get("id", ""),
                        },
                    )
                # Set tool call IDs on the chat span for cross-span correlation
                # with execute_tool spans. Single ID stored as string, multiple
                # IDs JSON-encoded as array.
                if tool_call_ids:
                    span.set_attribute(
                        "gen_ai.tool.call.id",
                        tool_call_ids[0] if len(tool_call_ids) == 1 else json.dumps(tool_call_ids),
                    )

            # Thinking/reasoning blocks (Claude, DeepSeek, etc.)
            thinking_blocks = getattr(choice.message, "thinking_blocks", None)
            if thinking_blocks:
                for tb in thinking_blocks:
                    thinking_text = tb.get("thinking", "") if isinstance(tb, dict) else str(tb)
                    _otel_logger.emit(
                        body=thinking_text,
                        severity_number=SeverityNumber.INFO,
                        event_name="gen_ai.thinking",
                        attributes={
                            "gen_ai.provider.name": "openai",
                            "gen_ai.request.model": model,
                            "gen_ai.event.name": "gen_ai.thinking",
                        },
                    )

        # ── Step 6: Finalize span ──
        if finish_reasons:
            span.set_attribute("gen_ai.response.finish_reasons", json.dumps(finish_reasons))
        span.set_status(trace.StatusCode.OK)
        return response
```

What Gets Produced Per Agent Run
Three span types (in the traces pipeline):
invoke_agent span — one per agent run:
| Attribute | Example |
|---|---|
| span_name | "invoke_agent my-agent" |
| gen_ai.operation.name | "invoke_agent" |
| gen_ai.agent.name | "my-agent" |
| gen_ai.provider.name | "openai" |
| gen_ai.request.model | "gpt-4o" |
| gen_ai.usage.input_tokens | 45200 (total across all steps) |
| gen_ai.usage.output_tokens | 3800 (total across all steps) |
| gen_ai.response.finish_reasons | ["exit_command"] |
| span_kind | CLIENT |
| scope_name | "my-agent" |
| service.name | "my-genai-agent" |
chat span — one per LLM call (child of invoke_agent):
| Attribute | Example |
|---|---|
| span_name | "chat gpt-4o" |
| gen_ai.operation.name | "chat" |
| gen_ai.request.model | "gpt-4o" |
| gen_ai.provider.name | "openai" |
| gen_ai.response.model | "gpt-4o-2024-11-20" |
| gen_ai.response.id | "chatcmpl-AZk8j..." |
| gen_ai.usage.input_tokens | 1250 |
| gen_ai.usage.output_tokens | 380 |
| gen_ai.request.temperature | 0.0 |
| gen_ai.response.finish_reasons | ["stop"] |
| gen_ai.tool.call.id | "call_abc123" (single) or ["call_abc123","call_def456"] (multiple, JSON array) |
| span_kind | CLIENT |
| span_status | OK or ERROR |
| error.type | "RateLimitError" (only on error) |
| scope_name | "my-agent.llm" |
execute_tool span — one per tool execution (child of invoke_agent):
| Attribute | Example |
|---|---|
| span_name | "execute_tool find_file" |
| gen_ai.operation.name | "execute_tool" |
| gen_ai.tool.name | "find_file" |
| gen_ai.tool.type | "function" |
| gen_ai.tool.call.id | "call_abc123" (when using function calling) |
| span_kind | INTERNAL |
| span_status | OK or ERROR |
| error.type | "CommandTimeoutError" (only on error) |
| scope_name | "my-agent" |
The gen_ai.tool.call.id attribute on chat spans enables cross-span correlation — you can JOIN a chat span to its corresponding execute_tool spans via the shared tool call ID.
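As a sketch of how this correlation can be consumed downstream, the snippet below joins chat spans to execute_tool spans on the shared tool call ID. The flattened record shapes are illustrative, not Parseable's exact column names; the single-string-or-JSON-array encoding matches the convention described above.

```python
import json

# Flattened span records as a backend might expose them (illustrative shape).
chat_spans = [
    {"span_id": "s1", "gen_ai.tool.call.id": "call_a"},                          # single ID: plain string
    {"span_id": "s2", "gen_ai.tool.call.id": json.dumps(["call_b", "call_c"])},  # multiple IDs: JSON array
]
tool_spans = [
    {"span_id": "t1", "gen_ai.tool.call.id": "call_a", "gen_ai.tool.name": "find_file"},
    {"span_id": "t2", "gen_ai.tool.call.id": "call_b", "gen_ai.tool.name": "open_file"},
]

def call_ids(value: str) -> list[str]:
    """Decode the single-string-or-JSON-array encoding used on chat spans."""
    try:
        parsed = json.loads(value)
        return parsed if isinstance(parsed, list) else [value]
    except (json.JSONDecodeError, TypeError):
        return [value]

# Join: for each chat span, find the execute_tool spans it triggered.
joined = {
    chat["span_id"]: [t["gen_ai.tool.name"] for t in tool_spans
                      if t["gen_ai.tool.call.id"] in call_ids(chat["gen_ai.tool.call.id"])]
    for chat in chat_spans
}
print(joined)  # {'s1': ['find_file'], 's2': ['open_file']}
```

The same join can be expressed in SQL against the flattened streams once the ID encoding is normalized.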
Multiple log records (in the logs pipeline), carrying matching trace_id + span_id from their respective spans. Log records are emitted in all three span types:
invoke_agent log records:
| event_name | body content | When |
|---|---|---|
| gen_ai.user.message | Problem statement text | At start of agent run |
| gen_ai.agent.finish | JSON: exit status + total tokens | At end of agent run |
chat log records:
| event_name | body content | When |
|---|---|---|
| gen_ai.system.message | Full system prompt | For each system message in input |
| gen_ai.user.message | Full user message | For each user message in input |
| gen_ai.assistant.message | Prior assistant response | For each assistant message in input (multi-turn) |
| gen_ai.tool.message | Tool result text | For each tool result message in input |
| gen_ai.choice | Full LLM response text | For each response choice |
| gen_ai.tool.call | Tool call JSON (name + arguments) | For each tool call in the response |
| gen_ai.thinking | Full reasoning/thinking text | For each thinking block (Claude, DeepSeek, etc.) |
execute_tool log records:
| event_name | body content | When |
|---|---|---|
| gen_ai.tool.input | Tool command/action string | Before tool execution |
| gen_ai.tool.output | Tool observation/result | After tool execution |
OTel Collector Configuration
The OTel Collector sits between your application and the backend. It receives OTLP data and routes traces and logs to separate backend streams.
Minimal Configuration
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 100

exporters:
  # Traces -> Parseable
  otlphttp/traces:
    endpoint: https://your-parseable-instance:8000
    encoding: json
    headers:
      Authorization: "Basic <credentials>"
      X-P-Stream: "genai-traces"
      X-P-Log-Source: "otel-traces"
  # Logs -> Parseable
  otlphttp/logs:
    endpoint: https://your-parseable-instance:8000
    encoding: json
    headers:
      Authorization: "Basic <credentials>"
      X-P-Stream: "genai-logs"
      X-P-Log-Source: "otel-logs"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/traces]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/logs]
```

Key Points
- Two pipelines — traces and logs are separate OTel signals. The collector routes them independently.
- One receiver — both signals arrive at the same OTLP endpoint from the SDK.
- JSON encoding — the collector converts OTel protobuf to JSON before sending to the backend. This is what the backend flattens into queryable records.
- Batch processor — accumulates records and sends them in batches. Tune `timeout` and `send_batch_size` for your throughput. For development, lower values (1s, 10 records) give faster feedback. For production, higher values reduce network overhead.
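For example, a development-oriented batch block (values illustrative, a drop-in replacement for the `batch` processor above) might look like:

```yaml
processors:
  batch:
    timeout: 1s          # flush quickly for fast feedback while developing
    send_batch_size: 10  # small batches; raise for production throughput
```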
Running the Collector
Docker:
```shell
docker run -d --name otel-collector \
  -p 4317:4317 -p 4318:4318 \
  -v $(pwd)/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml \
  otel/opentelemetry-collector-contrib:latest
```

Binary:

```shell
otelcol-contrib --config otel-collector-config.yaml
```

Semantic Conventions
All attribute names follow the OpenTelemetry GenAI Semantic Conventions. This matters because backends, dashboards, and tools that understand these conventions will automatically recognize and display GenAI data correctly.
Span Attributes by Span Type
invoke_agent spans:
| Attribute | Type | Required | Description |
|---|---|---|---|
| gen_ai.operation.name | string | Yes | Always "invoke_agent" |
| gen_ai.agent.name | string | Yes | Agent identifier |
| gen_ai.provider.name | string | Yes | LLM provider: "openai", "anthropic", "google", etc. |
| gen_ai.request.model | string | Yes | Model name used by the agent |
| gen_ai.usage.input_tokens | int | On completion | Total prompt tokens across all LLM calls |
| gen_ai.usage.output_tokens | int | On completion | Total completion tokens across all LLM calls |
| gen_ai.response.finish_reasons | string | On completion | JSON array: ["exit_command"], ["stop"], etc. |
chat spans:
| Attribute | Type | Required | Description |
|---|---|---|---|
| gen_ai.operation.name | string | Yes | Always "chat" for chat completions |
| gen_ai.provider.name | string | Yes | LLM provider: "openai", "anthropic", "google", etc. |
| gen_ai.request.model | string | Yes | Model name as passed to the API |
| gen_ai.response.model | string | On success | Model name as returned by the API (may differ from request) |
| gen_ai.response.id | string | On success | Provider-assigned response ID |
| gen_ai.usage.input_tokens | int | On success | Prompt token count for this call |
| gen_ai.usage.output_tokens | int | On success | Completion token count for this call |
| gen_ai.request.temperature | float | If set | Sampling temperature |
| gen_ai.request.top_p | float | If set | Nucleus sampling parameter |
| gen_ai.request.max_tokens | int | If set | Maximum output tokens |
| gen_ai.response.finish_reasons | string | On success | JSON array of finish reasons: ["stop"], ["tool_calls"] |
| gen_ai.tool.call.id | string | If tool calls present | Tool call ID(s) for cross-span correlation with execute_tool spans. Single ID as string; multiple IDs as JSON array. |
| error.type | string | On error | Exception class name |
execute_tool spans:
| Attribute | Type | Required | Description |
|---|---|---|---|
| gen_ai.operation.name | string | Yes | Always "execute_tool" |
| gen_ai.tool.name | string | Yes | Tool/command name (e.g., "find_file", "open_file") |
| gen_ai.tool.type | string | Yes | Always "function" |
| gen_ai.tool.call.id | string | If available | Tool call ID from function calling mode (links to LLM's tool call request) |
| error.type | string | On error | Exception class name (e.g., "CommandTimeoutError") |
The older gen_ai.system attribute has been replaced by gen_ai.provider.name per the current OTel GenAI semantic conventions.
Log Record Event Names
chat span events:
| event_name | Semantic Convention | Description |
|---|---|---|
| gen_ai.system.message | GenAI message event | System prompt |
| gen_ai.user.message | GenAI message event | User input |
| gen_ai.assistant.message | GenAI message event | Prior assistant response (multi-turn) |
| gen_ai.tool.message | GenAI message event | Tool/function result |
| gen_ai.choice | GenAI choice event | LLM response content |
| gen_ai.tool.call | GenAI tool call event | Tool/function invocation |
| gen_ai.thinking | Custom extension | Reasoning/thinking block (not yet in OTel semconv) |
invoke_agent span events:
| event_name | Semantic Convention | Description |
|---|---|---|
| gen_ai.user.message | GenAI message event | Problem statement that triggered the agent run |
| gen_ai.agent.finish | Custom extension | Agent completion summary (exit status + total tokens) |
execute_tool span events:
| event_name | Semantic Convention | Description |
|---|---|---|
| gen_ai.tool.input | Custom extension | Tool command/action sent for execution |
| gen_ai.tool.output | Custom extension | Tool observation/result returned from execution |
Log Record Attributes
Every log record carries:
| Attribute | Type | Description |
|---|---|---|
| gen_ai.operation.name | string | "chat", "invoke_agent", or "execute_tool" — identifies which span type emitted this log |
| gen_ai.event.name | string | Duplicates event_name for queryability as a flat column |
Additional attributes on chat log records:
| Attribute | Type | Description |
|---|---|---|
| gen_ai.provider.name | string | LLM provider (openai, anthropic, google, mistral, etc.) |
| gen_ai.request.model | string | Model name as requested |
| gen_ai.request.temperature | float | Temperature (if set) |
| gen_ai.request.top_p | float | Top-p (if set) |
| gen_ai.request.max_tokens | int | Max output tokens (if set) |
Additional attributes on invoke_agent log records:
| Attribute | Type | Description |
|---|---|---|
| gen_ai.provider.name | string | LLM provider |
| gen_ai.request.model | string | Model name |
| gen_ai.agent.name | string | Agent identifier (e.g., "swe-agent") |
| role | string | "user" (on gen_ai.user.message only) |
| gen_ai.usage.input_tokens | int | Total input tokens (on gen_ai.agent.finish only) |
| gen_ai.usage.output_tokens | int | Total output tokens (on gen_ai.agent.finish only) |
Additional attributes on execute_tool log records:
| Attribute | Type | Description |
|---|---|---|
| gen_ai.tool.name | string | Tool/command name |
| gen_ai.tool.type | string | Always "function" |
| gen_ai.tool.call.id | string | Tool call ID (empty string if not using function calling) |
Event-specific attributes (chat logs only):
| Attribute | Present on | Type | Description |
|---|---|---|---|
| role | gen_ai.{role}.message | string | system, user, assistant, tool |
| index | gen_ai.choice | int | Choice index (0-based) |
| finish_reason | gen_ai.choice | string | stop, tool_calls, length, etc. |
| gen_ai.tool.name | gen_ai.tool.call | string | Function/tool name |
| gen_ai.tool.call.id | gen_ai.tool.call | string | Tool call ID |
The body Field
The body of each log record is the full, untruncated content:
chat span logs:
| Event | body contains |
|---|---|
| gen_ai.system.message | Complete system prompt (can be thousands of tokens) |
| gen_ai.user.message | Complete user message |
| gen_ai.assistant.message | Complete prior assistant response |
| gen_ai.tool.message | Complete tool result |
| gen_ai.choice | Complete LLM response text |
| gen_ai.tool.call | JSON: {"name": "...", "arguments": "..."} |
| gen_ai.thinking | Complete reasoning/thinking text |
invoke_agent span logs:
| Event | body contains |
|---|---|
| gen_ai.user.message | Complete problem statement that triggered the agent run |
| gen_ai.agent.finish | JSON: {"exit_status": "...", "total_input_tokens": N, "total_output_tokens": N} |
execute_tool span logs:
| Event | body contains |
|---|---|
| gen_ai.tool.input | Complete tool command/action string |
| gen_ai.tool.output | Complete tool observation/result |
No truncation. No size limits from the instrumentation side. The body carries whatever the LLM returned, whatever was sent to it, or whatever the tool produced.
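As an illustration of the gen_ai.tool.call body format described above (the tool call dict here is a made-up example following the OpenAI function-calling shape, not an official schema):

```python
import json

# A tool call as returned by a function-calling response (illustrative).
tool_call = {
    "id": "call_abc123",
    "function": {"name": "find_file", "arguments": '{"path": "src/"}'},
}

# The log record body is the JSON-encoded "function" object.
body = json.dumps(tool_call["function"])

# A consumer recovers the tool name and its JSON-encoded arguments:
decoded = json.loads(body)
args = json.loads(decoded["arguments"])
print(decoded["name"], args["path"])  # find_file src/
```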
Scenarios
Simple Chat Completion (OpenAI)
```python
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"},
    ],
)
```

Produces:
- 1 span: `chat gpt-4o` with token counts, latency, status OK
- 3 log records: `gen_ai.system.message`, `gen_ai.user.message`, `gen_ai.choice`
- All share the same `trace_id` + `span_id`
Multi-Turn Conversation
```python
messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is 10*5?"},
    {"role": "assistant", "content": "50"},
    {"role": "user", "content": "Divide by 2"},
]
response = call_llm("gpt-4o", messages)
```

Produces:
- 1 span: `chat gpt-4o`
- 5 log records: `gen_ai.system.message`, `gen_ai.user.message`, `gen_ai.assistant.message`, `gen_ai.user.message`, `gen_ai.choice`
- The full conversation history is captured. You can reconstruct it by querying log records for this span, ordered by timestamp.
Tool/Function Calling
The LLM responds with tool calls instead of (or in addition to) text content.
Produces:
- 1 span: `chat gpt-4o`, `finish_reasons: ["tool_calls"]`
- N input message logs
- 1 choice log (may have empty body if the LLM only returned tool calls)
- 1+ `gen_ai.tool.call` logs — each with the tool name and JSON arguments in `body`
Claude Thinking/Reasoning Blocks
Claude (and some other models) return a thinking_blocks array alongside the response content. Each thinking block contains the model's chain-of-thought reasoning.
Produces:
- 1 span: `chat claude-sonnet-4-20250514`
- N input message logs
- 1 choice log (the actual response text)
- 1+ `gen_ai.thinking` logs — each with the full reasoning text in `body`
This is the primary reason for manual instrumentation. Auto-instrumentors do not capture thinking blocks because they are a provider-specific extension not (yet) in the OpenAI SDK's standard response format.
Error: Rate Limit
```python
try:
    response = call_llm(...)
except RateLimitError as e:
    # The span still exists with ERROR status
    pass
```

Produces:
- 1 span: `chat gpt-4o`, `span_status=ERROR`, `error.type=RateLimitError`
- N input message logs (emitted before the call, so they still exist)
- 0 choice/tool/thinking logs (call failed before response)
- All logs still have `trace_id` + `span_id` — you can see what was sent to the LLM even though it failed
Error: Context Window Exceeded
The LLM returns a 400 because the input is too long.
Produces:
- Same as above — span with ERROR status, input message logs but no response logs, with `error.type=ContextWindowExceededError`
- This is queryable: find all spans where `error.type = 'ContextWindowExceededError'`, then JOIN with logs to see what input caused it.
Streaming Responses
Current limitation: The instrumentation code shown above works with non-streaming responses. For streaming (stream=True), you need to:
- Open the span before the stream starts
- Accumulate the streamed chunks into a full response
- Emit log records and set span attributes after the stream completes
- Close the span
The span remains open during streaming, so all log records emitted during or after streaming will be correlated.
```python
with _tracer.start_as_current_span(f"chat {model}", kind=trace.SpanKind.CLIENT) as span:
    span.set_attribute("gen_ai.request.model", model)
    # ... emit input message logs ...
    stream = client.chat.completions.create(model=model, messages=messages, stream=True)
    chunks = []
    for chunk in stream:
        chunks.append(chunk)
    # Reconstruct full response from chunks, then emit logs
    full_response_text = "".join(c.choices[0].delta.content or "" for c in chunks if c.choices)
    _otel_logger.emit(body=full_response_text, event_name="gen_ai.choice", ...)
    # Token usage may not be available in streaming mode (provider-dependent)
    span.set_status(trace.StatusCode.OK)
```

Retries
If your agent retries failed LLM calls (e.g., on rate limit errors), each attempt produces its own span. The retry loop should be outside the span context:
```python
for attempt in range(max_retries):
    try:
        response = call_llm(model, messages)  # Each call creates its own span
        break
    except RateLimitError:
        time.sleep(backoff)
```

This produces N spans (one per attempt), each with its own status. Failed attempts have ERROR status; the successful attempt has OK.
Multiple LLM Providers
The gen_ai.provider.name attribute distinguishes providers. If your agent calls OpenAI for some tasks and Anthropic for others:
- OpenAI calls: `gen_ai.provider.name = "openai"`
- Anthropic calls: `gen_ai.provider.name = "anthropic"`
You can filter and aggregate by provider in queries.
Wrapper Libraries (LiteLLM, LangChain)
If your agent uses a wrapper library like LiteLLM that abstracts over multiple providers:
- Instrument at the wrapper call site, not inside the wrapper library
- The model name you pass to the wrapper becomes `gen_ai.request.model`
- The `gen_ai.provider.name` should reflect the actual underlying provider (if known)
- The response object structure may differ from the raw OpenAI SDK — adapt the attribute extraction accordingly
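One way to derive `gen_ai.provider.name` at a LiteLLM call site is to split off the `provider/model` prefix that LiteLLM accepts in model strings. This heuristic is an assumption for illustration, not part of any SDK; unprefixed model names fall back to a default you choose:

```python
def provider_name(model: str, default: str = "openai") -> str:
    """Guess gen_ai.provider.name from a LiteLLM-style model string.

    "anthropic/claude-sonnet-4-20250514" -> "anthropic"
    "gpt-4o"                             -> default (no prefix present)
    """
    prefix, sep, _ = model.partition("/")
    return prefix if sep else default
```

Set the result as a span attribute next to `gen_ai.request.model`; if the underlying provider genuinely cannot be determined, it is better to omit the attribute than to guess wrong.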
Limitations and Edge Cases
Thinking Blocks Are Provider-Specific
The gen_ai.thinking event is not part of the official OTel GenAI semantic conventions. It is a custom extension. Different providers expose reasoning differently:
| Provider | How thinking is exposed | How to extract |
|---|---|---|
| Anthropic (Claude) | response.choices[i].message.thinking_blocks (via LiteLLM) | Iterate thinking_blocks, extract "thinking" key |
| DeepSeek | response.choices[i].message.reasoning_content | Extract reasoning_content field |
| OpenAI o1/o3 | Internal reasoning not exposed in API response | Cannot be captured |
If the provider does not expose reasoning, there will be no gen_ai.thinking log records. This is expected behavior, not a bug.
Token Counts May Be Absent
Some scenarios where gen_ai.usage.input_tokens and gen_ai.usage.output_tokens are not available:
| Scenario | Why | Impact |
|---|---|---|
| Error spans | API call failed before returning usage | Span has no token attributes |
| Streaming without usage chunks | Some providers don't send usage in streaming mode | Span has no token attributes |
| Local/self-hosted models | Some local model servers don't return usage | Span has no token attributes |
| Wrapper library filtering | Some wrappers strip usage from the response object | Span has no token attributes |
Backend impact: Token-based aggregations (cost, total tokens) should handle NULL gracefully.
The body Field Can Be Very Large
System prompts in agent applications can be 10,000+ tokens. LLM responses can be similarly large. Tool call arguments can contain large JSON payloads. Thinking blocks can be tens of thousands of tokens.
There is no truncation in the instrumentation. The full content is emitted as the log record body. This is intentional — truncation would make conversation reconstruction incomplete.
Backend impact: The logs stream will be significantly larger (in bytes) than the traces stream. Plan storage and indexing accordingly.
Log Records Without Span Context
If _otel_logger.emit() is called outside a start_as_current_span() context, the log record will have empty trace_id and span_id. This means:
- The log record exists in the logs stream but cannot be correlated with any span
- This is a bug in the instrumentation code — all GenAI log emissions should be inside a span context
The manual instrumentation approach prevents this by design: all emit() calls are inside the with _tracer.start_as_current_span(...): block.
Concurrent/Async LLM Calls
OTel context propagation is thread-local (and async-task-local in asyncio). If your agent makes concurrent LLM calls:
- Threading: Each thread has its own context. Spans in different threads don't interfere. Each call gets its own `trace_id`/`span_id`.
- asyncio: The OTel SDK supports async context propagation via `contextvars`. Each coroutine gets its own context.
- Multiprocessing: Each process has its own TracerProvider/LoggerProvider. Spans from different processes have different `trace_id`s.
Concurrency does not break correlation — each LLM call's logs are always linked to that call's span.
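The `contextvars` mechanism OTel relies on can be seen in isolation. This sketch does not use the OTel SDK at all; it only shows that each asyncio task observes its own value of a context variable, which is exactly how each concurrent LLM call keeps its own current span:

```python
import asyncio
import contextvars

# Stand-in for OTel's "current span" slot; each task gets its own copy.
current_span_id = contextvars.ContextVar("current_span_id", default=None)

async def fake_llm_call(span_id: str) -> tuple:
    current_span_id.set(span_id)   # analogous to start_as_current_span()
    await asyncio.sleep(0.01)      # yield so the tasks interleave
    # After resuming, this task still sees its own value, not the other task's.
    return span_id, current_span_id.get()

async def main():
    return await asyncio.gather(fake_llm_call("span-a"), fake_llm_call("span-b"))

results = asyncio.run(main())
```

Because `asyncio.gather` wraps each coroutine in a task with its own copied context, the `set()` in one task never leaks into the other, so `results` pairs each call with its own span id.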
Provider-Specific Response Formats
Different LLM providers return responses in slightly different formats. The instrumentation code must handle these differences:
| Provider | response.model | response.usage | Tool calls | Thinking |
|---|---|---|---|---|
| OpenAI | Always present | Always present (non-streaming) | .message.tool_calls | N/A |
| Anthropic (via LiteLLM) | Always present | Always present | .message.tool_calls | .message.thinking_blocks |
| Google (via LiteLLM) | Always present | Always present | .message.tool_calls | N/A |
| Local models (Ollama, vLLM) | May differ | May be absent | Varies | N/A |
Guard all attribute extraction with hasattr() / getattr() / is not None checks.
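A guarded extraction helper might look like the following sketch; the stubs mimic the "usage present" and "usage absent" rows of the table, and the attribute names follow the GenAI conventions used throughout this page:

```python
from types import SimpleNamespace as NS

def usage_attributes(response) -> dict:
    """Return token-count span attributes, or {} when usage is missing.

    Every access is guarded so local models and streaming responses that
    omit usage simply produce no attributes instead of raising.
    """
    usage = getattr(response, "usage", None)
    attrs = {}
    if usage is not None:
        if getattr(usage, "prompt_tokens", None) is not None:
            attrs["gen_ai.usage.input_tokens"] = usage.prompt_tokens
        if getattr(usage, "completion_tokens", None) is not None:
            attrs["gen_ai.usage.output_tokens"] = usage.completion_tokens
    return attrs

with_usage = NS(usage=NS(prompt_tokens=120, completion_tokens=34))
without_usage = NS()  # e.g., a streaming response that never sent a usage chunk
```

In the span code, apply the result with `for k, v in usage_attributes(response).items(): span.set_attribute(k, v)`, so spans for usage-less responses simply lack the token attributes (see "Token Counts May Be Absent" below).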
The opentelemetry-instrument CLI Must Wrap the Process
The CLI sets up providers at process startup. If your application forks or spawns subprocesses, those subprocesses will NOT have the providers configured. Each process that emits telemetry must be launched with opentelemetry-instrument, or must set up providers manually.
Collector Must Be Running
If the OTel Collector is not running when the application starts:
- The SDK will buffer spans and log records in memory
- It will periodically retry exporting
- If the buffer fills up, the oldest records are dropped (configurable via `OTEL_BSP_MAX_QUEUE_SIZE`, default 2048 for spans)
- No application errors are raised — telemetry loss is silent
For production, ensure the collector is running before the application starts. For development, data loss during collector restarts is acceptable.
Scope Name Consistency
Both _tracer and _otel_logger should use the same scope name. This makes it easy to filter both spans and logs by scope:
```python
_tracer = trace.get_tracer("my-agent.llm", "1.0.0")
_otel_logger = get_logger_provider().get_logger("my-agent.llm", "1.0.0")
```

If they use different scope names, correlation still works (via `trace_id`/`span_id`), but filtering by `scope_name` in queries will not match both signals.
Troubleshooting
No spans or logs appearing
- Is the collector running? Check `curl http://localhost:4318/v1/traces` — should return a response (even an error response means it's listening).
- Is the CLI wrapping the process? Run `opentelemetry-instrument --help` to verify it's installed. Check that your launch command uses `opentelemetry-instrument python ...`, not just `python ...`.
- Are exporters configured? Check that `--traces_exporter otlp --logs_exporter otlp` are passed to the CLI.
- Is the endpoint correct? `OTEL_EXPORTER_OTLP_ENDPOINT` must match where the collector is listening.
Duplicate spans per LLM call
The auto-instrumentor is still active. Verify:
```shell
echo $OTEL_PYTHON_DISABLED_INSTRUMENTATIONS
# Should output: openai_v2
```

If `opentelemetry-instrumentation-openai-v2` is installed and not disabled, it will create its own spans alongside your manual spans.
Log records have empty trace_id/span_id
Log records are being emitted outside a span context. Ensure every _otel_logger.emit() call is inside a with _tracer.start_as_current_span(...): block.
"Overriding of current LoggerProvider is not allowed"
You are calling set_logger_provider() in your application code, but the CLI already set up a LoggerProvider. Remove the manual set_logger_provider() call. Use get_logger_provider() instead.
Spans appear but logs do not
- Check that `--logs_exporter otlp` is passed to the CLI (not `none`).
- Check that the collector config has a `logs` pipeline (not just `traces`).
- Verify that `_otel_logger.emit()` is being called — add a `print()` before the emit to confirm the code path is reached.
Logs appear but with wrong event_name
The event_name parameter in _otel_logger.emit() must be a keyword argument. If passed positionally, it may be interpreted as a different parameter. Always use event_name=....
Verification Checklist
After setting up instrumentation, verify end-to-end:
| Check | How | Expected |
|---|---|---|
| Three span types present | Query traces and check gen_ai.operation.name values | invoke_agent, chat, execute_tool all present |
| Span hierarchy correct | Pick a trace. Check that chat and execute_tool spans have parent_span_id matching the invoke_agent span's span_id | All child spans point to the invoke_agent parent |
| One invoke_agent per run | Count invoke_agent spans | 1 per agent run |
| One chat span per LLM call | Count chat spans vs. LLM calls made | 1:1 ratio |
| One execute_tool per tool exec | Count execute_tool spans vs. tool executions | 1:1 ratio |
| gen_ai.agent.name on invoke_agent | Check span attributes | Present on invoke_agent span, NOT on chat spans |
| gen_ai.provider.name on chat spans | Check span for gen_ai.provider.name (not gen_ai.system) | Present on every chat span |
| gen_ai.tool.name on execute_tool | Check span attributes | Tool name present |
| Request attributes present | Check chat span for gen_ai.request.model, gen_ai.provider.name | Present on every chat span |
| Response attributes present | Check chat span for gen_ai.usage.input_tokens, gen_ai.usage.output_tokens | Present on successful chat spans |
| Aggregate tokens on invoke_agent | Check invoke_agent span for gen_ai.usage.input_tokens | Total across all LLM calls |
| Error spans captured | Trigger a timeout. Check for execute_tool span with span_status_code = 2. | Span exists with error.type |
| Log records exist | Query logs for same trace_id as an invoke_agent span | Multiple log records |
| Trace-log correlation | Pick a chat span. Query logs where trace_id and span_id match. | Logs link to the correct chat span |
| All chat event types present | Check event_name values on chat logs | gen_ai.system.message, gen_ai.user.message, gen_ai.choice, and optionally gen_ai.tool.call, gen_ai.thinking |
| invoke_agent logs present | Check logs with gen_ai.operation.name = "invoke_agent" | gen_ai.user.message (problem statement) and gen_ai.agent.finish (completion summary) |
| execute_tool logs present | Check logs with gen_ai.operation.name = "execute_tool" | gen_ai.tool.input and gen_ai.tool.output paired for each tool execution |
| All three operation types in logs | Group logs by gen_ai.operation.name | invoke_agent, chat, and execute_tool all present |
| Body is untruncated | Check body field on a gen_ai.choice log record | Full LLM response text, not truncated |
| No duplicates | Check that each LLM call produces exactly 1 chat span and 1 gen_ai.choice log per response choice | No doubles |