Introduction
LiteLLM is an open-source LLM gateway that provides a unified API for 100+ LLM providers. It acts as a proxy between your application and LLM APIs like OpenAI, Anthropic, Azure, Bedrock, and self-hosted models, giving you a single endpoint with built-in load balancing, fallbacks, and spend tracking.
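For example, the same completion() call shape works across providers; only the model string changes. A minimal sketch with placeholder API keys (the Anthropic model name is illustrative):

```python
# Minimal sketch of LiteLLM's unified API; API keys and model names are placeholders.
import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = "sk-..."         # placeholder
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."  # placeholder

messages = [{"role": "user", "content": "Hello"}]

# Same call shape, different providers.
openai_response = completion(model="gpt-4", messages=messages)
anthropic_response = completion(model="anthropic/claude-3-opus-20240229", messages=messages)
```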
But with great abstraction comes a new challenge: visibility. When your application makes dozens of LLM calls across different providers, you need answers to questions like:
- Which model is slowest? Which is most expensive?
- What's the token usage breakdown per user or feature?
- Where are the bottlenecks in my LLM-powered workflows?
LiteLLM has built-in OpenTelemetry support, which means you can send traces to any OTLP-compatible backend. In this guide, we'll set up a pipeline to send LiteLLM traces to Parseable through the OpenTelemetry Collector, giving you SQL-queryable LLM observability.
Why Parseable for LLM Traces?
Traditional APM tools weren't built for LLM workloads. They'll show you spans and latencies, but they won't help you answer:
- What was the total token cost across all Claude calls yesterday?
- Which prompts are generating the longest responses?
- How does latency correlate with input token count?
Parseable stores traces as structured data in columnar Parquet format. This means you can:
- Query traces with standard SQL, including aggregations by model, user, or time window
- Correlate traces with logs and metrics stored alongside them
- Control costs by keeping everything in object storage
- Set up threshold and anomaly alerts on any field
Architecture Overview
The pipeline:
LiteLLM → OpenTelemetry Collector → Parseable
- LiteLLM — Your application uses LiteLLM to call LLM APIs. LiteLLM's OTEL callback exports traces.
- OpenTelemetry Collector — Receives OTLP traces from LiteLLM and forwards them to Parseable.
- Parseable — Ingests traces at /v1/traces, stores them in object storage, and exposes them via SQL.
Prerequisites
- Python 3.8+ with LiteLLM installed
- OpenTelemetry Collector installed (installation guide)
- Parseable running and accessible (installation guide)
Step 1: Install LiteLLM with OpenTelemetry
Install LiteLLM and the OpenTelemetry SDK:
```bash
pip install litellm opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
```
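If you want to confirm the packages landed before wiring anything up, a quick version check works (optional):

```python
# Optional sanity check: confirm the packages are installed and print their versions.
from importlib.metadata import version

for pkg in ("litellm", "opentelemetry-sdk", "opentelemetry-exporter-otlp"):
    print(pkg, version(pkg))
```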
Step 2: Configure OpenTelemetry Collector
Create a configuration file otel-collector-config.yaml:
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 100

exporters:
  otlphttp:
    endpoint: http://localhost:8000
    headers:
      Authorization: Basic YWRtaW46YWRtaW4=
      X-P-Stream: litellm-traces
      X-P-Log-Source: otel-traces
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```
Configuration breakdown:
| Setting | Description |
|---|---|
| `receivers.otlp` | Accepts OTLP data on gRPC (4317) and HTTP (4318) |
| `processors.batch` | Batches spans before export for efficiency |
| `exporters.otlphttp.endpoint` | Parseable server URL |
| `X-P-Stream` | Target dataset in Parseable |
| `X-P-Log-Source` | Set to `otel-traces` for trace data |
| `Authorization` | Base64-encoded `username:password` |
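The Authorization value in the example is Parseable's default admin:admin pair, Base64-encoded. You can generate the value for your own credentials with a few lines of Python (a standalone helper, not part of LiteLLM or Parseable):

```python
# Build the Basic auth header value for the collector's Parseable exporter.
import base64

username, password = "admin", "admin"  # replace with your Parseable credentials
token = base64.b64encode(f"{username}:{password}".encode()).decode()
print(f"Authorization: Basic {token}")  # prints "Basic YWRtaW46YWRtaW4=" for admin:admin
```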
Start the collector:
```bash
./otelcol --config ./otel-collector-config.yaml
```
Step 3: Configure LiteLLM to Export Traces
Set environment variables to point LiteLLM at the OpenTelemetry Collector:
```bash
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/json"
```
Now enable the OTEL callback in your Python code:
```python
import litellm
from litellm import completion

# Enable OpenTelemetry tracing
litellm.callbacks = ["otel"]

# Make an LLM call
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is observability?"}]
)

print(response.choices[0].message.content)
```
Every LLM call will now emit a trace to the OpenTelemetry Collector, which forwards it to Parseable.
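To get the per-user or per-feature breakdowns mentioned in the introduction, you can attach metadata to each call. LiteLLM passes call metadata to its logging callbacks, though which keys end up as span attributes depends on your LiteLLM version, so treat the keys below as hypothetical tags:

```python
# Sketch: tag calls so traces can later be sliced by user or feature.
# Whether these metadata keys surface as span attributes depends on the LiteLLM version.
from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize this support ticket..."}],
    metadata={
        "user_id": "user-123",        # hypothetical tag
        "feature": "ticket-summary",  # hypothetical tag
    },
)
```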
Step 4: Using LiteLLM Proxy (Optional)
If you're running LiteLLM as a proxy server, configure it in litellm_config.yaml:
```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: sk-...

litellm_settings:
  callbacks: ["otel"]

environment_variables:
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:4318"
  OTEL_EXPORTER_OTLP_PROTOCOL: "http/json"
```
Start the proxy:
```bash
litellm --config litellm_config.yaml
```
All requests through the proxy will now be traced.
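Clients then point at the proxy instead of the provider. For example, using the OpenAI Python SDK (the port and API key are assumptions: 4000 is LiteLLM's default proxy port, and the key should match whatever auth your proxy is configured with):

```python
# Call the LiteLLM proxy with any OpenAI-compatible client.
# Assumes the proxy's default port (4000) and a placeholder API key.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-anything")

response = client.chat.completions.create(
    model="gpt-4",  # must match a model_name from litellm_config.yaml
    messages=[{"role": "user", "content": "What is observability?"}],
)
print(response.choices[0].message.content)
```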
Step 5: Query Traces in Parseable
Once traces are flowing, open the Parseable UI and select the litellm-traces stream.
Trace Schema
LiteLLM traces include rich metadata:
| Field | Description |
|---|---|
| `trace_id` | Unique trace identifier |
| `span_id` | Unique span identifier |
| `span_name` | Operation name (e.g., `litellm.completion`) |
| `span_duration_ms` | Span duration in milliseconds |
| `span_model` | Model used (e.g., `gpt-4`, `claude-3-opus`) |
| `span_input_tokens` | Input token count |
| `span_output_tokens` | Output token count |
| `span_total_tokens` | Total tokens used |
| `span_status_code` | HTTP status or error code |
| `p_timestamp` | Parseable ingestion timestamp |
Example Queries
Average latency by model:
```sql
SELECT
  "span_model" AS model,
  AVG("span_duration_ms") AS avg_latency_ms,
  COUNT(*) AS call_count
FROM "litellm-traces"
WHERE "span_name" LIKE '%completion%'
GROUP BY model
ORDER BY avg_latency_ms DESC;
```
Token usage over time:
```sql
SELECT
  DATE_TRUNC('hour', p_timestamp) AS hour,
  SUM("span_input_tokens") AS input_tokens,
  SUM("span_output_tokens") AS output_tokens,
  SUM("span_total_tokens") AS total_tokens
FROM "litellm-traces"
GROUP BY hour
ORDER BY hour;
```
Slowest requests:
```sql
SELECT
  trace_id,
  "span_model",
  "span_duration_ms",
  "span_total_tokens",
  p_timestamp
FROM "litellm-traces"
ORDER BY "span_duration_ms" DESC
LIMIT 20;
```
Error rate by model:
```sql
SELECT
  "span_model" AS model,
  COUNT(*) AS total_calls,
  SUM(CASE WHEN "span_status_code" >= 400 THEN 1 ELSE 0 END) AS errors,
  ROUND(100.0 * SUM(CASE WHEN "span_status_code" >= 400 THEN 1 ELSE 0 END) / COUNT(*), 2) AS error_rate
FROM "litellm-traces"
GROUP BY model
ORDER BY error_rate DESC;
```
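These queries can also run programmatically. Parseable exposes an HTTP query endpoint that accepts SQL; a rough sketch using requests (verify the endpoint path and time-range fields against your Parseable version):

```python
# Rough sketch: run a SQL query against Parseable's HTTP query API.
# The endpoint path and request fields may differ between Parseable versions.
import requests

PARSEABLE_URL = "http://localhost:8000"
AUTH = ("admin", "admin")  # default credentials; replace with your own

sql = (
    'SELECT "span_model", AVG("span_duration_ms") AS avg_latency_ms '
    'FROM "litellm-traces" GROUP BY "span_model"'
)

resp = requests.post(
    f"{PARSEABLE_URL}/api/v1/query",
    auth=AUTH,
    json={
        "query": sql,
        "startTime": "2024-01-01T00:00:00Z",
        "endTime": "2024-01-02T00:00:00Z",
    },
)
resp.raise_for_status()
print(resp.json())
```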
Step 6: Set Up Alerts
Parseable's form-based alerting lets you create alerts without writing queries.
High Latency Alert
- Navigate to Alerts → Create Alert
- Configure:
  - Dataset: litellm-traces
  - Monitor Field: span_duration_ms
  - Aggregation: AVG
  - Alert Type: Threshold
  - Condition: Greater than 5000 (5 seconds)
  - Evaluation Window: 5 minutes
- Add a webhook destination (Slack, PagerDuty, etc.)
Token Spike Anomaly Detection
- Create a new alert
- Configure:
  - Dataset: litellm-traces
  - Monitor Field: span_total_tokens
  - Aggregation: SUM
  - Alert Type: Anomaly Detection
  - Sensitivity: Medium
  - Historical Window: 7 days
This will alert you when token usage deviates significantly from historical patterns.
Privacy: Redacting Prompts and Responses
LiteLLM can redact sensitive content from traces:
Redact all messages globally:
```python
litellm.turn_off_message_logging = True
```
Redact per-request:
```python
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Sensitive prompt"}],
    metadata={
        "mask_input": True,
        "mask_output": True
    }
)
```
This keeps request metadata (model, tokens, latency) while hiding the actual content.
Cost Tracking
With token counts in your traces, you can calculate costs:
```sql
SELECT
  "span_model" AS model,
  SUM("span_input_tokens") AS input_tokens,
  SUM("span_output_tokens") AS output_tokens,
  -- GPT-4 pricing as example: $0.03/1K input, $0.06/1K output
  ROUND(SUM("span_input_tokens") * 0.00003 + SUM("span_output_tokens") * 0.00006, 2) AS estimated_cost_usd
FROM "litellm-traces"
WHERE "span_model" = 'gpt-4'
  AND p_timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY model;
```
Adjust the pricing multipliers for your specific models.
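If you'd rather not hard-code prices in SQL, LiteLLM also ships a pricing table with a completion_cost() helper you can call at request time and export however you like. A small sketch; check that the helper exists in your LiteLLM version:

```python
# Sketch: estimate the cost of a single call with LiteLLM's built-in pricing table.
# Verify that completion_cost() is available in your LiteLLM version.
from litellm import completion, completion_cost

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is observability?"}],
)
print(f"estimated cost: ${completion_cost(completion_response=response):.6f}")
```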
Conclusion
With LiteLLM, OpenTelemetry Collector, and Parseable, you get:
- Unified LLM observability — All your LLM calls in one place, regardless of provider
- SQL-powered analysis — Query traces like a database, not a black box
- Cost-effective storage — Object storage pricing instead of per-GB observability fees
- Actionable alerts — Threshold, anomaly, and forecast-based alerting
The setup takes 15 minutes. The visibility lasts forever.

