LiteLLM Trace Analysis with Parseable

Debabrata Panigrahi
January 12, 2026
How to send LiteLLM traces to Parseable using OpenTelemetry Collector for LLM observability, cost tracking, and performance analysis.

Introduction

LiteLLM is an open-source LLM gateway that provides a unified API for 100+ LLM providers. It acts as a proxy between your application and LLM APIs like OpenAI, Anthropic, Azure, Bedrock, and self-hosted models, giving you a single endpoint with built-in load balancing, fallbacks, and spend tracking.

But with great abstraction comes a new challenge: visibility. When your application makes dozens of LLM calls across different providers, you need answers to questions like:

  • Which model is slowest? Which is most expensive?
  • What's the token usage breakdown per user or feature?
  • Where are the bottlenecks in my LLM-powered workflows?

LiteLLM has built-in OpenTelemetry support, which means you can send traces to any OTLP-compatible backend. In this guide, we'll set up a pipeline to send LiteLLM traces to Parseable through the OpenTelemetry Collector, giving you SQL-queryable LLM observability.
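
To make the "unified API" point concrete, here is a minimal sketch: the same completion() call works across providers, with only the model string changing. The Anthropic model name below is just an example, and the relevant provider API keys are assumed to be set in the environment.

from litellm import completion

# LiteLLM routes on the model name; provider API keys (OPENAI_API_KEY,
# ANTHROPIC_API_KEY, ...) are read from environment variables.
messages = [{"role": "user", "content": "Summarize OpenTelemetry in one sentence."}]

# Same call shape for OpenAI...
openai_response = completion(model="gpt-4", messages=messages)

# ...and for Anthropic (model name is illustrative)
anthropic_response = completion(model="claude-3-opus-20240229", messages=messages)

print(openai_response.choices[0].message.content)
print(anthropic_response.choices[0].message.content)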

Why Parseable for LLM Traces?

Traditional APM tools weren't built for LLM workloads. They'll show you spans and latencies, but they won't help you answer:

  • What was the total token cost across all Claude calls yesterday?
  • Which prompts are generating the longest responses?
  • How does latency correlate with input token count?

Parseable stores traces as structured data in columnar Parquet format. This means you can:

  • Query traces with standard SQL
  • Correlate traces with logs and metrics ingested into the same backend
  • Control costs by keeping data in your own object storage
  • Set up threshold and anomaly alerts on any field

Architecture Overview

The pipeline:

LiteLLM → OpenTelemetry Collector → Parseable
  1. LiteLLM — Your application uses LiteLLM to call LLM APIs. LiteLLM's OTEL callback exports traces.
  2. OpenTelemetry Collector — Receives OTLP traces from LiteLLM and forwards them to Parseable.
  3. Parseable — Ingests traces at /v1/traces, stores them in object storage, and exposes them via SQL.
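
If you don't already have Parseable running, a minimal local instance for testing can be started with Docker. This is a sketch based on Parseable's quickstart, not a production setup: the image name, local-store mode, and environment variable names are assumptions to verify against the Parseable docs for your version, and the default admin/admin credentials match the Authorization header used in the collector config below.

# Run Parseable in local-store mode on port 8000 (paths are illustrative)
docker run -p 8000:8000 \
  -v /tmp/parseable/data:/parseable/data \
  -v /tmp/parseable/staging:/parseable/staging \
  -e P_FS_DIR=/parseable/data \
  -e P_STAGING_DIR=/parseable/staging \
  parseable/parseable:latest \
  parseable local-store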

Prerequisites

Before following along, you'll need:

  • A running Parseable instance (see the Docker quick start sketched above)
  • The OpenTelemetry Collector binary or its container image
  • Python 3.8+ with pip
  • An API key for at least one LLM provider (the examples use OpenAI)

Step 1: Install LiteLLM with OpenTelemetry

Install LiteLLM and the OpenTelemetry SDK:

pip install litellm opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

Step 2: Configure OpenTelemetry Collector

Create a configuration file otel-collector-config.yaml:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 100

exporters:
  otlphttp:
    endpoint: http://localhost:8000
    headers:
      Authorization: Basic YWRtaW46YWRtaW4=
      X-P-Stream: litellm-traces
      X-P-Log-Source: otel-traces
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]

Configuration breakdown:

  • receivers.otlp: Accepts OTLP data on gRPC (4317) and HTTP (4318)
  • processors.batch: Batches spans before export for efficiency
  • exporters.otlphttp.endpoint: Parseable server URL
  • X-P-Stream: Target dataset in Parseable
  • X-P-Log-Source: Set to otel-traces for trace data
  • Authorization: Base64-encoded username:password

Start the collector:

./otelcol --config ./otel-collector-config.yaml
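
Alternatively, the same configuration can be mounted into the official collector container image. A sketch, assuming the otel/opentelemetry-collector image (the default config path inside that image is /etc/otelcol/config.yaml; the contrib image uses a different path):

# Note: when the collector runs in a container, "localhost:8000" in the exporter
# endpoint refers to the container itself; use host.docker.internal or the host's
# IP if Parseable is running on the host machine.
docker run --rm \
  -p 4317:4317 -p 4318:4318 \
  -v $(pwd)/otel-collector-config.yaml:/etc/otelcol/config.yaml \
  otel/opentelemetry-collector:latest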

Step 3: Configure LiteLLM to Export Traces

Set environment variables to point LiteLLM at the OpenTelemetry Collector:

export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/json"

Now enable the OTEL callback in your Python code:

import litellm
from litellm import completion

# Enable OpenTelemetry tracing
litellm.callbacks = ["otel"]

# Make an LLM call
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is observability?"}]
)

print(response.choices[0].message.content)

Every LLM call will now emit a trace to the OpenTelemetry Collector, which forwards it to Parseable.
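
The introduction asked about token usage per user or feature. One way to get there is to tag each call with metadata, which LiteLLM forwards to its callbacks; with the OTEL callback, the tags surface as span attributes you can group by in Parseable. A sketch (the metadata keys are arbitrary, and the exact attribute names on the exported spans depend on your LiteLLM version):

import litellm
from litellm import completion

litellm.callbacks = ["otel"]

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Draft a release note."}],
    # Free-form tags attached to this call; they are surfaced on the span
    # so you can slice token usage by user or feature in SQL.
    metadata={
        "user_id": "user-123",
        "feature": "release-notes",
    },
)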

Step 4: Using LiteLLM Proxy (Optional)

If you're running LiteLLM as a proxy server, configure it in litellm_config.yaml:

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: sk-...

litellm_settings:
  callbacks: ["otel"]

environment_variables:
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:4318"
  OTEL_EXPORTER_OTLP_PROTOCOL: "http/json"

Start the proxy:

litellm --config litellm_config.yaml

All requests through the proxy will now be traced.
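
Applications talk to the proxy with any OpenAI-compatible client. A sketch using the official OpenAI Python SDK, assuming the proxy's default port 4000 and no master key configured (if you set one, pass it as the api_key):

from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-anything",  # any string works unless a master key is configured
)

response = client.chat.completions.create(
    model="gpt-4",  # must match a model_name from litellm_config.yaml
    messages=[{"role": "user", "content": "What is observability?"}],
)

print(response.choices[0].message.content)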

Step 5: Query Traces in Parseable

Once traces are flowing, open the Parseable UI and select the litellm-traces stream.

Trace Schema

LiteLLM traces include rich metadata:

  • trace_id: Unique trace identifier
  • span_id: Unique span identifier
  • span_name: Operation name (e.g., litellm.completion)
  • span_duration_ms: Span duration in milliseconds
  • span_model: Model used (e.g., gpt-4, claude-3-opus)
  • span_input_tokens: Input token count
  • span_output_tokens: Output token count
  • span_total_tokens: Total tokens used
  • span_status_code: HTTP status or error code
  • p_timestamp: Parseable ingestion timestamp

Example Queries

Average latency by model:

SELECT
  "span_model" AS model,
  AVG("span_duration_ms") AS avg_latency_ms,
  COUNT(*) AS call_count
FROM "litellm-traces"
WHERE "span_name" LIKE '%completion%'
GROUP BY model
ORDER BY avg_latency_ms DESC;

Token usage over time:

SELECT
  DATE_TRUNC('hour', p_timestamp) AS hour,
  SUM("span_input_tokens") AS input_tokens,
  SUM("span_output_tokens") AS output_tokens,
  SUM("span_total_tokens") AS total_tokens
FROM "litellm-traces"
GROUP BY hour
ORDER BY hour;

Slowest requests:

SELECT
  trace_id,
  "span_model",
  "span_duration_ms",
  "span_total_tokens",
  p_timestamp
FROM "litellm-traces"
ORDER BY "span_duration_ms" DESC
LIMIT 20;

Error rate by model:

SELECT
  "span_model" AS model,
  COUNT(*) AS total_calls,
  SUM(CASE WHEN "span_status_code" >= 400 THEN 1 ELSE 0 END) AS errors,
  ROUND(100.0 * SUM(CASE WHEN "span_status_code" >= 400 THEN 1 ELSE 0 END) / COUNT(*), 2) AS error_rate
FROM "litellm-traces"
GROUP BY model
ORDER BY error_rate DESC;
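
Latency vs. input tokens (revisiting the correlation question from the introduction; a sketch that assumes the CORR aggregate is available in Parseable's SQL engine):

SELECT
  "span_model" AS model,
  CORR("span_input_tokens", "span_duration_ms") AS tokens_latency_corr,
  COUNT(*) AS call_count
FROM "litellm-traces"
GROUP BY model
ORDER BY tokens_latency_corr DESC;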

Step 6: Set Up Alerts

Parseable's form-based alerting lets you create alerts without writing queries.

High Latency Alert

  1. Navigate to Alerts → Create Alert
  2. Configure:
    • Dataset: litellm-traces
    • Monitor Field: span_duration_ms
    • Aggregation: AVG
    • Alert Type: Threshold
    • Condition: Greater than 5000 (5 seconds)
    • Evaluation Window: 5 minutes
  3. Add a webhook destination (Slack, PagerDuty, etc.)

Token Spike Anomaly Detection

  1. Create a new alert
  2. Configure:
    • Dataset: litellm-traces
    • Monitor Field: span_total_tokens
    • Aggregation: SUM
    • Alert Type: Anomaly Detection
    • Sensitivity: Medium
    • Historical Window: 7 days

This will alert you when token usage deviates significantly from historical patterns.

Privacy: Redacting Prompts and Responses

LiteLLM can redact sensitive content from traces:

Redact all messages globally:

litellm.turn_off_message_logging = True

Redact per-request:

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Sensitive prompt"}],
    metadata={
        "mask_input": True,
        "mask_output": True
    }
)

This keeps request metadata (model, tokens, latency) while hiding the actual content.

Cost Tracking

With token counts in your traces, you can calculate costs:

SELECT
  "span_model" AS model,
  SUM("span_input_tokens") AS input_tokens,
  SUM("span_output_tokens") AS output_tokens,
  -- GPT-4 pricing as example: $0.03/1K input, $0.06/1K output
  ROUND(SUM("span_input_tokens") * 0.00003 + SUM("span_output_tokens") * 0.00006, 2) AS estimated_cost_usd
FROM "litellm-traces"
WHERE "span_model" = 'gpt-4'
  AND p_timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY model;

Adjust the pricing multipliers for your specific models.
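
To cover several models in one pass, a CASE expression over the model name works. This is a sketch with placeholder rates only; substitute your providers' current per-token pricing:

SELECT
  "span_model" AS model,
  SUM("span_input_tokens") AS input_tokens,
  SUM("span_output_tokens") AS output_tokens,
  ROUND(SUM(
    CASE "span_model"
      -- placeholder per-token rates; replace with real pricing
      WHEN 'gpt-4' THEN "span_input_tokens" * 0.00003 + "span_output_tokens" * 0.00006
      WHEN 'claude-3-opus' THEN "span_input_tokens" * 0.000015 + "span_output_tokens" * 0.000075
      ELSE 0
    END
  ), 2) AS estimated_cost_usd
FROM "litellm-traces"
WHERE p_timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY model
ORDER BY estimated_cost_usd DESC;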

Conclusion

With LiteLLM, OpenTelemetry Collector, and Parseable, you get:

  • Unified LLM observability — All your LLM calls in one place, regardless of provider
  • SQL-powered analysis — Query traces like a database, not a black box
  • Cost-effective storage — Object storage pricing instead of per-GB observability fees
  • Actionable alerts — Threshold, anomaly, and forecast-based alerting

The setup takes 15 minutes. The visibility lasts forever.
