Introduction
LiteLLM is an open-source LLM gateway that provides a unified API for 100+ LLM providers. It acts as a proxy between your application and LLM APIs like OpenAI, Anthropic, Azure, Bedrock, and self-hosted models, giving you a single endpoint with built-in load balancing, fallbacks, and spend tracking.
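For example, the same completion() call shape works across providers; only the model string changes. A minimal sketch with placeholder API keys (the Anthropic model name is illustrative):

```python
# Minimal sketch of LiteLLM's unified API; API keys and model names are placeholders.
import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = "sk-..."         # placeholder
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."  # placeholder

messages = [{"role": "user", "content": "Hello"}]

# Same call shape, different providers.
openai_response = completion(model="gpt-4", messages=messages)
anthropic_response = completion(model="anthropic/claude-3-opus-20240229", messages=messages)
```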
But with great abstraction comes a new challenge: visibility. When your application makes dozens of LLM calls across different providers, you need answers to questions like:
- Which model is slowest? Which is most expensive?
- What's the token usage breakdown per user or feature?
- Where are the bottlenecks in my LLM-powered workflows?
LiteLLM has built-in OpenTelemetry support, which means you can send traces to any OTLP-compatible backend. In this guide, we'll set up a pipeline to send LiteLLM traces to Parseable through the OpenTelemetry Collector, giving you SQL-queryable LLM observability.
Why Parseable for LLM Traces?
Traditional APM tools weren't built for LLM workloads. They'll show you spans and latencies, but they won't help you answer:
- What was the total token cost across all Claude calls yesterday?
- Which prompts are generating the longest responses?
- How does latency correlate with input token count?
Parseable stores traces as structured data in columnar Parquet format. This means you can:
- Query traces with standard SQL, including aggregations by model, user, or time window
- Correlate traces with logs and metrics stored alongside them
- Control costs by keeping everything in object storage
- Set up threshold and anomaly alerts on any field
Architecture Overview
The pipeline:
LiteLLM → OpenTelemetry Collector → Parseable
- LiteLLM — Your application uses LiteLLM to call LLM APIs. LiteLLM's OTEL callback exports traces.
- OpenTelemetry Collector — Receives OTLP traces from LiteLLM and forwards them to Parseable.
- Parseable — Ingests traces at /v1/traces, stores them in object storage, and exposes them via SQL.
Prerequisites
- Python 3.8+ with LiteLLM installed
- OpenTelemetry Collector installed (installation guide)
- Parseable running and accessible (installation guide)
Step 1: Install LiteLLM with OpenTelemetry
Install LiteLLM and the OpenTelemetry SDK:
```bash
pip install litellm opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
```
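If you want to confirm the packages landed before wiring anything up, a quick version check works (optional):

```python
# Optional sanity check: confirm the packages are installed and print their versions.
from importlib.metadata import version

for pkg in ("litellm", "opentelemetry-sdk", "opentelemetry-exporter-otlp"):
    print(pkg, version(pkg))
```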
Step 2: Configure OpenTelemetry Collector
Create a configuration file otel-collector-config.yaml:
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 100

exporters:
  otlphttp:
    endpoint: http://localhost:8000
    headers:
      Authorization: Basic YWRtaW46YWRtaW4=
      X-P-Stream: litellm-traces
      X-P-Log-Source: otel-traces
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```
Configuration breakdown:
| Setting | Description |
|---|---|
| `receivers.otlp` | Accepts OTLP data on gRPC (4317) and HTTP (4318) |
| `processors.batch` | Batches spans before export for efficiency |
| `exporters.otlphttp.endpoint` | Parseable server URL |
| `X-P-Stream` | Target dataset in Parseable |
| `X-P-Log-Source` | Set to `otel-traces` for trace data |
| `Authorization` | Base64-encoded `username:password` |
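The Authorization value in the example is Parseable's default admin:admin pair, Base64-encoded. You can generate the value for your own credentials with a few lines of Python (a standalone helper, not part of LiteLLM or Parseable):

```python
# Build the Basic auth header value for the collector's Parseable exporter.
import base64

username, password = "admin", "admin"  # replace with your Parseable credentials
token = base64.b64encode(f"{username}:{password}".encode()).decode()
print(f"Authorization: Basic {token}")  # prints "Basic YWRtaW46YWRtaW4=" for admin:admin
```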
Start the collector:
```bash
./otelcol --config ./otel-collector-config.yaml
```
Step 3: Configure LiteLLM to Export Traces
Set environment variables to point LiteLLM at the OpenTelemetry Collector:
```bash
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/json"
```
Now enable the OTEL callback in your Python code:
```python
import litellm
from litellm import completion

# Enable OpenTelemetry tracing
litellm.callbacks = ["otel"]

# Make an LLM call
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is observability?"}]
)

print(response.choices[0].message.content)
```
Every LLM call will now emit a trace to the OpenTelemetry Collector, which forwards it to Parseable.
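To get the per-user or per-feature breakdowns mentioned in the introduction, you can attach metadata to each call. LiteLLM passes call metadata to its logging callbacks, though which keys end up as span attributes depends on your LiteLLM version, so treat the keys below as hypothetical tags:

```python
# Sketch: tag calls so traces can later be sliced by user or feature.
# Whether these metadata keys surface as span attributes depends on the LiteLLM version.
from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize this support ticket..."}],
    metadata={
        "user_id": "user-123",        # hypothetical tag
        "feature": "ticket-summary",  # hypothetical tag
    },
)
```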
Step 4: Using LiteLLM Proxy (Optional)
If you're running LiteLLM as a proxy server, configure it in litellm_config.yaml:
```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: sk-...

litellm_settings:
  callbacks: ["otel"]

environment_variables:
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:4318"
  OTEL_EXPORTER_OTLP_PROTOCOL: "http/json"
```
Start the proxy:
```bash
litellm --config litellm_config.yaml
```
All requests through the proxy will now be traced.
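Clients then point at the proxy instead of the provider. For example, using the OpenAI Python SDK (the port and API key are assumptions: 4000 is LiteLLM's default proxy port, and the key should match whatever auth your proxy is configured with):

```python
# Call the LiteLLM proxy with any OpenAI-compatible client.
# Assumes the proxy's default port (4000) and a placeholder API key.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-anything")

response = client.chat.completions.create(
    model="gpt-4",  # must match a model_name from litellm_config.yaml
    messages=[{"role": "user", "content": "What is observability?"}],
)
print(response.choices[0].message.content)
```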
Step 5: Query Traces in Parseable
Once traces are flowing, open the Parseable UI and select the litellm-traces stream.
Trace Schema
LiteLLM traces include rich metadata:
| Field | Description |
|---|---|
| `trace_id` | Unique trace identifier |
| `span_id` | Unique span identifier |
| `span_name` | Operation name (e.g., `litellm.completion`) |
| `span_duration_ms` | Span duration in milliseconds |
| `span_model` | Model used (e.g., `gpt-4`, `claude-3-opus`) |
| `span_input_tokens` | Input token count |
| `span_output_tokens` | Output token count |
| `span_total_tokens` | Total tokens used |
| `span_status_code` | HTTP status or error code |
| `p_timestamp` | Parseable ingestion timestamp |
Example Queries
Average latency by model:
```sql
SELECT
  "span_model" AS model,
  AVG("span_duration_ms") AS avg_latency_ms,
  COUNT(*) AS call_count
FROM "litellm-traces"
WHERE "span_name" LIKE '%completion%'
GROUP BY model
ORDER BY avg_latency_ms DESC;
```
Token usage over time:
```sql
SELECT
  DATE_TRUNC('hour', p_timestamp) AS hour,
  SUM("span_input_tokens") AS input_tokens,
  SUM("span_output_tokens") AS output_tokens,
  SUM("span_total_tokens") AS total_tokens
FROM "litellm-traces"
GROUP BY hour
ORDER BY hour;
```
Slowest requests:
```sql
SELECT
  trace_id,
  "span_model",
  "span_duration_ms",
  "span_total_tokens",
  p_timestamp
FROM "litellm-traces"
ORDER BY "span_duration_ms" DESC
LIMIT 20;
```
Error rate by model:
```sql
SELECT
  "span_model" AS model,
  COUNT(*) AS total_calls,
  SUM(CASE WHEN "span_status_code" >= 400 THEN 1 ELSE 0 END) AS errors,
  ROUND(100.0 * SUM(CASE WHEN "span_status_code" >= 400 THEN 1 ELSE 0 END) / COUNT(*), 2) AS error_rate
FROM "litellm-traces"
GROUP BY model
ORDER BY error_rate DESC;
```
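These queries can also run programmatically. Parseable exposes an HTTP query endpoint that accepts SQL; a rough sketch using requests (verify the endpoint path and time-range fields against your Parseable version):

```python
# Rough sketch: run a SQL query against Parseable's HTTP query API.
# The endpoint path and request fields may differ between Parseable versions.
import requests

PARSEABLE_URL = "http://localhost:8000"
AUTH = ("admin", "admin")  # default credentials; replace with your own

sql = (
    'SELECT "span_model", AVG("span_duration_ms") AS avg_latency_ms '
    'FROM "litellm-traces" GROUP BY "span_model"'
)

resp = requests.post(
    f"{PARSEABLE_URL}/api/v1/query",
    auth=AUTH,
    json={
        "query": sql,
        "startTime": "2024-01-01T00:00:00Z",
        "endTime": "2024-01-02T00:00:00Z",
    },
)
resp.raise_for_status()
print(resp.json())
```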
Step 6: Set Up Alerts
Parseable's form-based alerting lets you create alerts without writing queries.
High Latency Alert
- Navigate to Alerts → Create Alert
- Configure:
  - Dataset: litellm-traces
  - Monitor Field: span_duration_ms
  - Aggregation: AVG
  - Alert Type: Threshold
  - Condition: Greater than 5000 (5 seconds)
  - Evaluation Window: 5 minutes
- Add a webhook destination (Slack, PagerDuty, etc.)
Token Spike Anomaly Detection
- Create a new alert
- Configure:
  - Dataset: litellm-traces
  - Monitor Field: span_total_tokens
  - Aggregation: SUM
  - Alert Type: Anomaly Detection
  - Sensitivity: Medium
  - Historical Window: 7 days
This will alert you when token usage deviates significantly from historical patterns.
Privacy: Redacting Prompts and Responses
LiteLLM can redact sensitive content from traces:
Redact all messages globally:
```python
litellm.turn_off_message_logging = True
```
Redact per-request:
```python
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Sensitive prompt"}],
    metadata={
        "mask_input": True,
        "mask_output": True
    }
)
```
This keeps request metadata (model, tokens, latency) while hiding the actual content.
Cost Tracking
With token counts in your traces, you can calculate costs:
```sql
SELECT
  "span_model" AS model,
  SUM("span_input_tokens") AS input_tokens,
  SUM("span_output_tokens") AS output_tokens,
  -- GPT-4 pricing as example: $0.03/1K input, $0.06/1K output
  ROUND(SUM("span_input_tokens") * 0.00003 + SUM("span_output_tokens") * 0.00006, 2) AS estimated_cost_usd
FROM "litellm-traces"
WHERE "span_model" = 'gpt-4'
  AND p_timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY model;
```
Adjust the pricing multipliers for your specific models.
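If you'd rather not hard-code prices in SQL, LiteLLM also ships a pricing table with a completion_cost() helper you can call at request time and export however you like. A small sketch; check that the helper exists in your LiteLLM version:

```python
# Sketch: estimate the cost of a single call with LiteLLM's built-in pricing table.
# Verify that completion_cost() is available in your LiteLLM version.
from litellm import completion, completion_cost

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is observability?"}],
)
print(f"estimated cost: ${completion_cost(completion_response=response):.6f}")
```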
Conclusion
With LiteLLM, OpenTelemetry Collector, and Parseable, you get:
- Unified LLM observability — All your LLM calls in one place, regardless of provider
- SQL-powered analysis — Query traces like a database, not a black box
- Cost-effective storage — Object storage pricing instead of per-GB observability fees
- Actionable alerts — Threshold, anomaly, and forecast-based alerting
The setup takes 15 minutes. The visibility lasts forever.

