LiteLLM
Send LiteLLM traces to Parseable for LLM observability
Send LiteLLM traces to Parseable through OpenTelemetry for complete LLM observability with SQL-queryable analytics.
Overview
LiteLLM is an open-source LLM gateway that provides a unified API for 100+ LLM providers. It acts as a proxy between your application and LLM APIs like OpenAI, Anthropic, Azure, Bedrock, and self-hosted models.
Integrate LiteLLM with Parseable to:
- **Track All LLM Calls**: Monitor requests across multiple providers
- **Analyze Latency**: Identify slow models and optimize routing
- **Monitor Token Usage**: Track consumption and estimate costs
- **Debug Errors**: Investigate failed requests with full context
Architecture
LiteLLM → OpenTelemetry Collector → Parseable (`/v1/traces`)
Prerequisites
- Python 3.8+
- LiteLLM installed
- OpenTelemetry Collector
- Parseable instance
Step 1: Install Dependencies
```bash
pip install litellm opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
```

Step 2: Configure OpenTelemetry Collector
Create `otel-collector-config.yaml`:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 100

exporters:
  otlphttp:
    endpoint: http://localhost:8000
    headers:
      Authorization: Basic YWRtaW46YWRtaW4=
      X-P-Stream: litellm-traces
      X-P-Log-Source: otel-traces
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```

| Setting | Description |
|---|---|
| `receivers.otlp` | Accepts OTLP data on gRPC (4317) and HTTP (4318) |
| `processors.batch` | Batches spans before export for efficiency |
| `exporters.otlphttp.endpoint` | Parseable server URL |
| `X-P-Stream` | Target dataset in Parseable |
| `X-P-Log-Source` | Must be `otel-traces` for trace data |
Start the collector:
```bash
./otelcol --config ./otel-collector-config.yaml
```
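Before wiring up LiteLLM, you can optionally confirm the collector → Parseable path by emitting a single hand-rolled span. This is a minimal sketch using the OpenTelemetry SDK installed in Step 1; it assumes the collector's HTTP receiver is reachable at `localhost:4318` as configured above.

```python
# Smoke test: send one span through the collector to Parseable.
# Assumes the collector's OTLP/HTTP receiver from Step 2 is listening on localhost:4318.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
trace.set_tracer_provider(provider)

# Emit a single no-op span
with trace.get_tracer("pipeline-check").start_as_current_span("collector-smoke-test"):
    pass

# Flush before the script exits so the batch processor actually exports the span
provider.force_flush()
```

If the span does not appear in the `litellm-traces` dataset, check the collector logs for export errors before moving on.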
Step 3: Configure LiteLLM
Set environment variables:
```bash
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/json"
```

Enable the OTEL callback in your Python code:
```python
import litellm
from litellm import completion

# Enable OpenTelemetry tracing
litellm.callbacks = ["otel"]

# Make an LLM call
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is observability?"}]
)

print(response.choices[0].message.content)
```

Every LLM call now emits a trace to Parseable.
Step 4: LiteLLM Proxy (Optional)
If running LiteLLM as a proxy server, configure litellm_config.yaml:
```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: sk-...

litellm_settings:
  callbacks: ["otel"]

environment_variables:
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:4318"
  OTEL_EXPORTER_OTLP_PROTOCOL: "http/json"
```

Start the proxy:
```bash
litellm --config litellm_config.yaml
```
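The proxy exposes an OpenAI-compatible API, so existing OpenAI clients can route through it and every call is traced. As a minimal sketch (assuming the proxy's default port `4000` and the `openai` package, which is not part of Step 1's install):

```python
# Call the LiteLLM proxy with the OpenAI SDK; the proxy forwards to the
# configured provider and emits an OTEL trace for each request.
# base_url assumes the proxy's default port 4000; api_key is a placeholder
# unless you have set a master key on the proxy.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="anything")

response = client.chat.completions.create(
    model="gpt-4",  # must match a model_name from litellm_config.yaml
    messages=[{"role": "user", "content": "What is observability?"}],
)
print(response.choices[0].message.content)
```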
Trace Schema
LiteLLM traces include rich metadata:
| Field | Description |
|---|---|
| `trace_id` | Unique trace identifier |
| `span_id` | Unique span identifier |
| `span_name` | Operation name (e.g., `litellm.completion`) |
| `span_duration_ms` | Span duration in milliseconds |
| `span_model` | Model used (e.g., `gpt-4`, `claude-3-opus`) |
| `span_input_tokens` | Input token count |
| `span_output_tokens` | Output token count |
| `span_total_tokens` | Total tokens used |
| `span_status_code` | HTTP status or error code |
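To confirm spans are arriving before building out dashboards, you can query the dataset programmatically. A minimal sketch, assuming Parseable's SQL query endpoint at `POST /api/v1/query` and the default `admin`/`admin` credentials used in the collector config above:

```python
# Fetch a handful of recent LiteLLM spans from Parseable's SQL API.
# Endpoint path, credentials, and response shape are assumptions based on the
# setup above; adjust them to match your deployment.
from datetime import datetime, timedelta, timezone
import requests

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

resp = requests.post(
    "http://localhost:8000/api/v1/query",
    auth=("admin", "admin"),
    json={
        "query": 'SELECT "span_name", "span_model", "span_duration_ms" '
                 'FROM "litellm-traces" LIMIT 5',
        "startTime": start.isoformat(),
        "endTime": end.isoformat(),
    },
)
resp.raise_for_status()
print(resp.json())
```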
Example Queries
Average Latency by Model
```sql
SELECT
  "span_model" AS model,
  AVG("span_duration_ms") AS avg_latency_ms,
  COUNT(*) AS call_count
FROM "litellm-traces"
WHERE "span_name" LIKE '%completion%'
GROUP BY model
ORDER BY avg_latency_ms DESC;
```

Token Usage Over Time
```sql
SELECT
  DATE_TRUNC('hour', p_timestamp) AS hour,
  SUM("span_input_tokens") AS input_tokens,
  SUM("span_output_tokens") AS output_tokens,
  SUM("span_total_tokens") AS total_tokens
FROM "litellm-traces"
GROUP BY hour
ORDER BY hour;
```

Slowest Requests
```sql
SELECT
  trace_id,
  "span_model",
  "span_duration_ms",
  "span_total_tokens",
  p_timestamp
FROM "litellm-traces"
ORDER BY "span_duration_ms" DESC
LIMIT 20;
```

Error Rate by Model
```sql
SELECT
  "span_model" AS model,
  COUNT(*) AS total_calls,
  SUM(CASE WHEN "span_status_code" >= 400 THEN 1 ELSE 0 END) AS errors,
  ROUND(100.0 * SUM(CASE WHEN "span_status_code" >= 400 THEN 1 ELSE 0 END) / COUNT(*), 2) AS error_rate
FROM "litellm-traces"
GROUP BY model
ORDER BY error_rate DESC;
```

Cost Tracking
Estimate costs from token counts and per-model pricing:
```sql
SELECT
  "span_model" AS model,
  SUM("span_input_tokens") AS input_tokens,
  SUM("span_output_tokens") AS output_tokens,
  -- GPT-4 pricing: $0.03/1K input, $0.06/1K output
  ROUND(SUM("span_input_tokens") * 0.00003 + SUM("span_output_tokens") * 0.00006, 2) AS estimated_cost_usd
FROM "litellm-traces"
WHERE "span_model" = 'gpt-4'
  AND p_timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY model;
```
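The same idea extends to several models at once by keeping per-token prices in one place. The prices below are illustrative placeholders; substitute your providers' current rates and feed in the token sums returned by the query above.

```python
# Rough multi-model cost estimate from aggregated token counts.
# PRICES_PER_1K values are illustrative assumptions, not authoritative pricing.
PRICES_PER_1K = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "claude-3-opus": {"input": 0.015, "output": 0.075},
}

def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD spend for one model's aggregated token usage."""
    price = PRICES_PER_1K[model]
    return input_tokens / 1000 * price["input"] + output_tokens / 1000 * price["output"]

# Example: token sums as they might come back from the SQL query above
usage = [("gpt-4", 120_000, 45_000), ("claude-3-opus", 80_000, 30_000)]
for model, inp, out in usage:
    print(f"{model}: ${estimate_cost_usd(model, inp, out):,.2f}")
```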
Alerting
High Latency Alert
Create an alert in Parseable:
- Stream: `litellm-traces`
- Column: `span_duration_ms`
- Aggregation: `AVG`
- Threshold: `> 5000`
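If you prefer to evaluate the same condition outside Parseable's built-in alerting (for example from a cron job), here is a minimal polling sketch against the query API shown earlier; the endpoint, credentials, and response shape are assumptions.

```python
# Poll average LLM latency over the last 5 minutes and flag breaches of the
# 5000 ms threshold. Endpoint, credentials, and the JSON row format are
# assumptions; replace print() with your webhook or pager integration.
from datetime import datetime, timedelta, timezone
import requests

THRESHOLD_MS = 5000

def check_latency() -> None:
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=5)
    resp = requests.post(
        "http://localhost:8000/api/v1/query",
        auth=("admin", "admin"),
        json={
            "query": 'SELECT AVG("span_duration_ms") AS avg_ms FROM "litellm-traces"',
            "startTime": start.isoformat(),
            "endTime": end.isoformat(),
        },
    )
    resp.raise_for_status()
    rows = resp.json()  # assumed to be a list of row objects
    raw = rows[0].get("avg_ms") if rows else None
    avg_ms = float(raw) if raw is not None else None
    if avg_ms is not None and avg_ms > THRESHOLD_MS:
        print(f"ALERT: average LLM latency {avg_ms:.0f} ms exceeds {THRESHOLD_MS} ms")

check_latency()
```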
Token Spike Detection
Use anomaly detection on:
- Stream: `litellm-traces`
- Column: `span_total_tokens`
- Aggregation: `SUM`
Privacy: Redacting Prompts
LiteLLM can redact sensitive content from traces:
Redact all messages globally:
```python
litellm.turn_off_message_logging = True
```

Redact per-request:
```python
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Sensitive prompt"}],
    metadata={
        "mask_input": True,
        "mask_output": True
    }
)
```

This keeps request metadata (model, tokens, latency) while hiding the actual content.