Ingestion

Ingestion is the process of sending telemetry signals (metrics, events, logs, and traces) into Parseable.

You can ingest data into Parseable in JSON format over HTTP(S). This means that any log agent, metric collector, or tracing library that can send JSON over HTTP can be used to ingest data into Parseable.

For log data, you can use the HTTP output plugins of logging agents and shippers such as Fluent Bit, Vector, syslog-ng, and Logstash to send log events to Parseable.

You can also directly integrate Parseable with your application via REST API.
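
As a minimal sketch of direct ingestion, the example below uses Python's requests library. The endpoint path (/api/v1/ingest), local address, dataset name, and credentials are assumptions for illustration; adjust them to match your deployment.

import requests

# Hypothetical local Parseable instance; adjust the URL and credentials for your deployment.
PARSEABLE_URL = "http://localhost:8000/api/v1/ingest"

event = {
    "service": "payments",
    "level": "info",
    "message": "order processed",
}

resp = requests.post(
    PARSEABLE_URL,
    json=event,                          # single event sent as a JSON object
    headers={"X-P-Stream": "app-logs"},  # target dataset; created if it doesn't exist
    auth=("admin", "admin"),             # basic auth (username, password)
    timeout=10,
)
resp.raise_for_status()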

Parseable also supports OpenTelemetry (OTel) native data ingestion via Protobuf over HTTP. This allows you to use any OpenTelemetry-compatible log agent or library to send logs to Parseable.
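
As an illustrative sketch, the OpenTelemetry Python SDK can export logs to an OTLP/HTTP endpoint. The endpoint path (/v1/logs), dataset header, and credentials below are assumptions; note that the logs-related modules are still marked experimental in the Python SDK, so import paths may differ between versions.

import logging
from opentelemetry._logs import set_logger_provider
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.exporter.otlp.proto.http._log_exporter import OTLPLogExporter

# Assumed OTLP/HTTP logs endpoint and credentials; adjust for your deployment.
exporter = OTLPLogExporter(
    endpoint="http://localhost:8000/v1/logs",
    headers={
        "Authorization": "Basic YWRtaW46YWRtaW4=",
        "X-P-Stream": "otel-logs",
    },
)

provider = LoggerProvider()
provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
set_logger_provider(provider)

# Bridge the standard logging module to OpenTelemetry.
logging.getLogger().addHandler(LoggingHandler(logger_provider=provider))
logging.getLogger(__name__).warning("hello from an OTel-instrumented app")

provider.shutdown()  # flush buffered log records before exit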

Ingestion reference

Explore the different methods to ingest data into Parseable. Choose the method that best fits your infrastructure and use case.

OpenTelemetry
Zero instrumentation
LLM & AI Agents
Log agents and shippers
Databases
Containers
Streaming Platforms
CI/CD Tools
Cloud services
Security Tools
Programming languages

Ingestion HTTP headers

You can use HTTP headers to control how data is ingested and processed.

Required headers

| Header | Description | Example | Possible Values |
| --- | --- | --- | --- |
| X-P-Stream | Target dataset name. Creates the dataset if it doesn't exist. | nginx-logs | Valid dataset name |
| Authorization | Basic auth credentials (base64-encoded username:password) | Basic YWRtaW46YWRtaW4= | Valid credentials |
| Content-Type | Content type of the request body | application/json | application/json, application/protobuf |

Optional headers

| Header | Description | Example |
| --- | --- | --- |
| X-P-Tag-{field} | Add custom tags/metadata to events. Replace {field} with the tag name. | X-P-Tag-environment: production |
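
For instance, tag headers can be sent alongside the required headers. A hedged sketch, with an assumed local endpoint and illustrative tag values:

import requests

resp = requests.post(
    "http://localhost:8000/api/v1/ingest",    # assumed local Parseable endpoint
    json={"message": "deployment finished"},
    headers={
        "X-P-Stream": "app-logs",
        "X-P-Tag-environment": "production",  # attached as a tag to the events in this call
        "X-P-Tag-team": "platform",
    },
    auth=("admin", "admin"),
    timeout=10,
)
resp.raise_for_status()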

Log processing

When agents like Fluent Bit or Vector send logs to Parseable, they typically send the entire log line in a single field (usually named log or message). With log extraction enabled, Parseable can automatically parse this raw log line and extract structured fields using regex patterns.

The flow:

  1. Agent sends JSON log data with the raw log line in a field (e.g., {"log": "192.168.1.1 - - [10/Jan/2026:12:00:00] ..."})
  2. Agent sets X-P-Extract-Log header to the field name containing the raw log (e.g., log)
  3. Agent sets X-P-Log-Source header to the log format name (e.g., nginx_access)
  4. Parseable reads the value from the specified field, applies the matching regex pattern, and adds all extracted fields to the event

Headers for log extraction

| Header | Description | Example | Possible Values |
| --- | --- | --- | --- |
| X-P-Log-Source | Log format name. Parseable applies the matching regex patterns to extract fields. | nginx_access, syslog_log | See Supported log source formats |
| X-P-Extract-Log | Field name in the JSON payload that contains the raw log line | log, message | Any valid field name |
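
Putting the flow together, here is a sketch of what an agent (or a manual test) would send. The raw log line, dataset name, and endpoint are illustrative assumptions:

import requests

# Raw NGINX access log line wrapped in a JSON field named "log".
event = {"log": '192.168.1.1 - - [10/Jan/2026:12:00:00 +0000] "GET / HTTP/1.1" 200 612'}

resp = requests.post(
    "http://localhost:8000/api/v1/ingest",  # assumed local Parseable endpoint
    json=event,
    headers={
        "X-P-Stream": "nginx-logs",
        "X-P-Extract-Log": "log",           # field holding the raw log line
        "X-P-Log-Source": "nginx_access",   # format whose regex patterns to apply
    },
    auth=("admin", "admin"),
    timeout=10,
)
resp.raise_for_status()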

Supported log source formats

1. access_log
2. alb_log
3. block_log
4. candlepin_log
5. choose_repo_log
6. cloudvm_ram_log
7. cups_log
8. dpkg_log
9. elb_log
10. engine_log
11. env_logger_log
12. error_log
13. esx_syslog_log
14. haproxy_log
15. java
16. katello_log
17. klog
18. kubernetes_log
19. lnav_debug_log
20. nextflow_log
21. nginx_access
22. openam_log
23. openamdb_log
24. openstack_log
25. page_log
26. parseable_server_logs
27. postgres
28. postgresql_log
29. procstate_log
30. proxifier_log
31. rails_log
32. redis_log
33. s3_log
34. simple_rs_log
35. snaplogic_log
36. sssd_log
37. strace_log
38. sudo_log
39. syslog_log
40. tcf_log
41. tcsh_history
42. uwsgi_log
43. vmk_log
44. vmw_log
45. vmw_py_log
46. vmw_vc_svc_log
47. vpostgres_log
48. web_robot_log
49. xmlrpc_log
50. zookeeper
51. zookeeper_log

Flattening

Nested JSON objects are automatically flattened. For example, the following JSON object

{
  "foo": {
    "bar": "baz"
  }
}

will be flattened to

{
  "foo_bar": "baz"
}

before it is stored. When querying, refer to this field as foo_bar, for example: select foo_bar from <dataset-name>. The flattened field is also available in the schema.
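
For illustration only, the flattening rule can be approximated with a few lines of Python. This is a sketch of the key-joining behaviour, not Parseable's actual implementation:

def flatten(obj, parent_key="", sep="_"):
    """Recursively flatten nested dicts by joining keys with an underscore."""
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

print(flatten({"foo": {"bar": "baz"}}))  # {'foo_bar': 'baz'}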

Batching and Compression

Wherever applicable, we recommend enabling the log agent's compression and batching features to reduce network traffic and improve ingestion performance. The maximum payload size in Parseable is 10 MiB (10,485,760 bytes). The payload can contain a single log event as a JSON object or multiple log events in a JSON array. There is no limit on the number of batched events in a single call.
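
For example, multiple events can be batched into a single JSON array per request. A sketch with an assumed endpoint and illustrative events:

import requests

# A batch of events sent as one JSON array in a single HTTP call.
batch = [
    {"level": "info", "message": f"event {i}"}
    for i in range(1000)
]

resp = requests.post(
    "http://localhost:8000/api/v1/ingest",  # assumed local Parseable endpoint
    json=batch,
    headers={"X-P-Stream": "app-logs"},
    auth=("admin", "admin"),
    timeout=30,
)
resp.raise_for_status()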

Timestamp

Correct time is critical to understanding the proper sequence of events. Timestamps are important for debugging, analytics, and deriving transactions. We recommend including a timestamp in your log events in RFC3339 format.

Parseable records the time at which each event is received and adds it to the log event in the p_timestamp field. This ensures there is a time reference in the log event, even if the original event doesn't include a timestamp.
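
For example, an RFC3339 timestamp can be generated in Python as follows; the field name timestamp is an arbitrary choice for illustration:

from datetime import datetime, timezone

event = {
    # RFC3339 / ISO 8601 timestamp in UTC, e.g. "2026-01-10T12:00:00.123456+00:00"
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "message": "user signed in",
}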

Staging

Staging in Parseable refers to the process of storing log data on locally attached storage before it is pushed to a long-term, persistent store such as S3. Staging acts as a buffer for incoming events and provides a stable path for pushing events to the persistent store.

Once an HTTP call is received on the Parseable server, events are parsed and converted to Arrow format in memory. This Arrow data is then written to the staging directory (defaults to $PWD/staging). Every minute, the server converts the Arrow data to Parquet format and pushes it to the persistent store. We chose a minute as the default interval, so there is a clear boundary between events, and the prefix structure on S3 is predictable.

The query flow in Parseable allows transparent access to the data in the staging directory. This means that the data in the staging directory is queryable in real-time. As a user, you won't see any difference in the data fetched from the staging directory or the persistent store.

The staging directory can be configured using the P_STAGING_DIR environment variable, as explained in the environment vars section.

Planning for Production

When planning for the production deployment of Parseable, the two most important considerations from a staging perspective are:

Storage size: Ensure that the staging area has sufficient capacity to handle the anticipated log volume. This prevents data loss due to disk space exhaustion. To calculate the required storage size, multiply the average log event size by the expected event volume over 5-10 minutes. This buffer is needed because, under high load, the conversion to Parquet and the subsequent push to S3 may lag behind ingestion.
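
As a rough, back-of-the-envelope sketch with assumed numbers (1 KiB average event size, 10,000 events per second, a 10-minute buffer):

# Staging capacity estimate with assumed inputs; substitute your own figures.
avg_event_bytes = 1024          # assumed average event size: 1 KiB
events_per_second = 10_000      # assumed ingestion rate
buffer_seconds = 10 * 60        # size staging for 10 minutes of backlog

required_bytes = avg_event_bytes * events_per_second * buffer_seconds
print(f"~{required_bytes / 2**30:.1f} GiB of staging capacity")  # ~5.7 GiB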

Local storage redundancy: Because data in staging has not yet been committed to the persistent store, it is important that the staging volume itself is reliable and redundant. This protects staging data from loss due to simple disk failures. If using AWS, choose a service such as EBS (Elastic Block Store) or EFS (Elastic File System) and mount the volume on the Parseable server. Similarly, on Azure, choose Managed Disks or Azure Files. On a private cloud, a reliable mounted volume from a NAS or SAN can be used.
