Ingestion

Ingestion is the process of sending telemetry signals (metrics, events, logs, and traces) into Parseable.

You can ingest data into Parseable in JSON format over HTTP(S). This means that any log agent, metric collector, or tracing library that can send JSON over HTTP can be used to ingest data into Parseable.

For log data, you can use the HTTP output plugins of logging agents and shippers such as Fluent Bit, Vector, syslog-ng, and Logstash to send log events to Parseable.

You can also directly integrate Parseable with your application via REST API.
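
As a minimal sketch of direct ingestion, the example below uses Python's requests library. The endpoint path (/api/v1/ingest), local address, dataset name, and credentials are assumptions for illustration; adjust them to match your deployment.

import requests

# Hypothetical local Parseable instance; adjust the URL and credentials for your deployment.
PARSEABLE_URL = "http://localhost:8000/api/v1/ingest"

event = {
    "service": "payments",
    "level": "info",
    "message": "order processed",
}

resp = requests.post(
    PARSEABLE_URL,
    json=event,                          # single event sent as a JSON object
    headers={"X-P-Stream": "app-logs"},  # target dataset; created if it doesn't exist
    auth=("admin", "admin"),             # basic auth (username, password)
    timeout=10,
)
resp.raise_for_status()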

Parseable also supports OpenTelemetry (OTel) native data ingestion via Protobuf over HTTP. This allows you to use any OpenTelemetry-compatible log agent or library to send logs to Parseable.
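
As an illustrative sketch, the OpenTelemetry Python SDK can export logs to an OTLP/HTTP endpoint. The endpoint path (/v1/logs), dataset header, and credentials below are assumptions; note that the logs-related modules are still marked experimental in the Python SDK, so import paths may differ between versions.

import logging
from opentelemetry._logs import set_logger_provider
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.exporter.otlp.proto.http._log_exporter import OTLPLogExporter

# Assumed OTLP/HTTP logs endpoint and credentials; adjust for your deployment.
exporter = OTLPLogExporter(
    endpoint="http://localhost:8000/v1/logs",
    headers={
        "Authorization": "Basic YWRtaW46YWRtaW4=",
        "X-P-Stream": "otel-logs",
    },
)

provider = LoggerProvider()
provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
set_logger_provider(provider)

# Bridge the standard logging module to OpenTelemetry.
logging.getLogger().addHandler(LoggingHandler(logger_provider=provider))
logging.getLogger(__name__).warning("hello from an OTel-instrumented app")

provider.shutdown()  # flush buffered log records before exit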

Ingestion reference

Explore the different methods to ingest data into Parseable. Choose the method that best fits your infrastructure and use case.

OpenTelemetry
Zero instrumentation
LLM & AI Agents
Log agents and shippers
Databases
Containers
Streaming Platforms
CI/CD Tools
Cloud services
Security Tools
Programming languages

Ingestion HTTP headers

You can use HTTP headers to control how data is ingested and processed.

Required headers

| Header | Description | Example | Possible Values |
| --- | --- | --- | --- |
| X-P-Stream | Target dataset name. Creates the dataset if it doesn't exist. | nginx-logs | Valid dataset name |
| Authorization | Basic auth credentials (base64-encoded username:password) | Basic YWRtaW46YWRtaW4= | Valid credentials |
| Content-Type | Content type of the request body | application/json | application/json, application/protobuf |

Optional headers

| Header | Description | Example |
| --- | --- | --- |
| X-P-Tag-{field} | Add custom tags/metadata to events. Replace {field} with the tag name. | X-P-Tag-environment: production |
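
For instance, tag headers can be sent alongside the required headers. A hedged sketch, with an assumed local endpoint and illustrative tag values:

import requests

resp = requests.post(
    "http://localhost:8000/api/v1/ingest",    # assumed local Parseable endpoint
    json={"message": "deployment finished"},
    headers={
        "X-P-Stream": "app-logs",
        "X-P-Tag-environment": "production",  # attached as a tag to the events in this call
        "X-P-Tag-team": "platform",
    },
    auth=("admin", "admin"),
    timeout=10,
)
resp.raise_for_status()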

Log processing

When agents like Fluent Bit or Vector send logs to Parseable, they typically send the entire log line in a single field (usually named log or message). With log extraction enabled, Parseable can automatically parse this raw log line and extract structured fields using regex patterns.

The flow:

  1. Agent sends JSON log data with the raw log line in a field (e.g., {"log": "192.168.1.1 - - [10/Jan/2026:12:00:00] ..."})
  2. Agent sets X-P-Extract-Log header to the field name containing the raw log (e.g., log)
  3. Agent sets X-P-Log-Source header to the log format name (e.g., nginx_access)
  4. Parseable reads the value from the specified field, applies the matching regex pattern, and adds all extracted fields to the event

Headers for log extraction

| Header | Description | Example | Possible Values |
| --- | --- | --- | --- |
| X-P-Log-Source | Log format name. Parseable applies the matching regex patterns to extract fields. | nginx_access, syslog_log | See Supported log source formats |
| X-P-Extract-Log | Field name in the JSON payload that contains the raw log line | log, message | Any valid field name |
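
Putting the flow together, here is a sketch of what an agent (or a manual test) would send. The raw log line, dataset name, and endpoint are illustrative assumptions:

import requests

# Raw NGINX access log line wrapped in a JSON field named "log".
event = {"log": '192.168.1.1 - - [10/Jan/2026:12:00:00 +0000] "GET / HTTP/1.1" 200 612'}

resp = requests.post(
    "http://localhost:8000/api/v1/ingest",  # assumed local Parseable endpoint
    json=event,
    headers={
        "X-P-Stream": "nginx-logs",
        "X-P-Extract-Log": "log",           # field holding the raw log line
        "X-P-Log-Source": "nginx_access",   # format whose regex patterns to apply
    },
    auth=("admin", "admin"),
    timeout=10,
)
resp.raise_for_status()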

Supported log source formats

1. access_log
2. alb_log
3. block_log
4. candlepin_log
5. choose_repo_log
6. cloudvm_ram_log
7. cups_log
8. dpkg_log
9. elb_log
10. engine_log
11. env_logger_log
12. error_log
13. esx_syslog_log
14. haproxy_log
15. java
16. katello_log
17. klog
18. kubernetes_log
19. lnav_debug_log
20. nextflow_log
21. nginx_access
22. openam_log
23. openamdb_log
24. openstack_log
25. page_log
26. parseable_server_logs
27. postgres
28. postgresql_log
29. procstate_log
30. proxifier_log
31. rails_log
32. redis_log
33. s3_log
34. simple_rs_log
35. snaplogic_log
36. sssd_log
37. strace_log
38. sudo_log
39. syslog_log
40. tcf_log
41. tcsh_history
42. uwsgi_log
43. vmk_log
44. vmw_log
45. vmw_py_log
46. vmw_vc_svc_log
47. vpostgres_log
48. web_robot_log
49. xmlrpc_log
50. zookeeper
51. zookeeper_log

Flattening

Nested JSON objects are automatically flattened. For example, the following JSON object

{
  "foo": {
    "bar": "baz"
  }
}

will be flattened to

{
  "foo_bar": "baz"
}

before it is stored. When querying, refer to this field as foo_bar, for example: select foo_bar from <dataset-name>. The flattened field is also available in the schema.
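
For illustration only, the flattening rule can be approximated with a few lines of Python. This is a sketch of the key-joining behaviour, not Parseable's actual implementation:

def flatten(obj, parent_key="", sep="_"):
    """Recursively flatten nested dicts by joining keys with an underscore."""
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

print(flatten({"foo": {"bar": "baz"}}))  # {'foo_bar': 'baz'}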

Batching and Compression

Wherever applicable, we recommend enabling the log agent's compression and batching features to reduce network traffic and improve ingestion performance. The maximum payload size in Parseable is 10 MiB (10,485,760 bytes). The payload can contain a single log event as a JSON object or multiple log events in a JSON array. There is no limit on the number of batched events in a single call.
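
For example, multiple events can be batched into a single JSON array per request. A sketch with an assumed endpoint and illustrative events:

import requests

# A batch of events sent as one JSON array in a single HTTP call.
batch = [
    {"level": "info", "message": f"event {i}"}
    for i in range(1000)
]

resp = requests.post(
    "http://localhost:8000/api/v1/ingest",  # assumed local Parseable endpoint
    json=batch,
    headers={"X-P-Stream": "app-logs"},
    auth=("admin", "admin"),
    timeout=30,
)
resp.raise_for_status()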

Timestamp

Correct time is critical to understanding the proper sequence of events. Timestamps are important for debugging, analytics, and deriving transactions. We recommend including a timestamp in your log events in RFC3339 format.

Parseable records the time at which each event is received and adds it to the log event in the p_timestamp field. This ensures there is a time reference in the log event, even if the original event doesn't include a timestamp.
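
For example, an RFC3339 timestamp can be generated in Python as follows; the field name timestamp is an arbitrary choice for illustration:

from datetime import datetime, timezone

event = {
    # RFC3339 / ISO 8601 timestamp in UTC, e.g. "2026-01-10T12:00:00.123456+00:00"
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "message": "user signed in",
}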

Staging

Staging in Parseable refers to the process of storing log data on locally attached storage before it is pushed to a long-term, persistent store such as S3. Staging acts as a buffer for incoming events and provides a stable path for pushing events to the persistent store.

Once an HTTP call is received on the Parseable server, events are parsed and converted to Arrow format in memory. This Arrow data is then written to the staging directory (defaults to $PWD/staging). Every minute, the server converts the Arrow data to Parquet format and pushes it to the persistent store. We chose a minute as the default interval, so there is a clear boundary between events, and the prefix structure on S3 is predictable.

The query flow in Parseable allows transparent access to the data in the staging directory. This means that the data in the staging directory is queryable in real-time. As a user, you won't see any difference in the data fetched from the staging directory or the persistent store.

The staging directory can be configured using the P_STAGING_DIR environment variable, as explained in the environment vars section.

Planning for Production

When planning for the production deployment of Parseable, the two most important considerations from a staging perspective are:

Storage size: Ensure that the staging area has sufficient capacity to handle the anticipated log volume. This prevents data loss due to disk space exhaustion. To calculate the required storage size, multiply the average log event size by the expected event volume over 5-10 minutes. This buffer is needed because, under high load, the conversion to Parquet and the subsequent push to S3 may lag behind ingestion.
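
As a rough, back-of-the-envelope sketch with assumed numbers (1 KiB average event size, 10,000 events per second, a 10-minute buffer):

# Staging capacity estimate with assumed inputs; substitute your own figures.
avg_event_bytes = 1024          # assumed average event size: 1 KiB
events_per_second = 10_000      # assumed ingestion rate
buffer_seconds = 10 * 60        # size staging for 10 minutes of backlog

required_bytes = avg_event_bytes * events_per_second * buffer_seconds
print(f"~{required_bytes / 2**30:.1f} GiB of staging capacity")  # ~5.7 GiB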

Local storage redundancy: Because data in staging has not yet been committed to the persistent store, it is important that the staging volume itself is reliable and redundant. This protects staging data from loss due to simple disk failures. If using AWS, choose a service such as EBS (Elastic Block Store) or EFS (Elastic File System) and mount the volume on the Parseable server. Similarly, on Azure, choose Managed Disks or Azure Files. On a private cloud, a reliable mounted volume from a NAS or SAN can be used.
