Design Choices
This document outlines the key design choices behind Parseable and how they deliver durability, scalability, and efficiency for modern observability workloads. It also covers the technical trade-offs that come with them.
If you have a specific use case or need a feature tailored to your observability needs, let us know at sales@parseable.com. We ship fast, and most such requests can be delivered in a matter of days.
Highlights
Low latency writes
Ingested data is staged on local disk before the Parseable API returns success. It is then asynchronously committed to an object store such as S3. This keeps ingestion latency low and throughput high. To ensure data durability, we recommend attaching a small, reliable storage volume (EFS, Azure Files, NFS or equivalent) to the ingesting nodes, so that staged data is not lost if a node fails.
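The stage-then-commit flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Parseable's implementation: the function names `stage_batch` and `commit_staged` are hypothetical, JSON lines stand in for the real staging format, and a plain dictionary stands in for the object store.

```python
import json
import os
import tempfile
from pathlib import Path


def stage_batch(staging_dir: Path, stream: str, events: list[dict]) -> Path:
    """Append a batch to the stream's staging file and sync it to disk.

    Only after this returns would the API acknowledge the request.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    path = staging_dir / f"{stream}.jsonl"  # hypothetical staging file name
    with open(path, "a", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reached the disk
    return path


def commit_staged(staging_dir: Path, object_store: dict[str, bytes]) -> list[str]:
    """Upload every staged file, then remove the local copy.

    In a real deployment this would run in the background, decoupled
    from the request path. The dict is a stand-in for S3 or similar.
    """
    committed = []
    for path in sorted(staging_dir.glob("*.jsonl")):
        key = f"committed/{path.name}"  # hypothetical object key
        object_store[key] = path.read_bytes()
        path.unlink()  # delete only after the upload succeeded
        committed.append(key)
    return committed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store: dict[str, bytes] = {}
        stage_batch(Path(tmp), "app", [{"level": "info", "msg": "started"}])
        print(commit_staged(Path(tmp), store))
```

The point of the split is that the caller waits only for the local write. If the staging directory sits on a durable shared volume, a node failure between the two steps does not lose the staged file.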
Atomic ingestion
Each ingestion batch received via the API is concurrently appended to the same file within a one-minute window. When the file is converted from Arrow to Parquet, entries are reordered so that the latest data appears first.
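As a rough sketch of the two ideas in this paragraph, the snippet below groups events into one-minute windows and then orders the rows of a window newest first. The `timestamp` field name and the window label format are assumptions made for the example, and plain Python lists stand in for Arrow and Parquet data.

```python
from collections import defaultdict
from datetime import datetime, timezone


def minute_window(ts: datetime) -> str:
    """Label of the one-minute window a timestamp falls into (UTC)."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M")


def group_by_window(events: list[dict]) -> dict[str, list[dict]]:
    """Collect events that share a one-minute window into the same bucket."""
    windows = defaultdict(list)
    for event in events:
        windows[minute_window(event["timestamp"])].append(event)
    return dict(windows)


def order_for_parquet(rows: list[dict]) -> list[dict]:
    """Reorder rows so that the latest entry comes first."""
    return sorted(rows, key=lambda row: row["timestamp"], reverse=True)
```

With events at 10:00:05, 10:00:40 and 10:01:02, the first two land in the same window, and `order_for_parquet` places the 10:00:40 entry ahead of the 10:00:05 one.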
Efficient storage
Parseable stores data in heavily compressed Parquet format on object storage, one of the most cost-efficient storage options available. This leads to significant cost savings, especially for large datasets.
Smart caching
Frequently accessed logs are cached in memory and on NVMe SSDs on query nodes for faster access. The system prioritizes recent data, manages cache eviction automatically, and minimizes object store API calls using Parseable manifest files and Parquet footers.
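To make the effect of caching on object store calls concrete, here is a minimal sketch of a read-through cache. It assumes a least-recently-used eviction policy, which is a choice made for this example only; the text above does not state which policy Parseable uses. The class name `LruFileCache` and the `fetch` callback are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class LruFileCache:
    """Read-through cache that evicts the least recently used entry."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.fetch = fetch  # stand-in for an object store GET
        self.entries: OrderedDict[str, bytes] = OrderedDict()
        self.misses = 0  # number of calls that went to the object store

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        data = self.fetch(key)
        self.entries[key] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the oldest entry
        return data
```

Repeated reads of the same file are served locally, so `misses` grows only when a file is requested for the first time or after it has been evicted.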
Index on demand
By default, data is stored in columnar Parquet files, which allow fast aggregations, filtering on numerical columns, and SQL queries. Parseable can also index specific chunks of data on demand, enabling text search on log data as and when needed.
Stateless high availability
High availability (HA) is ensured through a distributed mode in which multiple ingestion and query servers operate independently.
Object storage first
There is no separate consensus layer, eliminating complex coordination and reducing operational overhead. Object storage manages all concurrency control.
SQL for querying
We chose SQL as the query language for Parseable because it is widely used and understood, which makes it easier for users to interact with the system. SQL lets users filter, aggregate, and join data from multiple sources. Modern LLMs also generate SQL well from plain-text prompts.
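For readers who want to see what filtering, aggregating, and joining look like in a single query, the sketch below runs one against an in-memory SQLite database. SQLite is only a stand-in used to keep the example runnable; it is not Parseable's query engine, and the table and column names (`app_logs`, `services`, `status`, `team`) are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_logs (ts TEXT, service TEXT, status INTEGER);
CREATE TABLE services (name TEXT, team TEXT);
INSERT INTO app_logs VALUES
  ('2024-01-01T10:00:05Z', 'checkout', 500),
  ('2024-01-01T10:00:40Z', 'checkout', 200),
  ('2024-01-01T10:01:02Z', 'search', 500),
  ('2024-01-01T10:01:30Z', 'checkout', 503);
INSERT INTO services VALUES ('checkout', 'payments'), ('search', 'discovery');
""")

# Filter (status >= 500), join (logs to owning team), aggregate (count per team).
ERRORS_BY_TEAM = """
SELECT s.team, COUNT(*) AS errors
FROM app_logs AS l
JOIN services AS s ON s.name = l.service
WHERE l.status >= 500
GROUP BY s.team
ORDER BY errors DESC, s.team
"""

rows = conn.execute(ERRORS_BY_TEAM).fetchall()
print(rows)
```

The same query shape carries over to any SQL engine, which is part of the appeal of choosing SQL over a custom query language.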
Trade-offs
Staged writes
Staging data locally on the ingestor node for at least a minute leads to a minor lag in querying the data. We trade immediate persistence for low-latency ingestion.
Occasional cold queries
The query layer fetches indexes from object storage (e.g., S3) and uses intelligent caching to accelerate future access. During the initial cache warm-up, some queries may access data directly from cold storage, resulting in higher latencies.
Timed queries
Every query call requires a start and an end timestamp. This ensures that data is queried across a fixed, definite set of files. Parseable ensures that the query response includes both staged data and data committed to object storage, as required.
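One way to see why a bounded time range maps to a definite set of files is to enumerate the one-minute windows the range overlaps. The sketch below does only that. The helper name `minute_windows` and the window label format are assumptions for illustration, and the actual file layout in object storage may differ.

```python
from datetime import datetime, timedelta, timezone


def minute_windows(start: datetime, end: datetime) -> list[str]:
    """List every one-minute window that overlaps [start, end), in UTC."""
    if end <= start:
        raise ValueError("end must be after start")
    # Round the start down to the beginning of its minute.
    cursor = start.astimezone(timezone.utc).replace(second=0, microsecond=0)
    end = end.astimezone(timezone.utc)
    windows = []
    while cursor < end:
        windows.append(cursor.strftime("%Y-%m-%dT%H:%M"))
        cursor += timedelta(minutes=1)
    return windows
```

A query from 10:00:30 to 10:03:00 touches exactly the 10:00, 10:01 and 10:02 windows. Without an end timestamp the list would be unbounded, which is the situation the required time range avoids.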