Design Choices

This document outlines the key design choices behind Parseable and how they provide durability, scalability, and efficiency for modern observability workloads. It also covers the technical trade-offs in Parseable.

If you have a specific use case or need a feature tailored to your observability needs, let us know at sales@parseable.com. We ship fast, and most such requests can be completed in a matter of days.

Low latency writes

By the time the Parseable API returns success, ingested data has been staged on local disk. The data is then committed asynchronously to an object store such as S3. This keeps ingestion latency low and throughput high. For durability, we recommend attaching a small, reliable storage volume (EFS, Azure Files, NFS or equivalent) to the ingesting nodes, so that staged data is not lost if a node fails.
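The staged-write pattern described above can be sketched roughly as follows. This is a minimal illustration, not Parseable's implementation: the `StagedWriter` class, its method names, and the file naming are hypothetical, and a second local directory stands in for the object store.

```python
import os
import queue
import shutil
import threading
from pathlib import Path


class StagedWriter:
    """Hypothetical sketch: acknowledge after local staging, upload later."""

    def __init__(self, staging_dir: Path, object_store_dir: Path) -> None:
        self.staging_dir = staging_dir
        # Assumption: a plain directory stands in for an S3-like bucket.
        self.object_store_dir = object_store_dir
        self._pending = queue.Queue()
        self._seq = 0
        self._worker = threading.Thread(target=self._commit_loop, daemon=True)
        self._worker.start()

    def ingest(self, payload: bytes) -> Path:
        """Stage the payload on local disk and return immediately.

        Not safe for concurrent callers; a real server would guard _seq.
        """
        self._seq += 1
        staged = self.staging_dir / f"batch-{self._seq:06d}.bin"
        with open(staged, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # make the staged bytes durable before the ack
        self._pending.put(staged)  # the upload happens off the request path
        return staged

    def _commit_loop(self) -> None:
        """Background worker: copy staged files to the object store stand-in."""
        while True:
            staged = self._pending.get()
            if staged is None:  # sentinel pushed by close()
                break
            shutil.copy2(staged, self.object_store_dir / staged.name)
            staged.unlink()  # drop the staged copy once it is committed

    def close(self) -> None:
        """Flush everything that is still pending, then stop the worker."""
        self._pending.put(None)
        self._worker.join()
```

The caller only waits for the local write and `fsync`, which is why the staging volume, rather than the object store, determines both write latency and durability until the background commit completes.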

Atomic ingestion

Ingestion batches received via the API within the same one-minute window are appended concurrently to the same file. When that file is converted from Arrow to Parquet, entries are reordered so that the latest data appears first.
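As a rough illustration of the windowing and reordering described above, the sketch below groups batches by the minute in which they were received and sorts each group latest-first at conversion time. The names `MinuteStaging`, `minute_window`, and the `timestamp` field are hypothetical, in-memory lists stand in for the staged Arrow files, and keying windows by receive time is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timezone


def minute_window(ts: datetime) -> str:
    """Return the one-minute window key (UTC) that a timestamp falls into."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M")


class MinuteStaging:
    """Hypothetical stand-in for per-minute staging files."""

    def __init__(self) -> None:
        self.files = defaultdict(list)  # window key -> staged rows

    def append(self, batch: list, received_at: datetime) -> str:
        """Append a whole batch to the file for its receive-time window."""
        key = minute_window(received_at)
        self.files[key].extend(batch)
        return key

    def to_parquet_order(self, key: str) -> list:
        """Return the rows of one window reordered so the latest come first."""
        return sorted(self.files[key], key=lambda r: r["timestamp"], reverse=True)
```

Two batches that arrive at 10:00:05 and 10:00:40 share the key for 10:00, while a batch arriving at 10:01:02 starts a new window.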

Efficient storage

Parseable stores data as heavily compressed Parquet files on object storage, one of the most cost-efficient storage options available. This leads to significant cost savings, especially for large datasets.

Smart caching

Frequently accessed logs are cached in memory and on NVMe SSDs on query nodes for faster access. The system prioritizes recent data, manages cache eviction automatically, and minimizes object store API calls using Parseable manifest files and Parquet footers.
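The text above does not state which eviction policy is used. As one possible illustration, the sketch below uses a least-recently-used (LRU) policy with a byte budget, so that recently accessed files stay local and only cache misses reach the object store. `LruFileCache` and its `fetch` callback are hypothetical names.

```python
from collections import OrderedDict


class LruFileCache:
    """Hypothetical byte-bounded LRU cache in front of an object store."""

    def __init__(self, capacity_bytes: int, fetch) -> None:
        self.capacity_bytes = capacity_bytes
        self.fetch = fetch  # callable: key -> bytes (the object store call)
        self._entries = OrderedDict()  # oldest entry first, newest last
        self._size = 0
        self.misses = 0  # number of times the object store was actually hit

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        data = self.fetch(key)
        self._entries[key] = data
        self._size += len(data)
        # Evict least recently used entries until the byte budget is met,
        # but always keep the entry that was just fetched.
        while self._size > self.capacity_bytes and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self._size -= len(evicted)
        return data
```

With an 8-byte budget and 4-byte objects, reading `a`, `b`, `a`, `c` evicts `b` rather than `a`, because `a` was touched more recently.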

Index on demand

By default, data is stored in columnar Parquet files, which allows fast aggregations, filtering on numerical columns, and SQL queries. Parseable can also index specific chunks of data on demand, enabling text search on log data as and when it is needed.
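The index structure itself is not described here. To make the idea of indexing one chunk on demand concrete, the sketch below builds a simple inverted index (token to row ids) over a single chunk of log lines and uses it for text search. The inverted index and the names `build_index` and `search` are assumptions for illustration only.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(chunk: list) -> dict:
    """Build a token -> set of row ids index for one chunk of log lines."""
    index = defaultdict(set)
    for row_id, line in enumerate(chunk):
        for token in tokenize(line):
            index[token].add(row_id)
    return index


def search(index: dict, chunk: list, query: str) -> list:
    """Return the lines that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    row_ids = set.intersection(*(index.get(t, set()) for t in tokens))
    return [chunk[i] for i in sorted(row_ids)]
```

Only the chunk that needs text search pays the cost of building the index; all other data stays as plain columnar files.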

Stateless high availability

High availability (HA) is ensured through a distributed mode in which multiple ingestion and query servers operate independently.

Object storage first

There is no separate consensus layer, eliminating complex coordination and reducing operational overhead. Object storage manages all concurrency control.

SQL for querying

We chose SQL as the query language for Parseable because it is widely used and understood, which makes the system easier to work with. SQL lets users filter, aggregate, and join data from multiple sources. Modern LLMs also handle SQL well, so queries can be generated from plain-text descriptions.
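As a small illustration of the filter-and-aggregate style of query mentioned above, the sketch below runs a typical log query against an in-memory SQLite table. SQLite is only a stand-in so that the example is runnable; the table name `backend_logs`, its columns, and the sample rows are hypothetical, and Parseable's SQL dialect may differ.

```python
import sqlite3

# Hypothetical query: count server errors per status code.
ERRORS_BY_STATUS = """
    SELECT status, COUNT(*) AS hits
    FROM backend_logs
    WHERE status >= 500
    GROUP BY status
    ORDER BY hits DESC, status ASC
"""


def errors_by_status(rows: list) -> list:
    """Load (path, status) rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE backend_logs (path TEXT, status INTEGER)")
        conn.executemany("INSERT INTO backend_logs VALUES (?, ?)", rows)
        return conn.execute(ERRORS_BY_STATUS).fetchall()
    finally:
        conn.close()
```

The function returns one `(status, hits)` tuple per error status code, ordered from the most frequent to the least frequent.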
