Skip to main content

Query

In addition to a simple, easy to use filtering interface, Parseable also offers (PostgreSQL compatible) SQL query interface to query log data. Users can choose to use the filter interface directly, without having to deal with SQL at all. But for more complex queries and advanced users, Parseable provides a SQL query interface.

You can specify the query and the relevant time range for which you want this query to be run. The response is inclusive of both the start and end timestamps.

The filter interface is quite self explanatory, with options to filter by specific columns and values, and also to filter by time range. In this document, we'll cover more about the SQL API and its capabilities.

Check out the Query API in Postman.

How does it work?

After parsing and creating the execution plan for a query, Parseable query server uses the data manifest file to filter out the relevant Parquet files. The data manifest file is a JSON file that contains the specific column metadata for a whole day. The querier uses this file to filter the relevant Parquet files based on the query filters and the time range.

Only the relevant Parquet file paths are then added as data source to custom table provider. Datafusion then efficiently reads the files via GetRange S3 API, pulling only the very specific data needed for the query. This ensures that only the relevant data is read from the storage, reducing the query time and cost.

Supported functions

Parseable supports a wide range of SQL functions - Aggregate, Window and Scalar functions. Refer the Apache Datafusion documentation for the complete list of supported functions and their usage.

Query with regular expressions

This section provides examples of how to use regular expressions in Parseable queries.

  • Match regular expression (Case Sensitive)
SELECT * FROM frontend where message ~ 'failing' LIMIT 9000;
  • Match regular expression (Case Insensitive)
SELECT * FROM frontend where message ~* 'application' LIMIT 9000;
  • Does not match regular expression (Case Sensitive)
SELECT * FROM frontend where message !~ 'started' LIMIT 9000;
  • Does not match regular expression (Case Insensitive)
SELECT * FROM frontend where message !~* 'application' LIMIT 9000;
  • Matches the beginning of the string (Case Insensitive)
SELECT * FROM frontend where message ~* '^a' LIMIT 9000;
  • Matches the end of the string
SELECT * FROM frontend where message ~ 'failing$' LIMIT 9000;
  • Matches numeric type data
SELECT * FROM frontend where uuid ~ '[0-9]' LIMIT 9000;
  • Matches numeric type data (two digits)
SELECT * FROM frontend where uuid ~ '[0-9][0-9]' LIMIT 9000;
  • Postgres regexp_replace: REGEXP_REPLACE() function is used to replace every instance of the numeric type data with the symbol *. In below sample, we provided a flag g that searches for every instance of the specified pattern.
SELECT REGEXP_REPLACE(uuid,'[0-9]','*','g') FROM frontend LIMIT 9000;
  • Postgres regexp_match: When a Regex is run against a string, the REGEXP_MATCHES() function compares the two and returns the string that matches the pattern as a set.
SELECT REGEXP_MATCH(email,'@(.*)$')  FROM frontend where email is not null LIMIT 10;
  • Postgres regex numbers only: Use the REGEXP_REPLACE() function to extract only the numbers from a string in PostgreSQL
SELECT REGEXP_REPLACE(email,'\\D','','g') FROM frontend where email is not null LIMIT 10;
  • Postgres regex split: SPLIT_PART() function can split a string into many parts. To divide a string into several pieces, we must pass the String, the Delimiter, and the Field Number.
SELECT SPLIT_PART(email,'@',1) FROM frontend where email is not null LIMIT 10 -- return before @ from email;
SELECT SPLIT_PART(email,'@',2) FROM frontend where email is not null LIMIT 10 -- return after @ from email;
  • Postgres Regex Remove Special Characters: Using the REGEXP_REPLACE() function, all Special Characters from a supplied text can be eliminated
SELECT REGEXP_REPLACE(email, '[^\\w]+','','g') FROM frontend where email is not null LIMIT 10;
  • Functions and Operators in pattern matching: Like and other POSIX regular expressions are supported.
SELECT * FROM frontend where email LIKE '%test%' LIMIT 10;
SELECT * FROM frontend where email ~ '^test' LIMIT 10;

Case sensitivity

Log stream column names are case sensitive. For example, if you send a log event like

{
"foo": "bar",
"Foo": "bar"
}

Parseable will create two columns foo and Foo in the schema. So, while querying please refer to the fields as foo and Foo respectively. While querying, unquoted identifiers are converted to lowercase. To query column names with uppercase letters, they must be passed in double quotes. For example when sending query via the REST API, the following JSON payload will apply WHERE condition to the column Foo:

{
"query":"select * from stream where \"Foo\"=bar",
"startTime":"2023-02-14T00:00:00+00:00",
"endTime":"2023-02-15T23:59:00+00:00"
}

If you're querying Parseable via Grafana UI (via data source plugin), you can use the following query to query the column Foo:

SELECT * FROM stream WHERE "Foo" = 'bar'

Query analysis

In some cases, you may want to understand the query performance. To view the detailed query execution plan, use the EXPLAIN ANALYZE keyword in the query. For example, the following query will return the query execution plan and time taken per step.

{
"query": "EXPLAIN ANALYZE SELECT * FROM frontend LIMIT 100",
"startTime": "2023-03-07T05:28:10.428Z",
"endTime": "2023-03-08T05:28:10.428Z"
}

Get Updates from Parseable

Subscribe to keep up with latest news, updates and new features on Parseable