A year of building the future of observability

Nitish Tiwari, Debabrata Panigrahi
December 24, 2025
From our viral RAG vs MCP blog post to zero-shot foundation models on the Hacker News frontpage, here's how Parseable is pioneering agentic observability and driving the no-dashboard movement.

Introduction

2025 has been a transformative year in the world of AI and observability. The convergence of large language models, agentic workflows, and unified data lakes has opened up new possibilities for how we monitor, understand, and operate complex systems.

As a company, we believe that the future of observability isn't about more dashboards, more alerts. It is about efficient, scalable foundation systems that not only handle observability data but also empower AI agents to leverage telemetry seamlessly.

This year, we took significant steps toward that vision, and the response from the community has been nothing short of incredible. This post reflects our AI journey, the research that shaped our thinking, the features we shipped, and the philosophy that's driving us forward.

The year started with a question: RAG vs MCP?

In January this year, we published what would become one of our most-read blog posts: "Is MCP a better alternative to RAG for Observability?"

Read the full blog post

In January, MCP had just been released, and the AI community was buzzing with energy around the protocol and its promise. Everyone was asking the same question: Should I build with RAG or adopt MCP?

RAG (Retrieval Augmented Generation) had been the mainstay for developers integrating data sources with LLMs. But as adoption grew, issues emerged: context window limitations, cost of LLM calls, and inconsistent data access patterns. There wasn't a portable, standardized framework for connecting data systems to models.

MCP changed that equation. In our blog post, we critically compared the two approaches and found that MCP was indeed a better approach for observability workloads.

In the post we demoed our MCP server exposing Parseable capabilities through Tools (get-schema, post-dashboard) and Prompts (generate-dashboard-object), connected to Claude Desktop as the client. Within minutes, users could spin up elaborate dashboards that would have taken hours to build manually.

The post resonated. It became one of our highest-read articles. It is now obvious that MCP ushered in a new era for AI-data integration, especially for observability use cases where data freshness, context size, and cost efficiency are paramount.

Parseable MCP server

Even though MCP was brand new, we decided to build our own MCP server early in the year to experiment with its capabilities and limitations.

Check out our MCP server on GitHub

We saw MCP not just as a protocol, but as the beginning of a new paradigm, one where AI agents could directly interact with observability data without human intermediation. By building early, we learned what works (structured tool definitions, clear error handling) and what doesn't (overly verbose responses that bloat context windows).
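
To give a flavor of what "structured tool definitions" look like in practice, here is a minimal sketch of a Parseable-style MCP tool built with the open-source MCP Python SDK. The get-schema tool name comes from the post above, but the endpoint path, credentials handling, and error format are illustrative assumptions rather than our server's exact implementation.

```python
# Minimal sketch of an MCP server exposing a Parseable-style tool.
# Assumes the official MCP Python SDK (`pip install "mcp[cli]"`) and httpx;
# the endpoint path and env var names are illustrative, not the real server.
import os
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("parseable")

PARSEABLE_URL = os.environ.get("PARSEABLE_URL", "http://localhost:8000")

@mcp.tool()
def get_schema(dataset: str) -> str:
    """Return the schema of a Parseable dataset as JSON text."""
    try:
        resp = httpx.get(
            f"{PARSEABLE_URL}/api/v1/logstream/{dataset}/schema",  # illustrative path
            auth=(os.environ["P_USERNAME"], os.environ["P_PASSWORD"]),
            timeout=10.0,
        )
        resp.raise_for_status()
        return resp.text
    except httpx.HTTPError as exc:
        # Short, structured error messages keep the client's context window lean.
        return f"error: could not fetch schema for '{dataset}': {exc}"

if __name__ == "__main__":
    mcp.run()  # serves over stdio so a client like Claude Desktop can connect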

This early investment paid dividends. When enterprises started asking about AI integration, we had battle-tested answers.

Zero-shot prediction with foundation timeseries models

Later in the year, we published our research on "zero shot forecasting with foundation models for timeseries observability data". The premise was provocative: Can modern foundation models understand telemetry patterns without any fine-tuning?

Read our research

The motivation was deeply practical. Parseable handles observability data at massive scale, a nonstop stream of raw ingest counts, infrastructure vitals, and fine-grained application signals. Running a separate, hand-tuned forecasting model for every dataset quickly becomes a treadmill: each new dataset demands fresh hyperparameters, retrains, and ever-growing config sprawl.

What if you could just hand any telemetry stream to a pre-trained foundation model and immediately get a high-quality forecast, regardless of whether the model had seen data from that source before?

To answer this, we benchmarked four cutting-edge time-series foundation models:

  • Amazon Chronos: Universal forecaster with transformer architecture, trained on massive open datasets
  • Google TimesFM: The "GPT for time-series", a decoder-only, attention-based architecture pretrained on a large corpus of real-world series
  • IBM Tiny Time-Mixers: Ultra-lightweight models for edge and resource-constrained environments
  • Datadog Toto: Production-grade multivariate forecasting for correlated infrastructure metrics

We tested them on real observability tasks: predicting ingestion volumes and forecasting multiple pod-level metrics. The results showed that these foundation models demonstrated genuine promise for zero-shot forecasting, though with important trade-offs in computational cost and accuracy.
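
To make the zero-shot setup concrete, the sketch below shows roughly what such a run looks like with Chronos: a pretrained checkpoint forecasts an ingestion series it has never seen, with no fine-tuning. The checkpoint, horizon, and sample data are illustrative; the full methodology is in the research post.

```python
# Zero-shot forecast of an ingestion-volume series with Amazon Chronos.
# Sketch only: assumes `pip install chronos-forecasting torch`; the checkpoint
# and horizon are illustrative, not our benchmark settings.
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",      # pretrained checkpoint, no fine-tuning
    device_map="cpu",
    torch_dtype=torch.float32,
)

# Historical per-minute ingestion counts (stand-in data for the example).
history = torch.tensor(
    [1200, 1180, 1250, 1400, 1390, 1500, 1620, 1580, 1700, 1850],
    dtype=torch.float32,
)

# Sample-based probabilistic forecast for the next 12 intervals.
samples = pipeline.predict(context=history, prediction_length=12)  # [1, num_samples, 12]
low, median, high = torch.quantile(
    samples[0], torch.tensor([0.1, 0.5, 0.9]), dim=0
)
print("median forecast:", median.tolist())
print("80% interval:", list(zip(low.tolist(), high.tolist())))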

The post hit the Hacker News frontpage. The community debated evaluation metrics, suggested benchmarks like the M4 (Makridakis) competition and GIFT-Eval, and shared their own experiments. The feedback was invaluable, pushing us toward more rigorous evaluation in our follow-up research.

What made this work special wasn't just the technical findings, but rather the philosophy behind it: we believe observability shouldn't require ML expertise. If foundation models can understand telemetry out of the box, we can democratize insights that were previously locked behind specialized tooling and make powerful analytics accessible to every engineer.

We've since published a follow-up comparing Chronos vs Toto in more depth, continuing our research into practical time-series foundation models for observability in production environments.

From research to product: what we shipped in 2025

Throughout 2025, we shipped a suite of AI features that embody our vision for agentic observability in production environments.

Keystone: observability without language barriers

As LLMs matured, we realized that the real power of AI in observability lies in natural language interaction. Engineers shouldn't have to learn complex query languages or navigate arcane UIs to get answers from their data.

However, a simple chat-based interface doesn't cut it. As the chat progresses, users lose track of what happened in the previous steps, and when debugging incidents there is only so much context someone can hold in their head.

A better interface is a canvas where users ask questions, see results, then follow up as they drill down in separate nodes. This preserves the mental model of the investigation, while allowing the AI to maintain context across the conversation.

Keystone uses three specialized agents working in concert:

  1. Intent Agent: Understands what you're asking and extracts the relevant dataset and time range
  2. Query Agent: Generates optimized SQL within strict guardrails
  3. Visualization Agent: Presents results in the most useful format

We made extensive efforts to guard-rail execution: queries are validated before they run to prevent runaway costs, and answers are fact-checked against the data to avoid hallucinations. A minimal sketch of this flow is below.
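
Here is an illustrative sketch of how such a three-agent, guard-railed flow can be wired together. The agent functions are placeholders standing in for LLM calls; this is not Keystone's actual implementation.

```python
# Illustrative three-agent, guard-railed flow in the spirit of Keystone.
# The agent functions are stand-ins for LLM calls; names and limits are assumptions.
from dataclasses import dataclass

@dataclass
class Intent:
    dataset: str
    start: str
    end: str
    question: str

MAX_ROWS = 10_000

def intent_agent(question: str) -> Intent:
    # In the real system an LLM extracts the dataset and time range from the question.
    return Intent(dataset="app_logs", start="now-1h", end="now", question=question)

def query_agent(intent: Intent) -> str:
    # In the real system an LLM generates SQL constrained by the dataset schema.
    return (
        f"SELECT status, COUNT(*) AS hits FROM {intent.dataset} "
        f"GROUP BY status ORDER BY hits DESC LIMIT {MAX_ROWS}"
    )

def validate(sql: str) -> str:
    """Guardrails: read-only queries with a bounded result size."""
    lowered = sql.lower()
    if not lowered.startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    if any(word in lowered for word in ("drop", "delete", "update", "insert")):
        raise ValueError("mutating statements are rejected")
    if "limit" not in lowered:
        sql += f" LIMIT {MAX_ROWS}"  # cap result size to prevent runaway cost
    return sql

def visualization_agent(rows: list[dict]) -> str:
    # In the real system an LLM picks the chart type; here we just tabulate.
    return "\n".join(f"{r['status']}: {r['hits']}" for r in rows)

def answer(question: str, execute) -> str:
    sql = validate(query_agent(intent_agent(question)))
    return visualization_agent(execute(sql))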

When we demoed Keystone at KubeCon NA and SRECon EU, the reaction was overwhelmingly positive. Engineers who had spent years dealing with multiple query languages across multiple tools were suddenly getting answers in seconds. One attendee told us: "This is what I always imagined observability should be."

Explore Keystone in our docs

Timeseries models for forecasting

Based on our extensive work on timeseries models for zero shot forecasting, we released a built-in forecasting feature earlier in the year.

At the click of a button, users can now generate forecasts for any ingestion or metric dataset - the built-in model looks at the historical data across the relevant time windows and predicts future values with near real-time latency.

What makes this even more powerful is the ability to apply filters. You can forecast for specific teams, regions, or services. This enables targeted capacity planning that actually matches how organizations operate.

We designed the forecasting feature to be highly scalable. It can handle thousands of datasets in parallel, updating forecasts as new data arrives. This means users always have fresh predictions without manual intervention.

We also integrated anomaly detection and future trend alerts. With a single click, you can enable notifications when anomalous behavior is predicted in the forecast, or raise an alert if the forecasted value exceeds a threshold you set.

Explore Forecasting

Our forecasting engine analyzes historical ingestion patterns to predict:

  • Future data volumes: Plan capacity before you need it
  • Seasonal patterns: Understand cyclical behavior in your systems
  • Anomaly predictions: Get alerted when actual data deviates from forecasts
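
For readers who want to see the alerting idea in code, here is a minimal sketch of threshold and deviation checks over a forecast band. It mirrors the concepts above but is not the built-in engine; the data structures and percentiles are assumptions for illustration.

```python
# Sketch of alerting on a forecast: flag future threshold breaches and
# anomalous deviations of actuals from the forecast band. Illustrative only.
from dataclasses import dataclass

@dataclass
class ForecastPoint:
    median: float
    low: float    # e.g. 10th percentile
    high: float   # e.g. 90th percentile

def threshold_alerts(forecast: list[ForecastPoint], threshold: float) -> list[int]:
    """Indices of future intervals where the median forecast exceeds the threshold."""
    return [i for i, p in enumerate(forecast) if p.median > threshold]

def anomaly_alerts(actuals: list[float], forecast: list[ForecastPoint]) -> list[int]:
    """Indices where the observed value falls outside the forecast band."""
    return [
        i for i, (a, p) in enumerate(zip(actuals, forecast))
        if a < p.low or a > p.high
    ]

forecast = [ForecastPoint(1500, 1300, 1700), ForecastPoint(2100, 1800, 2400)]
print(threshold_alerts(forecast, threshold=2000))  # -> [1]
print(anomaly_alerts([1450, 2600], forecast))      # -> [1]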

Dataset summarization

As users ingest terabytes of data daily, it becomes increasingly difficult to understand what is happening in all that data. Manually exploring datasets is time-consuming, error-prone, and nearly impossible at scale.

Summarization offers a deeper view into datasets by automatically extracting key insights, patterns, and anomalies. It acts as a briefing document for your data, highlighting what matters most, again at just the click of a button.

Learn about Summarization

Our summarization feature generates concise overviews of any dataset:

  • Key patterns identified automatically
  • Anomalies highlighted with context
  • Actionable recommendations with suggested SQL queries
  • Root cause hints based on correlation analysis

Text to SQL: democratizing data access

Whether you're generating reports, building dashboards, debugging incidents, or exploring data - writing SQL queries sometimes proves to be a bottleneck.

LLMs, on the other hand, have become remarkably good at understanding a given schema and generating syntactically correct SQL queries from natural language descriptions.

Learn about Text-to-SQL

Our Text-to-SQL feature supports multiple LLM providers (OpenAI, Anthropic, and more) and includes:

  • Query generation from natural language descriptions
  • Auto-fix for failed queries: When a query errors, AI suggests corrections
  • Query library: Save and share queries across your team
  • Chat history: Build on previous conversations

The auto-fix capability deserves special mention. We've all been there: you write a query, it fails with a cryptic error, and you spend 20 minutes debugging syntax. Now, one click generates a working alternative.
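
As an illustration of how schema-aware generation and auto-fix can work together, here is a minimal sketch using the OpenAI Python SDK as one example provider. The model name, prompt shape, and single-retry strategy are assumptions for the example, not Parseable's implementation.

```python
# Sketch of schema-aware text-to-SQL with a one-shot auto-fix retry.
# Uses the OpenAI Python SDK purely as an example provider; model, prompt,
# and retry strategy are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SCHEMA = "app_logs(timestamp TIMESTAMP, level TEXT, service TEXT, message TEXT)"

def generate_sql(question: str, previous_error: str | None = None) -> str:
    prompt = (
        f"Schema: {SCHEMA}\n"
        f"Write a single SQL SELECT statement answering: {question}\n"
        "Return only the SQL, no explanation."
    )
    if previous_error:
        prompt += f"\nThe previous attempt failed with: {previous_error}. Fix it."
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

def query_with_autofix(question: str, execute) -> list:
    sql = generate_sql(question)
    try:
        return execute(sql)
    except Exception as err:  # on failure, ask the model to correct itself
        return execute(generate_sql(question, previous_error=str(err)))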

Building towards a proactive, no-dashboard future

We envision a world where LLM-enabled interaction with unified telemetry lakes eliminates the need for static dashboards altogether: a proactive future where observability systems don't just wait for humans to ask questions; they anticipate needs and surface insights automatically.

Dashboard fatigue is real

We've written extensively about dashboard sprawl, the phenomenon where teams create endless dashboards in pursuit of insights, only to end up with poor visibility and fatigue. 86% of the users we spoke with reported feeling overwhelmed by the sheer number of alerts and dashboards they have to manage.

Dashboards assume:

  • Humans know what to look for in advance
  • Data can be meaningfully summarized in static views
  • The cost of creating & maintaining new views is acceptable

None of these assumptions hold in modern agentic and cloud-native workloads. Systems are too dynamic. Data volumes are too large. Engineers are too busy.

The proactive observability vision

At conferences throughout 2025, we've been championing what we call the "no-dashboard" movement. The idea is provocative but practical:

What if you never had to build another dashboard?

Instead of hunting through panels, you describe what you're looking for and let LLMs run queries, analyze data, and surface what matters.

More than eliminating visualization, this is about inverting the relationship. Dashboards become outputs of AI analysis, not inputs to human investigation.

The response at KubeCon and other conferences has been remarkable. Engineers are tired of dashboard fatigue. They want a better way to understand their systems and make decisions quickly—without the overhead of maintaining countless dashboards.

Unified observability for faster resolution, fewer escalations, happier customers

Logs, metrics, traces, events — they all tell parts of the same story. Fragmenting them across tools creates artificial boundaries that slow down incident response.

Parseable is a purpose-built telemetry data lake:

  • Cross-signal correlation without data movement
  • Single query interface across all data types
  • Cost-effective retention with 90% compression
  • AI reasoning across the complete picture

When Keystone answers a question, it can draw on logs, metrics, and traces simultaneously. This is the power of unified data.

Enabling the SRE with AI: what's next?

Looking forward, we see observability evolving from a tool category into an intelligent partner.

The Lake That Answers Back

In our "Ask the Lake" series, we explored what happens when observability systems can proactively surface insights:

  • "The error you're seeing began 12 minutes earlier in an upstream service."
  • "This latency spike correlates with a retry storm in the API gateway."
  • "If this trend continues, user impact will begin in 6-8 minutes."

These aren't hypothetical. They're the kinds of insights that become possible when telemetry is unified, fresh, and continuously interpreted by AI.

Observing the agentic workflows

As AI agents become more autonomous, understanding their economics becomes critical. We've been researching the hidden costs of agentic workflows: the "triple tax" of orchestration, execution, and processing.

Read our analysis on agentic tool costs

Watch this space as we build features to help teams monitor and optimize their AI agent usage, ensuring that the promise of automation doesn't come with unexpected bills.

Instrumenting coding agents

We're working with partners on how to bring observability to coding agents. Our work on coding agent instrumentation provides a framework for understanding:

  • Planning metrics: How agents reason about tasks
  • Memory saturation: When context windows overflow
  • Tool call patterns: Which tools agents use and how
  • Failure modes: Detecting "zombie" agents that burn resources without progress

Read about monitoring coding agents

A personal note

Building Parseable has been an incredible journey. This year, more than any other, has reinforced our belief that the future of observability lies at the intersection of AI and unified data.

We're grateful to our community of users, partners, and contributors who have joined us on this path. Your feedback, ideas, and enthusiasm fuel our mission every day.

As I look ahead to 2026, I'm super excited about the possibilities. LLMs will only get better from here, and the opportunity to rethink how we observe and operate systems is immense. Most of all, I'm excited to continue building with all of you—pioneering new ways to make observability smarter, more intuitive, and ultimately more human.

Happy holidays and here's to a transformative 2026!
