When Your Kafka Consumer Goes Silent
It's 2 AM. Your Kafka consumer has stalled. Consumer lag is climbing into the millions. Customers are reporting delays. Your logs show nothing useful, and changing the log level requires a deployment approval that won't come until morning.
If you're operating Apache Kafka at scale, you know this scenario all too well. A consumer group stops progressing, and you're left playing detective with insufficient clues. The pressure builds as you manually dig through offsets, parse logs, and inspect messages, hoping to find the needle in the haystack before the business impact escalates.
This operational burden drains valuable engineering time that could be spent building features that matter.
Why Kafka Troubleshooting Takes So Long
The real challenge with Kafka operations isn't detecting that something broke—monitoring tools handle that well. The problem is answering the critical questions:
- Why did it break?
- When did it start?
- What changed?
- Who is impacted?
Consider the "poison pill" scenario, one of the most frustrating Kafka issues: a single malformed message brings an entire consumer group to its knees. These failures happen more often than you'd think:
- Schema evolution issues: Data format changes that break consumers
- Encoding problems: Character set mismatches or size violations
- Business logic failures: Messages that pass schema validation but fail processing
Even with a schema registry and strict enforcement, you're not protected against semantic errors or application-level processing failures that occur after successful deserialization.
Traditional debugging requires manually hunting through logs, inspecting offsets, and examining messages one by one, a process that can take hours for topics with millions of messages.
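For example, pulling one suspect message out of a partition usually means replaying a single offset with the console consumer, something like the sketch below, where the topic, partition, and offset are placeholders you first have to hunt down:
# Replay exactly one message from a specific partition and offset (placeholder values)
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic orders --partition 0 --offset 42 --max-messages 1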
Comprehensive Kafka Monitoring with Parseable
This is Part 1 of a two-part tutorial on setting up complete Kafka monitoring with Parseable and Kafka Exporter. In this part, you'll learn to:
- Export Kafka metrics using Kafka Exporter to Parseable
- Explore your data with simple filters and group-by operations
In Part 2 of this tutorial, we'll explore the following:
- Query metrics using SQL for deep analysis
- Build visualizations and dashboards for real-time monitoring
- Set up intelligent alerts and investigate issues when they occur
By the end, you'll have a production-ready Kafka monitoring solution that reduces troubleshooting time from hours to minutes.
Prerequisites
Before we begin, ensure you have:
- A running Kafka cluster (Apache Kafka, MSK, or Confluent Cloud)
- An agent to collect metrics from Kafka Exporter and forward them to Parseable (this tutorial uses Fluent Bit, set up via Docker Compose)
- Parseable instance (local, cloud, or self-hosted)
- Docker and Docker Compose installed
- Basic familiarity with Kafka concepts (topics, partitions, consumer groups)
Step 1: Setting Up the Complete Kafka Monitoring Stack
We'll set up a complete Kafka environment with Kafka Exporter to expose metrics and Fluent Bit to collect and forward them to Parseable.
Architecture Overview
The monitoring pipeline consists of:
- Kafka Cluster: Apache Kafka with KRaft mode (no Zookeeper needed)
- Schema Registry: For schema management
- Kafka Connect: For data integration (with Datagen connector for demo data)
- Kafka Exporter: Exposes Kafka metrics in Prometheus format
- Fluent Bit: Collects metrics from Kafka Exporter and sends to Parseable
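Putting these pieces together, the data flows one way through the stack:
Kafka broker (topic offsets and consumer group metadata)
        |
        v
Kafka Exporter  -- exposes Prometheus metrics on :9308/metrics
        |
        v
Fluent Bit      -- scrapes the endpoint every 30 seconds and forwards the metrics over HTTP
        |
        v
Parseable       -- stores them in the kafka_metrics stream for querying and alerting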
Create the Docker Compose File
Create a docker-compose.yml file with the complete stack:
networks:
  kafka-network:
    driver: bridge

services:
  kafka:
    image: apache/kafka:3.8.0
    hostname: kafka
    container_name: kafka
    networks:
      - kafka-network
    ports:
      - "9092:9092"
      - "9093:9093"
      - "9999:9999"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: 'broker,controller'
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka:9093'
      KAFKA_LISTENERS: 'PLAINTEXT://kafka:29092,CONTROLLER://kafka:9093,PLAINTEXT_HOST://0.0.0.0:9092'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
      KAFKA_JMX_PORT: 9999
      KAFKA_JMX_HOSTNAME: kafka
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'

  schema-registry:
    image: confluentinc/cp-schema-registry:7.6.0
    hostname: schema-registry
    container_name: schema-registry
    networks:
      - kafka-network
    depends_on:
      - kafka
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: 'kafka:29092'
      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081

  kafka-connect:
    image: confluentinc/cp-kafka-connect:7.6.0
    hostname: kafka-connect
    container_name: kafka-connect
    networks:
      - kafka-network
    depends_on:
      - kafka
      - schema-registry
    ports:
      - "8083:8083"
    environment:
      CONNECT_BOOTSTRAP_SERVERS: 'kafka:29092'
      CONNECT_REST_ADVERTISED_HOST_NAME: kafka-connect
      CONNECT_GROUP_ID: compose-connect-group
      CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.storage.StringConverter
      CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: http://schema-registry:8081
      CONNECT_PLUGIN_PATH: "/usr/share/java,/usr/share/confluent-hub-components"
      # Disable JMX for Connect to avoid port conflict
      KAFKA_JMX_OPTS: " "
    command:
      - bash
      - -c
      - |
        confluent-hub install --no-prompt confluentinc/kafka-connect-datagen:latest
        /etc/confluent/docker/run

  kafka-exporter:
    image: danielqsj/kafka-exporter:v1.6.0
    hostname: kafka-exporter
    container_name: kafka-exporter
    networks:
      - kafka-network
    ports:
      - "9308:9308"
    command:
      - '--kafka.server=kafka:29092'
      - '--web.listen-address=:9308'
      - '--log.level=info'
    depends_on:
      - kafka

  fluent-bit:
    image: fluent/fluent-bit:3.0.7
    hostname: fluent-bit
    container_name: fluent-bit
    networks:
      - kafka-network
    ports:
      - "2020:2020"
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf:ro
      - ./parsers.conf:/fluent-bit/etc/parsers.conf:ro
    environment:
      - FLB_LOG_LEVEL=info
    depends_on:
      - kafka-exporter
    extra_hosts:
      - "host.docker.internal:host-gateway"
Configure Fluent Bit to Collect and Send Metrics
Create a fluent-bit.conf file to scrape metrics from Kafka Exporter and send them to Parseable:
[SERVICE]
    Flush           5
    Daemon          Off
    Log_Level       info
    Parsers_File    parsers.conf

[INPUT]
    Name            prometheus_scrape
    Host            kafka-exporter
    Port            9308
    Tag             kafka_metrics
    Scrape_Interval 30s

[OUTPUT]
    Name            opentelemetry
    Match           kafka_metrics
    Host            your-parseable-instance.com
    Port            443
    URI             /v1/metrics
    Format          json
    Header          Authorization Basic <your-base64-encoded-credentials>
    Header          X-P-Stream kafka_metrics
    tls             On
    tls.verify      On
    Retry_Limit     3
Configuration Breakdown:
- INPUT: Scrapes Prometheus metrics from Kafka Exporter every 30 seconds
- OUTPUT: Sends metrics to Parseable via HTTP (a local-instance variant is shown below)
- Replace your-parseable-instance.com with your Parseable host
- Replace <your-base64-encoded-credentials> with base64-encoded username:password
- The X-P-Stream header specifies the stream name in Parseable
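If you're running Parseable locally on the Docker host instead of behind TLS, the extra_hosts entry in the compose file lets Fluent Bit reach it via host.docker.internal. A sketch of the OUTPUT section for that case, assuming Parseable's default port 8000 and default admin:admin credentials:
# OUTPUT variant for a local Parseable instance
# Assumptions: default port 8000, plain HTTP, default admin:admin credentials
[OUTPUT]
    Name            opentelemetry
    Match           kafka_metrics
    Host            host.docker.internal
    Port            8000
    URI             /v1/metrics
    Format          json
    Header          Authorization Basic YWRtaW46YWRtaW4=
    Header          X-P-Stream kafka_metrics
    tls             Off
    Retry_Limit     3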
Create Parsers Configuration
Create a parsers.conf file (required by Fluent Bit):
[PARSER]
    Name        json
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
Configure Parseable Credentials
To generate the base64-encoded credentials:
echo -n "username:password" | base64
Replace username and password with your Parseable credentials, then update the Authorization header in fluent-bit.conf.
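For example, Parseable's default local credentials admin:admin (change them in production) encode like this:
echo -n "admin:admin" | base64
# YWRtaW46YWRtaW4=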
Start the Complete Stack
Start all services:
docker-compose up -d
Verify all containers are running:
docker-compose ps
You should see all services in the "Up" state.
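The exact columns depend on your Docker Compose version, but the five services from the compose file should all be listed, roughly like:
NAME              IMAGE                                    STATUS
kafka             apache/kafka:3.8.0                       Up
schema-registry   confluentinc/cp-schema-registry:7.6.0    Up
kafka-connect     confluentinc/cp-kafka-connect:7.6.0      Up
kafka-exporter    danielqsj/kafka-exporter:v1.6.0          Up
fluent-bit        fluent/fluent-bit:3.0.7                  Up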
Verify Metrics Collection
1. Check Kafka Exporter is exposing metrics:
curl -s http://localhost:9308/metrics | grep kafka_
You should see metrics like the following (the topic and consumer group metrics appear once topics and consumer groups exist):
kafka_consumergroup_lag
kafka_topic_partition_current_offset
kafka_topic_partition_under_replicated_partition
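Each metric line carries labels identifying the consumer group, topic, and partition, for example (values are illustrative, using the orders topic created later in this tutorial):
kafka_consumergroup_lag{consumergroup="order-processor",partition="0",topic="orders"} 1523
kafka_topic_partition_current_offset{partition="0",topic="orders"} 48211
kafka_topic_partition_under_replicated_partition{partition="0",topic="orders"} 0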
2. Check Fluent Bit is running:
docker logs fluent-bit
Look for successful scraping messages like:
[info] [input:prometheus_scrape:prometheus_scrape.0] scraping metrics from kafka-exporter:9308
3. Verify metrics in Parseable:
Log into your Parseable instance and check that the kafka_metrics stream exists and is receiving data.
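If you prefer the command line, you can also confirm ingestion with a quick query against Parseable's SQL query API. A sketch, assuming the /api/v1/query endpoint and the credentials configured earlier (adjust the host and time range to your setup):
curl -s https://your-parseable-instance.com/api/v1/query \
  -H "Authorization: Basic <your-base64-encoded-credentials>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "SELECT COUNT(*) FROM kafka_metrics",
    "startTime": "2025-01-01T00:00:00.000Z",
    "endTime": "2025-01-01T01:00:00.000Z"
  }'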
Generate Sample Data (Optional)
To generate sample Kafka traffic for testing, create a Kafka Connect Datagen source:
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
"name": "datagen-orders",
"config": {
"connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
"kafka.topic": "orders",
"quickstart": "orders",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"max.interval": 1000,
"iterations": -1,
"tasks.max": "1"
}
}'
This creates an orders topic with continuous sample data generation.
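Before moving on, you can confirm the topic exists and is receiving records (depending on the Kafka image, the CLI scripts may need the .sh suffix or the full /opt/kafka/bin/ path):
# List topics, then peek at a few generated records
docker exec -it kafka kafka-topics --bootstrap-server kafka:29092 --list
docker exec -it kafka kafka-console-consumer --bootstrap-server kafka:29092 \
  --topic orders --from-beginning --max-messages 3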
Create a Consumer Group for Testing
Create a simple consumer to generate consumer lag metrics:
# Start a console consumer
docker exec -it kafka kafka-console-consumer \
--bootstrap-server kafka:29092 \
--topic orders \
--group order-processor \
--from-beginning
Press Ctrl+C after a few seconds to stop the consumer. This creates a consumer group with lag that you can monitor.
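You can check the lag you just created straight from the Kafka CLI; the same numbers should show up in the kafka_consumergroup_lag metric in Parseable:
# Show current offset, log-end offset, and lag per partition for the group
docker exec -it kafka kafka-consumer-groups \
  --bootstrap-server kafka:29092 \
  --describe --group order-processor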
Step 2: Exploring Kafka Metrics with the Explore Page
Now that metrics are flowing into Parseable, let's start with the Explore page for quick insights.
Navigate to the Explore Page
- Log into your Parseable instance
- Select the kafka_metrics dataset from the dropdown
- Click on the Explore tab
Understanding the Kafka Metrics Schema
The kafka_metrics dataset contains the following key fields:
1. Metric Identification:
- metric_name: Name of the metric (e.g., kafka_consumergroup_lag, kafka_topic_partition_current_offset)
- metric_type: Type of metric (gauge, sum, histogram, summary)
- metric_description: Human-readable description
- metric_unit: Unit of measurement
2. Kafka Dimensions:
- consumergroup: Consumer group name
- topic: Kafka topic name
- partition: Partition number
3. Metric Values:
- data_point_value: The actual metric value (for gauge and sum types)
- data_point_sum: Sum value (for histogram and summary types)
- data_point_count: Count of observations
4. Timestamps:
- p_timestamp: Parseable ingestion timestamp
- time_unix_nano: Metric collection timestamp
- start_time_unix_nano: Start time for cumulative metrics
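Put together, a single ingested metric event looks roughly like this (values are illustrative, and the exact set of columns depends on how Fluent Bit flattens the OTLP payload):
{
  "metric_name": "kafka_consumergroup_lag",
  "metric_type": "gauge",
  "metric_description": "Current Approximate Lag of a ConsumerGroup at Topic/Partition",
  "consumergroup": "order-processor",
  "topic": "orders",
  "partition": "0",
  "data_point_value": 1523,
  "time_unix_nano": 1718000000000000000,
  "p_timestamp": "2025-06-10T07:33:20.000Z"
}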
Quick Filtering and Grouping
The Explore page lets you quickly filter and group metrics without writing SQL. Start with filters to narrow down the data, then use group-by to aggregate.
Example 1: View Kafka Partition Replicas by Topic and Partition
Step 1 - Add Filters:
- Click Add Filter
- Field: metric_name
- Operator: =
- Value: kafka_topic_partition_replicas
- Set time range to "Last 1 hour"
Step 2 - Group By:
- In the Group By section, select topic, partition
Result: You'll see the replica count reported for each partition, so you can confirm at a glance that every partition has the replication factor you expect.
Example 2: Check Whether Partition Leaders Are Preferred
Step 1 - Add Filters:
- Filter 1: metric_name = kafka_topic_partition_leader_is_preferred
- Filter 2: topic = orders (or your topic name)
- Time range: "Last 1 hour"
Step 2 - Group By:
- Group by: topic, partition
- Aggregation: MAX(data_point_value)
Result: This shows whether each partition's current leader is its preferred replica (1 = yes, 0 = no). Zeros indicate leadership has drifted away from the preferred brokers, typically after restarts, and may call for a preferred leader election.
Example 3: Identify Under-Replicated Partitions
Step 1 - Add Filters:
- Filter 1: metric_name = kafka_topic_partition_under_replicated_partition
- Filter 2: data_point_value > 0
- Time range: "Last 15 minutes"
Step 2 - Group By:
- Group by: topic, partition
- Aggregation: MAX(data_point_value)
Result: This immediately highlights any partitions with replication issues that need attention.
Example 4: Monitor Specific Metric Types
Step 1 - Add Filters:
- Filter 1: metric_type = gauge
- Filter 2: metric_name CONTAINS kafka_topic
- Time range: "Last 30 minutes"
Step 2 - Group By:
- Group by: metric_name
Result: Shows all gauge-type metrics related to Kafka topics and their data point counts.
Understanding the Data
The Explore page is perfect for:
- Quick health checks: Filter by metric name and view aggregated values
- Identifying trends: Use time range filters to spot patterns
- Filtering by dimensions: Narrow down to specific consumer groups, topics, or partitions
- Aggregating metrics: Group by multiple dimensions to understand relationships
However, more complex analysis and deep dives call for the SQL editor, dashboards, and alerts. We cover those in Part 2 of this tutorial.
Conclusion
Setting up comprehensive Kafka monitoring doesn't require complex tool sprawl or extensive configuration. With just Parseable and Kafka Exporter, you can quickly get meaningful insights into your Kafka cluster's health—partition status, replication health, and topic metrics, all in one place.
The Explore page makes it simple to filter and group your Kafka metrics without writing any code.
Start monitoring your Kafka clusters with Parseable today and transform your operational efficiency.