Kafka Streams: Real-Time Data Processing Made Simple
Learn how Kafka Streams transforms complex event streaming into declarative, maintainable code. From basic concepts to practical examples.
Apache Kafka has become the backbone of modern data infrastructure. But while Kafka handles the heavy lifting of storing and transmitting events, Kafka Streams is what makes processing those events actually enjoyable.
What is Kafka Streams?
Kafka Streams is a client library for building real-time streaming applications. Unlike Spark Streaming or Flink, it doesn't require a separate cluster—your application runs as a regular Java process.
> Key insight: Kafka Streams is an abstraction over producers and consumers that lets you focus on what you want to do, not how to do it.
The Problem It Solves
Imagine you're processing sensor data from a production line. Each sensor sends readings like this:
{ "reading_ts": "2026-02-05T12:19:27Z", "sensor_id": "aa-101", "production_line": "w01", "widget_type": "acme94", "temp_celsius": 23, "widget_weight_g": 100}You need to filter readings where temperature exceeds a threshold. With vanilla Kafka, you'd write something like this:
```java
public static void main(String[] args) {
    try (Consumer<String, Reading> consumer = new KafkaConsumer<>(consumerProps());
         Producer<String, Reading> producer = new KafkaProducer<>(producerProps())) {
        consumer.subscribe(List.of("sensor-readings"));
        while (true) {
            ConsumerRecords<String, Reading> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, Reading> record : records) {
                Reading reading = record.value();
                if (reading.getTempCelsius() > 30) {
                    ProducerRecord<String, Reading> alert =
                        new ProducerRecord<>("temp-alerts", record.key(), reading);
                    producer.send(alert);
                }
            }
        }
    }
}
```
That's a lot of boilerplate. Now here's the same logic with Kafka Streams:
```java
final StreamsBuilder builder = new StreamsBuilder();

builder.stream("sensor-readings", Consumed.with(Serdes.String(), readingSerde))
    .filter((key, reading) -> reading.getTempCelsius() > 30)
    .to("temp-alerts", Produced.with(Serdes.String(), readingSerde));
```
Three lines vs. thirty. That's the power of declarative stream processing.
Core Concepts
1. KStream vs KTable
Kafka Streams has two main abstractions:
- KStream: An unbounded stream of events (click events, logs, sensor data)
- KTable: A changelog stream showing the latest value per key (user profiles, inventory counts)
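A loose plain-collections analogy (no Kafka involved): a KStream is like an append-only list of events, while a KTable behaves like a map where a new value for an existing key overwrites the old one:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TableDemo {
    // KTable semantics in miniature: the latest update per key wins.
    static Map<String, String> latest(List<Map.Entry<String, String>> changelog) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : changelog) {
            table.put(e.getKey(), e.getValue()); // overwrite = upsert
        }
        return table;
    }

    public static void main(String[] args) {
        // Stream view: three events. Table view: two rows, latest value each.
        List<Map.Entry<String, String>> events = List.of(
                Map.entry("user-1", "name=Ann"),
                Map.entry("user-2", "name=Bob"),
                Map.entry("user-1", "name=Ann K."));
        System.out.println(latest(events));
    }
}
```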
```java
// KStream - every event matters
KStream<String, Click> clicks = builder.stream("user-clicks");

// KTable - only latest state per key
KTable<String, UserProfile> profiles = builder.table("user-profiles");
```
2. Stateless Operations
These don't need to remember previous events:
```java
stream
    .filter((k, v) -> v.isValid())
    .filterNot((k, v) -> v.isSpam())
    .map((k, v) -> KeyValue.pair(v.getUserId(), v.toEvent()))
    .flatMapValues((k, v) -> v.getItems()); // one record in, many out
```
3. Stateful Operations
These maintain state across events:
```java
// Aggregation - count events per key
KTable<String, Long> clickCounts = stream
    .groupByKey()
    .count();
```
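Under the hood, `count()` maintains a per-key counter in a local state store backed by a changelog topic. Stripped of the distributed machinery, the computation it performs is just:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountDemo {
    // What groupByKey().count() computes, minus the state store plumbing.
    static Map<String, Long> countPerKey(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        for (String k : keys) {
            counts.merge(k, 1L, Long::sum); // increment, starting at 1
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countPerKey(List.of("u1", "u2", "u1")));
    }
}
```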
```java
// Windowed aggregation - count per hour
KTable<Windowed<String>, Long> hourlyClicks = stream
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofHours(1)))
    .count();
```
4. Joins
Combine streams and tables:
```java
KStream<String, EnrichedClick> enriched = clicks.join(profiles,
    (click, profile) -> new EnrichedClick(click, profile));
```
Setting Up Kafka Streams
Dependencies (Maven)
```xml
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>3.6.0</version>
</dependency>
```
Basic Application Structure
```java
public class StreamProcessor {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-stream-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Default serdes are required when operations don't specify their own
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic")
            .filter((k, v) -> v != null)
            .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```
Real-World Example: Fraud Detection
```java
StreamsBuilder builder = new StreamsBuilder();
KStream<String, Transaction> transactions = builder.stream("transactions");

// High value alerts
transactions
    .filter((userId, tx) -> tx.getAmount() > 10000)
    .to("high-value-alerts");

// Velocity check - more than 5 tx per minute
transactions
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
    .count()
    .toStream()
    .filter((windowedKey, count) -> count > 5)
    .map((windowedKey, count) -> KeyValue.pair(
        windowedKey.key(),
        "Velocity alert: " + count + " tx/min"))
    .to("velocity-alerts");
```
For Node.js Developers
Kafka Streams is Java-only, but you have options:
- kafkajs - Full Kafka client for Node.js
- ksqlDB - SQL interface for stream processing
- Kafka Connect - Handle transformations in connectors
Here's the earlier temperature filter implemented with kafkajs:
```javascript
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ brokers: ['localhost:9092'] });
const consumer = kafka.consumer({ groupId: 'my-group' });

await consumer.connect();
await consumer.subscribe({ topic: 'sensor-readings' });

await consumer.run({
  eachMessage: async ({ message }) => {
    const reading = JSON.parse(message.value.toString());
    if (reading.temp_celsius > 30) {
      console.log('Alert!', reading);
    }
  },
});
```
When to Use Kafka Streams
Good fit:
- Real-time ETL and data enrichment
- Event-driven microservices
- Fraud/anomaly detection
- Real-time analytics
- CDC (Change Data Capture) processing
Consider alternatives:
- Batch processing → use Spark
- Need Python/Node.js → use Flink or custom consumers
- Simple pub/sub → just use Kafka directly
Key Takeaways
- Kafka Streams = declarative stream processing: say what, not how
Stream processing doesn't have to be complex. With Kafka Streams, you get the power of distributed systems with the simplicity of a library.