Kafka Streams: Real-Time Data Processing Made Simple
Learn how Kafka Streams transforms complex event streaming into declarative, maintainable code. From basic concepts to practical examples.
Apache Kafka has become the backbone of modern data infrastructure. But while Kafka handles the heavy lifting of storing and transmitting events, Kafka Streams is what makes processing those events actually enjoyable.
What is Kafka Streams?
Kafka Streams is a client library for building real-time streaming applications. Unlike Spark Streaming or Flink, it doesn't require a separate cluster—your application runs as a regular Java process.
> Key insight: Kafka Streams is an abstraction over producers and consumers that lets you focus on what you want to do, not how to do it.
The Problem It Solves
Imagine you're processing sensor data from a production line. Each sensor sends readings like this:
{ "reading_ts": "2026-02-05T12:19:27Z", "sensor_id": "aa-101", "production_line": "w01", "widget_type": "acme94", "temp_celsius": 23, "widget_weight_g": 100}You need to filter readings where temperature exceeds a threshold. With vanilla Kafka, you'd write something like this:
```java
public static void main(String[] args) {
    try (Consumer<String, Reading> consumer = new KafkaConsumer<>(consumerProps());
         Producer<String, Reading> producer = new KafkaProducer<>(producerProps())) {
        consumer.subscribe(List.of("sensor-readings"));
        while (true) {
            ConsumerRecords<String, Reading> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, Reading> record : records) {
                Reading reading = record.value();
                if (reading.getTempCelsius() > 30) {
                    ProducerRecord<String, Reading> alert =
                        new ProducerRecord<>("temp-alerts", record.key(), reading);
                    producer.send(alert);
                }
            }
        }
    }
}
```
That's a lot of boilerplate. Now here's the same logic with Kafka Streams:
```java
final StreamsBuilder builder = new StreamsBuilder();

builder.stream("sensor-readings", Consumed.with(Serdes.String(), readingSerde))
    .filter((key, reading) -> reading.getTempCelsius() > 30)
    .to("temp-alerts", Produced.with(Serdes.String(), readingSerde));
```
Three lines vs. thirty. That's the power of declarative stream processing.
Core Concepts
1. KStream vs KTable
Kafka Streams has two main abstractions:
- KStream: An unbounded stream of events (click events, logs, sensor data)
- KTable: A changelog stream showing the latest value per key (user profiles, inventory counts)
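A loose plain-collections analogy (no Kafka involved): a KStream is like an append-only list of events, while a KTable behaves like a map where a new value for an existing key overwrites the old one:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TableDemo {
    // KTable semantics in miniature: the latest update per key wins.
    static Map<String, String> latest(List<Map.Entry<String, String>> changelog) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : changelog) {
            table.put(e.getKey(), e.getValue()); // overwrite = upsert
        }
        return table;
    }

    public static void main(String[] args) {
        // Stream view: three events. Table view: two rows, latest value each.
        List<Map.Entry<String, String>> events = List.of(
                Map.entry("user-1", "name=Ann"),
                Map.entry("user-2", "name=Bob"),
                Map.entry("user-1", "name=Ann K."));
        System.out.println(latest(events));
    }
}
```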
```java
// KStream - every event matters
KStream<String, Click> clicks = builder.stream("user-clicks");

// KTable - only latest state per key
KTable<String, UserProfile> profiles = builder.table("user-profiles");
```
2. Stateless Operations
These don't need to remember previous events:
```java
stream
    .filter((k, v) -> v.isValid())
    .filterNot((k, v) -> v.isSpam())
    .map((k, v) -> KeyValue.pair(v.getUserId(), v.toEvent()))
    .flatMapValues((k, v) -> v.getItems()); // one record in, many out
```
3. Stateful Operations
These maintain state across events:
```java
// Aggregation - count events per key
KTable<String, Long> clickCounts = stream
    .groupByKey()
    .count();
```
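Under the hood, `count()` maintains a per-key counter in a local state store backed by a changelog topic. Stripped of the distributed machinery, the computation it performs is just:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountDemo {
    // What groupByKey().count() computes, minus the state store plumbing.
    static Map<String, Long> countPerKey(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        for (String k : keys) {
            counts.merge(k, 1L, Long::sum); // increment, starting at 1
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countPerKey(List.of("u1", "u2", "u1")));
    }
}
```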
```java
// Windowed aggregation - count per hour
KTable<Windowed<String>, Long> hourlyClicks = stream
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofHours(1)))
    .count();
```
4. Joins
Combine streams and tables:
```java
KStream<String, EnrichedClick> enriched = clicks.join(profiles,
    (click, profile) -> new EnrichedClick(click, profile));
```
Setting Up Kafka Streams
Dependencies (Maven)
```xml
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>3.6.0</version>
</dependency>
```
Basic Application Structure
```java
public class StreamProcessor {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-stream-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Default serdes are required when operations don't specify their own
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic")
            .filter((k, v) -> v != null)
            .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```
Real-World Example: Fraud Detection
```java
StreamsBuilder builder = new StreamsBuilder();
KStream<String, Transaction> transactions = builder.stream("transactions");

// High value alerts
transactions
    .filter((userId, tx) -> tx.getAmount() > 10000)
    .to("high-value-alerts");

// Velocity check - more than 5 tx per minute
transactions
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
    .count()
    .toStream()
    .filter((windowedKey, count) -> count > 5)
    .map((windowedKey, count) -> KeyValue.pair(
        windowedKey.key(),
        "Velocity alert: " + count + " tx/min"))
    .to("velocity-alerts");
```
For Node.js Developers
Kafka Streams is Java-only, but you have options:
- kafkajs - Full Kafka client for Node.js
- ksqlDB - SQL interface for stream processing
- Kafka Connect - Handle transformations in connectors
Here's the earlier temperature filter implemented with kafkajs:
```javascript
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ brokers: ['localhost:9092'] });
const consumer = kafka.consumer({ groupId: 'my-group' });

await consumer.connect();
await consumer.subscribe({ topic: 'sensor-readings' });

await consumer.run({
  eachMessage: async ({ message }) => {
    const reading = JSON.parse(message.value.toString());
    if (reading.temp_celsius > 30) {
      console.log('Alert!', reading);
    }
  },
});
```
When to Use Kafka Streams
Good fit:
- Real-time ETL and data enrichment
- Event-driven microservices
- Fraud/anomaly detection
- Real-time analytics
- CDC (Change Data Capture) processing
Consider alternatives:
- Batch processing → use Spark
- Need Python/Node.js → use Flink or custom consumers
- Simple pub/sub → just use Kafka directly
Key Takeaways
- Kafka Streams = declarative stream processing: say what, not how
Stream processing doesn't have to be complex. With Kafka Streams, you get the power of distributed systems with the simplicity of a library.