Getting Started with Apache Kafka and Spring

Apache Kafka is a powerhouse in the world of distributed systems, acting as a message broker that bridges services with efficiency and resilience. Whether you're building microservices, streaming data pipelines, or log aggregation systems, Kafka, especially when paired with Spring, offers a robust solution. In this blog post, we'll explore Kafka's core concepts, its problem-solving prowess, real-world use cases, and how Spring makes it a breeze to integrate. Let's dive in!

Why Kafka?

The Problem: Tight Coupling

In traditional architectures, services often depend directly on one another. If Service A needs data from Service B, it calls B and waits for a response. But what happens if B is slow or down? Service A slows down or fails too, creating a ripple effect that disrupts your entire business.

[Diagram: the Order service directly calls the Payment (marked with an X to indicate failure), Analytics, Inventory, and Notification services, illustrating tight coupling.]

The Kafka Solution: Decoupling with a Queue


Kafka introduces a message broker to decouple these services, using topics to facilitate asynchronous communication. Instead of direct calls, the Order service publishes events to Kafka topics, and downstream services consume these events at their own pace. Here’s how Kafka transforms the scenario:

  • Order Service: Publishes an event to a topic like order-placed (e.g., “Order #123 placed”).

  • Kafka Topics:

    • order-placed: Subscribed to by Payment, Analytics, Inventory, and Notification.
  • Downstream Services:

    • Payment consumes the event and processes the payment, catching up automatically once it's back online.
    • Analytics logs the order for reporting.
    • Inventory updates stock levels.
    • Notification sends a confirmation email.

The Benefits:

  • Asynchronous Processing: The Order service doesn’t wait—it publishes the event and moves on to the next order.
  • Fault Tolerance: If Payment is down, the event stays in the order-placed topic until Payment recovers. Other services (like Analytics) can still process the event immediately.
  • Scalability: Each service can scale independently. For example, Inventory can add more consumers to handle peak loads without affecting Order.

With Kafka, the Order service is no longer at the mercy of its dependencies. The system becomes resilient, scalable, and responsive, even under failure conditions.
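
To make this concrete, here is a minimal sketch of how the Order service might publish such an event with Spring Kafka. The OrderService class and placeOrder method are illustrative names for this scenario; the full producer setup appears later in this post:

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public OrderService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void placeOrder(String orderId) {
        // Publish and move on -- no waiting for Payment, Analytics, etc.
        // Using the order id as the key keeps events for one order in order.
        kafkaTemplate.send("order-placed", orderId, "Order #" + orderId + " placed");
    }
}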

Kafka Core Concepts

To wield Kafka effectively, you need to grasp its building blocks. Here’s a rundown:

  • Event/Message: The data unit Kafka handles—like a “payment request” or “log entry.”
  • Topic: A named category of events (e.g., payment-requests). Think of it as a channel.
  • Producer: An application sending messages to a topic (e.g., the Order service).
  • Consumer: An application subscribing to a topic to process messages (e.g., the Payment service).
  • Consumer Group: A set of consumers sharing the load of a topic. Kafka distributes partitions among group members dynamically for scalability.
  • Partition: A topic can be split into partitions for parallelism. Each partition is an ordered log stored on a broker, and within a consumer group each partition is consumed by at most one consumer.
  • Kafka Broker: A server instance managing topics, partitions, and message delivery.
  • Replica: Copies of a partition across brokers for redundancy and fault tolerance.
  • Dead Letter Topic (DLT): A special topic for messages that fail processing after retries, preserving them for debugging or recovery.

Spring Kafka simplifies these concepts with annotations like @KafkaListener for consumers and KafkaTemplate for producers, as we’ll see later.
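
To see topics and partitions in practice, here's how you could create a three-partition topic with the CLI tools that ship with Kafka (the payment-requests name follows the example above; the script's path varies by installation):

kafka-topics.sh --create --topic payment-requests --partitions 3 --replication-factor 1 --bootstrap-server localhost:9092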

Kafka’s Evolution: ZooKeeper vs. KRaft

Kafka’s architecture has evolved, particularly in how it manages its brokers.

ZooKeeper (Legacy)

Historically, Kafka relied on ZooKeeper, an external coordination service, to:

  • Track broker metadata (e.g., which brokers are alive).
  • Assign partition leaders.
  • Store configurations (e.g., replication factor, partition count).

ZooKeeper runs as a separate service (default port 2181) that Kafka must keep in sync with, adding operational complexity and startup latency.

KRaft (Kafka Raft)

Introduced in Kafka 2.8 and production-ready by 3.3, KRaft replaces ZooKeeper with a built-in Raft consensus protocol:

  • Some Kafka brokers double as controllers, forming a quorum to manage metadata.
  • Metadata is stored in a Kafka topic (__cluster_metadata), leveraging Kafka’s log-based storage.

KRaft vs. ZooKeeper

| Aspect | KRaft | ZooKeeper |
| --- | --- | --- |
| Integration | Built into Kafka | External service |
| Startup Time | Faster (single system) | Slower (ZooKeeper + Kafka sync) |
| Scalability | Millions of partitions | Limited (commonly cited around 200,000 partitions per cluster) |
| Complexity | Lower (one system to manage) | Higher (two systems) |

Why KRaft? It simplifies operations, scales better, and aligns with Kafka's log-centric design. Spring Kafka supports both, but KRaft is the future: ZooKeeper mode has been deprecated since Kafka 3.5 and is removed entirely in Kafka 4.0.

Kafka Use Cases

Kafka shines in diverse scenarios, especially with Spring’s integration:

  • Message Queuing: Decouple services (e.g., Order and Payment) for resilience.
  • Streaming Data: Feed real-time data to Spark Streaming or Flink via Kafka topics.
  • Log Aggregation: Collect logs from services and send them to tools like ELK (Elasticsearch, Logstash, Kibana).
  • Data Replication: Sync data across systems or regions.
  • Monitoring & Alerting: Stream metrics to monitoring tools and trigger alerts.

Spring enhances these with @KafkaListener for consumers and first-class Kafka Streams support for processing, as we'll explore.

Kafka in Action with Spring: A Practical Example

Let's see Kafka in a Spring Boot app, combining a producer, consumer, REST API, and test.

Setup: Kafka on Docker

Use the bitnami/kafka image for simplicity:

docker run -d --name kafka -p 9092:9092 \
    -e KAFKA_ENABLE_KRAFT=yes \
    -e KAFKA_CFG_PROCESS_ROLES=broker,controller \
    -e KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093 \
    -e KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092 \
    -e KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=1@localhost:9093 \
    -e KAFKA_KRAFT_CLUSTER_ID=123 \
    -e KAFKA_CFG_NODE_ID=1 \
    bitnami/kafka:latest

This runs a single-node Kafka with KRaft.
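
With the broker running, a Spring Boot app (with the spring-kafka dependency) can point at it via application.properties. A minimal sketch, assuming String keys and values as in the examples below (the serializer lines match Spring Boot's defaults and are shown for clarity):

spring.kafka.bootstrap-servers=localhost:9092
spring.kafka.consumer.group-id=my-group
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer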

Producer Class

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaProducer {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public KafkaProducer(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Fire-and-forget publish to the "messages" topic; the send is asynchronous.
    public void sendMessage(String message) {
        kafkaTemplate.send("messages", message);
    }
}
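
kafkaTemplate.send is asynchronous and returns a future, so you can attach a callback if you need delivery confirmation. A sketch, assuming Spring Kafka 3.x, where send returns a CompletableFuture (2.x returns a ListenableFuture instead):

public void sendMessageWithCallback(String message) {
    kafkaTemplate.send("messages", message)
        .whenComplete((result, ex) -> {
            if (ex != null) {
                // Delivery failed even after the producer's internal retries.
                System.err.println("Send failed: " + ex.getMessage());
            }
        });
}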

Consumer Class with DLT

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.annotation.DltHandler;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.annotation.RetryableTopic;
import org.springframework.retry.annotation.Backoff;
import org.springframework.stereotype.Component;

@Component
public class KafkaConsumer {
    private static final Logger LOG = LoggerFactory.getLogger(KafkaConsumer.class);

    // 3 attempts in total (initial try + 2 retries, 2s apart), then the message goes to messages-dlt.
    @RetryableTopic(attempts = "3", backoff = @Backoff(delay = 2000))
    @KafkaListener(topics = "messages", groupId = "my-group")
    public void listen(String message) {
        LOG.info("Received: {}", message);
        if (message.contains("fail")) {
            throw new RuntimeException("Processing failed");
        }
    }

    // Handles messages that exhausted all retries.
    @DltHandler
    public void handleDlt(String message) {
        LOG.info("DLT Received: {}", message);
    }
}
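
By default, @RetryableTopic appends -dlt to the original topic name, so failed messages land in messages-dlt. You can peek at them with the console consumer bundled in the container (this assumes the bitnami image puts Kafka's scripts on the PATH, which it does under /opt/bitnami/kafka/bin):

docker exec -it kafka kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic messages-dlt --from-beginning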

REST API Controller

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api")
public class MessageController {

    private final KafkaProducer producer;

    public MessageController(KafkaProducer producer) {
        this.producer = producer;
    }

    @PostMapping("/send")
    public ResponseEntity<String> sendMessage(@RequestBody String message) {
        producer.sendMessage(message);
        return ResponseEntity.ok("Message sent: " + message);
    }
}

Hit /api/send with curl:

curl -X POST -H "Content-Type: text/plain" "http://localhost:8080/api/send" -d "hello"
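
To exercise the retry and DLT path, send a message containing "fail", which the listener above rejects on purpose:

curl -X POST -H "Content-Type: text/plain" "http://localhost:8080/api/send" -d "this will fail"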

Test Class

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.kafka.test.context.EmbeddedKafka;

// Point the app at the embedded broker instead of localhost:9092.
@SpringBootTest(properties = "spring.kafka.bootstrap-servers=${spring.embedded.kafka.brokers}")
@EmbeddedKafka(partitions = 1, topics = "messages")
class KafkaTests {
    @Autowired
    private KafkaProducer producer;

    @Test
    void testMessageFlow() {
        producer.sendMessage("test");
        // See the assertion sketch below for reading the message back.
    }
}
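
One way to add a real assertion is to read the message back with the utilities from spring-kafka-test. A minimal sketch, assuming the embedded broker from the annotation above is injected as a bean (exact helper signatures vary a little between spring-kafka versions):

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.Map;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.test.EmbeddedKafkaBroker;
import org.springframework.kafka.test.utils.KafkaTestUtils;

@Autowired
private EmbeddedKafkaBroker embeddedKafka;

@Test
void testMessageIsDelivered() {
    producer.sendMessage("test");

    // Build a throwaway consumer against the embedded broker.
    Map<String, Object> props = KafkaTestUtils.consumerProps("test-group", "true", embeddedKafka);
    try (Consumer<String, String> consumer =
            new DefaultKafkaConsumerFactory<String, String>(props).createConsumer()) {
        embeddedKafka.consumeFromAnEmbeddedTopic(consumer, "messages");
        ConsumerRecord<String, String> record = KafkaTestUtils.getSingleRecord(consumer, "messages");
        assertEquals("test", record.value());
    }
}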

Kafka Streams Example

Process data with Spring Kafka Streams:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafkaStreams;

@Configuration
@EnableKafkaStreams // Lets Spring Boot create and manage the Streams lifecycle.
public class MessageTopology {

    @Autowired
    public void buildTopology(StreamsBuilder builder) {
        builder.stream("messages", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               .count()
               .toStream()
               // The counts are Longs, so declare the value serde explicitly.
               .to("message-counts", Produced.with(Serdes.String(), Serdes.Long()));
    }
}

This counts messages in the messages topic by key and writes the running counts to the message-counts topic. Note that groupByKey drops records with null keys, so send keyed messages if you want anything counted.
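
Kafka Streams also needs an application id, which names its consumer group and internal state stores. With Spring Boot that's a single property; a sketch, assuming the application.properties file from earlier:

spring.kafka.streams.application-id=message-counter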

Kafka vs. RabbitMQ

Kafka isn't the only message broker. How does it compare to RabbitMQ?

| Feature | Kafka | RabbitMQ |
| --- | --- | --- |
| Storage | Persists messages (like a library) | Deletes after delivery (like a mailbox) |
| Metaphor | Netflix: replayable streams | TV: live broadcasts |
| Queues | Simple partitions | Priority queues, complex routing |
| Model | Pull (consumers fetch) | Push (broker delivers) |
| Use Case | Streaming, log aggregation | Task queues, RPC |

Spring supports both, but Kafka excels at high-throughput, durable messaging.

Kafka’s Protocol

Kafka uses a custom TCP-based binary protocol:

  • Efficient: Binary format reduces overhead.
  • Port: Default 9092 (or 9093 for SSL).
  • Pull Model: Consumers request data, unlike RabbitMQ’s push.

Spring Kafka abstracts this with KafkaTemplate and @KafkaListener, so you rarely touch the protocol directly.
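
Under the hood, a consumer is just a loop that polls the broker over that protocol. A minimal sketch with the plain Apache Kafka client (no Spring), assuming the messages topic and localhost broker from earlier:

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "raw-client-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("messages"));
            while (true) {
                // The pull model: the consumer asks for data at its own pace.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}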

Potential Misuses

Nothing in the tech world is a panacea, and Kafka is no exception:

  • Overcomplicating Simple Tasks: Kafka’s overhead isn’t worth it for basic request-response scenarios—use REST or RabbitMQ instead.
  • Ignoring Retention: Storing messages forever without cleanup can balloon storage.
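
Retention is configurable per topic. For example, this caps the messages topic at seven days (the value is in milliseconds), using the kafka-configs.sh CLI that ships with Kafka:

kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name messages --add-config retention.ms=604800000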

Resources to Learn More

Kafka Crash Course (YouTube)

Spring Kafka GitHub

Spring Kafka Docs

Baeldung Spring Kafka Guide

Conclusion

Kafka, with Spring, transforms how services communicate—decoupling them for resilience, scaling them with partitions, and tracing them with tools like Micrometer. Whether you’re queuing messages, streaming data, or aggregating logs, Kafka’s flexibility shines. Try the examples above, experiment with KRaft, and see how Spring Kafka simplifies your next project. Happy coding!