The power of Kubernetes: Building a multi-broker Kafka cluster in minutes

12-02-2022
Coforge-Salesforce BU

This article demonstrates how we can exploit the power of Kubernetes to build a multi-broker Kafka cluster in minutes using Strimzi. To start, we give a brief introduction to Apache Kafka and Strimzi before walking through the creation of the Kafka cluster.

Apache Kafka

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. It scales linearly with multiple producers and consumers in real time. Before committing to it, many development teams want to run a pilot to confirm that Kafka is the right fit for their problem and, if so, to adapt it quickly for production workloads at scale. However, setting up the platform and managing its operations optimally requires specialised skills and time for testing and tuning. Using Kafka Operators in Kubernetes is an excellent way to address these needs.

Kubernetes

Kubernetes is an open-source system for automating the deployment, scaling, and management of containerised applications. Kubernetes can manage both stateful and stateless applications at scale. To handle stateful applications, you can use Kubernetes Operators.

Kubernetes Operators are software extensions that use custom resources to manage applications and their components. The operator extends the Kubernetes API via technology-specific objects and orchestrates the management life cycle. Engineers don’t require in-depth knowledge to get started on Kubernetes, as an operator is responsible for implementing and managing the underlying application out of the box. Of the few Kubernetes Operators available for Kafka, Strimzi is one of the best.

Strimzi

Strimzi is an open-source project that simplifies the process of running Apache Kafka in a Kubernetes cluster by providing container images and Kubernetes operators. Strimzi Operators are fundamental to the running of Strimzi. The Operators provided with Strimzi are purpose-built with specialist operational knowledge to manage Kafka effectively. Operators greatly simplify administration tasks and significantly reduce manual intervention. Strimzi also offers features to enable authentication, authorisation, and the use of a custom certificate authority. Strimzi can run on a local Minikube cluster or on managed Kubernetes environments such as AWS EKS or Red Hat OpenShift.

To help manage the applications and their components, you can use the various operators that Strimzi bundles for managing a Kafka cluster running within a Kubernetes cluster.

Cluster Operator

Deploys and manages Apache Kafka and Zookeeper clusters, Kafka Connect, Kafka MirrorMaker, Kafka Bridge, Kafka Exporter, and the Entity Operator.

Entity Operator

Comprises the Topic Operator and the User Operator.

Topic Operator

Manages Kafka topics.

User Operator

Manages Kafka users.

The Cluster Operator can deploy the Topic Operator and User Operator as part of an Entity Operator configuration at the same time as a Kafka cluster.

Operators within the Strimzi architecture

In Kubernetes, custom resources are extensions of the Kubernetes API that are added alongside the built-in resource types. A Custom Resource Definition (CRD) describes the schema of a new resource type, and instances of that type can be written as YAML configuration files that are used to install or update the resource. Strimzi provides such highly configurable CRDs for Kafka components.
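To make the idea of a YAML-based custom resource concrete, here is a minimal sketch that writes a Kafka resource to a local file. The cluster name my-cluster matches the one used later in this article. The file name, the kafka.strimzi.io/v1beta2 API version, the listener, and the ephemeral storage settings are assumptions based on ZooKeeper-based Strimzi releases, so check them against the version you install.

```shell
# Write a minimal Kafka custom resource to a file (hypothetical file name).
# Assumes a ZooKeeper-based Strimzi release that serves kafka.strimzi.io/v1beta2.
cat > kafka-sketch.yaml <<'EOF'
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral        # assumption: no persistent volumes, pilot use only
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
  zookeeper:
    replicas: 1
    storage:
      type: ephemeral
  entityOperator:            # deploys the Topic Operator and the User Operator
    topicOperator: {}
    userOperator: {}
EOF

# Show what was written.
cat kafka-sketch.yaml
```

Once Strimzi is installed, a file like this could be applied with “kubectl apply -f kafka-sketch.yaml -n kafka”. The entityOperator section is what asks the Cluster Operator to deploy the Topic Operator and User Operator together with the cluster.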

Installing Strimzi

First, install Strimzi and its operators via “kubectl create” in a Kubernetes namespace of your choice. The installation includes the CRDs, which define the schemas used for declarative management of the Kafka cluster, Kafka topics, and users. This enables us to use custom resources to abstract and manage the clusters.
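The installation step might look like the sketch below. The kafka namespace is the one used by the other commands in this article. The install URL is an assumption taken from the Strimzi quick start and always points at the latest release, so pin a specific version if you need repeatable results.

```shell
# Assumption: the "kafka" namespace, as used elsewhere in this article.
NAMESPACE=kafka

# Create the namespace that will hold the operator and the Kafka cluster.
kubectl create namespace "$NAMESPACE"

# Install the Strimzi Cluster Operator, its RBAC objects, and the CRDs.
# The query parameter rewrites the namespace references inside the install files.
kubectl create -f "https://strimzi.io/install/latest?namespace=${NAMESPACE}" -n "$NAMESPACE"
```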

List the CRDs that the installation created using “kubectl -n kafka get crds”. Note that CRDs are cluster-scoped, so the “-n kafka” flag has no effect on this command.
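On a cluster that already has other CRDs installed, it can help to narrow the listing down to Strimzi. A small sketch, assuming the Strimzi CRD names end in strimzi.io and that the Kafka-related kinds live in the kafka.strimzi.io API group:

```shell
# Assumption: Strimzi serves its Kafka resources under this API group.
API_GROUP=kafka.strimzi.io

# CRDs are cluster-scoped, so no namespace flag is needed here.
kubectl get crds -o name | grep 'strimzi.io'

# List the resource kinds that the API group adds to the cluster.
kubectl api-resources --api-group="$API_GROUP"
```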

Once the Strimzi CRDs are created, provision the Kafka cluster by applying a Kafka custom resource using “kubectl apply -f https://strimzi.io/examples/latest/kafka/kafka-persistent-single.yaml -n kafka”

Notice that this deploys the Zookeeper and Kafka broker pods alongside the Strimzi operators. We can list them using “kubectl -n kafka get pods”

You can see that our Zookeeper and Kafka pods are running, and hence the Kafka cluster is running.
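Instead of polling the pod list by hand, the readiness check can be scripted. This is a sketch that assumes the cluster is named my-cluster, lives in the kafka namespace, and that Strimzi labels its pods with strimzi.io/cluster; the 300-second timeout is an arbitrary choice.

```shell
# Assumptions: namespace and cluster name as used in this article.
NAMESPACE=kafka
CLUSTER=my-cluster

# Block until the Kafka resource reports the Ready condition, or time out.
kubectl wait "kafka/${CLUSTER}" --for=condition=Ready --timeout=300s -n "$NAMESPACE"

# List only the pods that belong to this Kafka cluster.
kubectl get pods -n "$NAMESPACE" -l "strimzi.io/cluster=${CLUSTER}"
```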

Modifying the number of brokers/Zookeeper instances

The example above runs only one pod each for Kafka and Zookeeper. It is straightforward to change the number of instances by updating the cluster’s Kafka custom resource.

Update the replicas from 1 to the desired number in the cluster’s Kafka resource using “kubectl edit kafka/my-cluster -n kafka”. In this example, we change the number to 3.

This spawns two more Zookeeper pods and two more Kafka pods, which can be confirmed within a few seconds of editing the Kafka resource.
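The same change can be made without an interactive editor. The sketch below assumes a ZooKeeper-based cluster named my-cluster whose replica counts sit at spec.kafka.replicas and spec.zookeeper.replicas; newer Strimzi releases may manage brokers differently, so treat the field paths as assumptions.

```shell
# Assumptions: namespace, cluster name, and target size from this article.
NAMESPACE=kafka
CLUSTER=my-cluster
REPLICAS=3

# Build a JSON merge patch that sets both replica counts.
PATCH=$(printf '{"spec":{"kafka":{"replicas":%d},"zookeeper":{"replicas":%d}}}' "$REPLICAS" "$REPLICAS")
echo "$PATCH"

# Apply the patch to the Kafka resource; the Cluster Operator then adds the pods.
kubectl patch kafka "$CLUSTER" -n "$NAMESPACE" --type merge -p "$PATCH"

# Confirm that the additional pods appear.
kubectl get pods -n "$NAMESPACE"
```

A patch like this is easier to keep in a script or pipeline than an interactive “kubectl edit” session, at the cost of hard-coding the field paths.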

Note that the default example resource can be found in Strimzi’s Git repository.

Creating and listing topics

Once the cluster is up and running, we can create topics using custom resources. An example topic resource can be found in Strimzi’s Git repository. The “KafkaTopic” resource is used to create topics and configure properties such as the replication factor, number of partitions, and retention time. When we create, modify, or delete a topic using a “KafkaTopic” resource, the Topic Operator performs those changes on the Kafka cluster.
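As a sketch of what this looks like in practice, the commands below write a topic resource to a file, apply it, and list the topics known to the Topic Operator. The topic name my-topic, the file name, the partition count, and the seven-day retention are hypothetical values; the kafka.strimzi.io/v1beta2 API version is an assumption that should match the Strimzi version you installed.

```shell
# Write a KafkaTopic resource to a file (hypothetical topic and file names).
cat > my-topic.yaml <<'EOF'
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster   # ties the topic to the Kafka cluster
spec:
  partitions: 3
  replicas: 1                        # replication factor; must not exceed the broker count
  config:
    retention.ms: 604800000          # 7 days in milliseconds
EOF

# Create the topic; the Topic Operator applies it to the Kafka cluster.
kubectl apply -f my-topic.yaml -n kafka

# List the topics managed through custom resources.
kubectl get kafkatopics -n kafka
```

Changing the file and applying it again updates the topic, and “kubectl delete -f my-topic.yaml -n kafka” removes it.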