Kafka 4.x #870

@razvan

Which new version of Apache Kafka should we support?

Kafka 4 has been released. This is the first release that operates entirely without ZooKeeper and runs KRaft by default.

KRaft has been officially available since Kafka 3.9.

This means a new Kafka role must be introduced to replace the external ZooKeeper.
Kafka 4.0 also introduces a new consumer group protocol (KIP-848) designed to improve rebalance performance and reduce downtime and latency. The minimum Java versions were raised to 11 (clients and Kafka Streams) and 17 (brokers, Connect and tools).
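
To make the new role more concrete, here is a minimal sketch of the KRaft settings a node would need, depending on whether it runs as a broker or as a controller. The property keys are standard Kafka configuration options; the function name, the `CONTROLLER` listener name and the host names are assumptions for illustration only and not what the operator actually generates.

```python
def kraft_properties(node_id: int, role: str, voters: dict[int, str]) -> dict[str, str]:
    """Render a minimal set of KRaft properties for one node.

    role is "broker" or "controller"; voters maps controller node ids
    to their host:port addresses (static quorum definition).
    """
    if role not in ("broker", "controller"):
        raise ValueError(f"unknown role: {role}")
    return {
        "process.roles": role,
        "node.id": str(node_id),
        # One id@host:port entry per controller, sorted by node id.
        "controller.quorum.voters": ",".join(
            f"{voter_id}@{address}" for voter_id, address in sorted(voters.items())
        ),
        # Listener name is an assumption; any name works if it is used consistently.
        "controller.listener.names": "CONTROLLER",
    }


# Hypothetical host names, for illustration only.
example_voters = {
    1: "controller-0.example:9093",
    2: "controller-1.example:9093",
    3: "controller-2.example:9093",
}
for key, value in kraft_properties(1, "controller", example_voters).items():
    print(f"{key}={value}")
```

Brokers would get the same quorum definition with `process.roles=broker`, so they can find the controllers without any ZooKeeper connection string.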

Release notes: https://archive.apache.org/dist/kafka/4.0.0/RELEASE_NOTES.html

Docker image

Current Status

Next

  • Kafka Demos: Update to Kafka 4.x demos#232
    • the demo docs need to be rewritten to showcase Kafka usage without kcat
  • Replace kcat with Kafka client scripts
  • GracefulShutdown improvements: Currently a preStop sleep hook is used in the controller to give brokers more time to offload when shutting down the cluster. This is a beta feature until Kubernetes 1.34 and must be replaced, since we do not want to use beta features. We want to timebox this (4h): check whether e.g. autodetection of the Kubernetes version, or an endpoint to query for supported features, is possible, and if so switch from the preStop hook to a different implementation.
  • Improve anti-affinities between controllers and brokers to ensure they run on different nodes?
    • @razvan: Currently the anti-affinity rules ensure that brokers are spread out as much as possible. The same applies to controllers. To also separate controllers from brokers, taints and tolerations are probably the better mechanism because they allow nodes to be provisioned accordingly. For example, broker nodes could require more resources than controller nodes.
  • Liveness / Readiness (controller): Currently a TCP probe; improve it (e.g. by checking whether the controller has joined the quorum?)
    • @razvan: An alternative to the TCP probe would have to use either a lightweight process like kcat or an HTTP endpoint.
    • kcat doesn't support KRaft controllers
    • the Kafka REST Proxy cannot be used because of license restrictions
  • Improve PDBs for broker (currently 1) or controller (currently 1)?
  • Kafka 3.7.2 has no dynamic quorum (bad for scaling), see https://developers.redhat.com/articles/2024/11/27/dynamic-kafka-controller-quorum. This is documented here; do we want to suppress / warn within the operator?
  • Discovery (currently just host:port combinations exposed for brokers, no other connection details (TLS))
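
Two of the items above (the preStop sleep hook and the dynamic quorum warning) come down to simple version gates. The following is only a sketch under stated assumptions: the preStop sleep action is treated as stable from Kubernetes 1.34 on (as stated above), dynamic controller quorums are assumed to be available from Kafka 3.9 on, and the input dict mirrors the `major` / `minor` fields of the Kubernetes `/version` endpoint. All function names are hypothetical.

```python
import re


def parse_version(version: str) -> tuple[int, ...]:
    """Extract the leading numeric components of a dotted version.

    Non-numeric suffixes are ignored, so a minor version like "34+" still parses.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def prestop_sleep_is_stable(version_info: dict) -> bool:
    """version_info is the JSON body returned by the Kubernetes /version endpoint.

    Assumption: the preStop sleep action is no longer beta from 1.34 on.
    """
    major = parse_version(version_info["major"])
    minor = parse_version(version_info["minor"])  # some distributions report e.g. "34+"
    return (major + minor) >= (1, 34)


def supports_dynamic_quorum(kafka_version: str) -> bool:
    """Assumption: dynamic controller quorums are available from Kafka 3.9 on."""
    return parse_version(kafka_version)[:2] >= (3, 9)


if __name__ == "__main__":
    for kafka_version in ("3.7.2", "3.9.0", "4.0.0"):
        if not supports_dynamic_quorum(kafka_version):
            print(f"warning: Kafka {kafka_version} has no dynamic quorum, scaling controllers is not supported")
```

Whether the operator should only warn or reject such versions outright is still the open question from the list above; the check itself is cheap either way.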

Next 2

The following issues are only partially (or not at all) implemented and tested.
