
Use PEM certificates loaded from secrets for Kafka #11447


Open · wants to merge 7 commits into main from use-pem-kafka
Conversation

tinaselenge
Contributor

@tinaselenge tinaselenge commented May 19, 2025

Type of change

  • Refactoring

Description

  • Use KubernetesSecretConfigProvider to access secrets directly when configuring the Kafka truststore and keystore used by nodes to authenticate with each other and with clients.
  • Create an internal secret that holds all the trusted certificates for the authorization server, so that it can be accessed directly via KubernetesSecretConfigProvider to configure the authorization server's truststore.
  • Create an internal secret that holds all the trusted certificates under a single key for the OAuth server configured on listeners, so that it can be volume mounted as a single cert file. This is used for the truststore in the OAuth server's JAAS configuration. The reason we cannot access this secret directly to configure ssl.truststore.certificates is that a JAAS configuration cannot parse multi-line certificate values. (This might be improved in a future PR, but for now we are going with a simpler approach to keep the refactoring minimal.)
  • Remove the volume mounts and environment variables for configuring the truststore and keystore, as they are no longer needed now that secrets are accessed directly.
  • Add a hash of the internal auth secrets to the pod annotations, so that when these secrets change the pods are triggered to roll.
  • Remove the script that prepared TLS certificates by generating PKCS12 certs from the volume-mounted PEM format certificates.
  • Refactor KafkaAgent to directly access the cluster CA and node certificates and use them to configure the HTTP server, instead of using PKCS12 certificates generated by the script. Add a util class that allows creating JKS keystores from secrets.
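To illustrate the first point, a broker configuration fragment using the config provider might look roughly like this. The namespace, secret names, keys and provider alias below are illustrative assumptions, not values taken from this PR:

```properties
# Illustrative fragment only: namespace, secret names and alias are assumptions.
# The KubernetesSecretConfigProvider lets Kafka read PEM material straight from
# a Kubernetes Secret at startup, instead of from a mounted PKCS12 file.
config.providers=secrets
config.providers.secrets.class=io.strimzi.kafka.KubernetesSecretConfigProvider
ssl.keystore.type=PEM
ssl.keystore.certificate.chain=${secrets:myproject/my-cluster-kafka-0:my-cluster-kafka-0.crt}
ssl.truststore.type=PEM
ssl.truststore.certificates=${secrets:myproject/my-cluster-cluster-ca-cert:ca.crt}
```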

Resolves part of #11294

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

@tinaselenge tinaselenge marked this pull request as ready for review May 28, 2025 14:12
@tinaselenge tinaselenge requested review from ppatierno and katheris and removed request for ppatierno May 28, 2025 14:12
@ppatierno ppatierno added this to the 0.47.0 milestone Jun 4, 2025
@tinaselenge
Contributor Author

@ppatierno @katheris can you please review this PR when you get a chance? Thank you :)

reconciliation.namespace(),
oauthSecret,
kafka.generateSecret(Map.of(oauthSecret + ".crt", mergeAndEncodeCerts(certs)), oauthSecret))
.mapEmpty());
Member

the core part of this method is exactly the same as the authzTrustedCertsSecret. Is there any way to factor out a common method here? wdyt?

Contributor Author

@tinaselenge tinaselenge Jun 16, 2025

They look similar, but they differ in how the secrets are joined. For the cluster trusted certs, we just add them under individual keys in the secret, so it's simple. For OAuth, we first collect the certs as a list of strings, then decode and merge them into a single string before encoding it and putting it under a single key in the secret.

However, these two methods are used in several places in KafkaConnect, Kafka and KafkaBridge, so I think I should move them into AuthenticationUtils instead of repeating them in the assembly operator classes.
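For illustration, the decode-merge-encode step described above could be sketched like this. The class and method names are assumptions, not the PR's actual code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;

// Sketch (not the PR's actual implementation) of the OAuth path described
// above: each Secret entry holds a base64-encoded PEM certificate; they are
// decoded, concatenated into one PEM bundle, and re-encoded under one key.
public class MergeCertsSketch {
    static String mergeAndEncodeCerts(List<String> base64Certs) {
        StringBuilder merged = new StringBuilder();
        for (String cert : base64Certs) {
            merged.append(new String(Base64.getDecoder().decode(cert), StandardCharsets.US_ASCII));
            if (merged.length() > 0 && merged.charAt(merged.length() - 1) != '\n') {
                merged.append('\n'); // keep PEM blocks on separate lines
            }
        }
        return Base64.getEncoder().encodeToString(merged.toString().getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        String certA = Base64.getEncoder().encodeToString("---A---\n".getBytes(StandardCharsets.US_ASCII));
        String certB = Base64.getEncoder().encodeToString("---B---\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(mergeAndEncodeCerts(List.of(certA, certB)));
    }
}
```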

sslContextFactory.setTrustStorePath(sslTruststorePath);
sslContextFactory.setTrustStorePassword(sslTruststorePassword);
sslContextFactory.setKeyStore(KafkaAgentUtils.jksKeyStore(nodeCertSecret));
sslContextFactory.setKeyStorePassword("changeit");
Member

What about this "changeit"? We had a randomly generated password when using the script.

Contributor Author

I had followed what KafkaAgentClient had, but you are right, we should do the same as the script. I added a method to generate a random 32-character password, like the script did.
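A minimal sketch of such a password generator, assuming a simple alphanumeric alphabet; the actual character set and method name used in the PR may differ:

```java
import java.security.SecureRandom;

// Sketch of the 32-character random store password mentioned above.
// SecureRandom is used so the password is not predictable.
public class StorePasswordSketch {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    static String generatePassword(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(generatePassword(32));
    }
}
```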

@tinaselenge tinaselenge force-pushed the use-pem-kafka branch 2 times, most recently from 3d7de64 to 3363f8a Compare June 18, 2025 09:15
@tinaselenge
Contributor Author

Thank you @ppatierno so much for reviewing the PR. I have now addressed your comments.

Could you also please kick off the regression tests?

@im-konge
Member

/azp run regression


Azure Pipelines successfully started running 1 pipeline(s).

@im-konge
Member

/azp run regression


Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@katheris katheris left a comment

The changes look pretty good to me, I just had a couple of questions and suggestions that I added.

writer.println(CruiseControlConfigurationParameters.METRICS_REPORTER_SSL_TRUSTSTORE_LOCATION + "=/tmp/kafka/cluster.truststore.p12");
writer.println(CruiseControlConfigurationParameters.METRICS_REPORTER_SSL_TRUSTSTORE_PASSWORD + "=" + PLACEHOLDER_CERT_STORE_PASSWORD_CONFIG_PROVIDER_ENV_VAR);
writer.println(CruiseControlConfigurationParameters.METRICS_REPORTER_SSL_KEYSTORE_TYPE + "=PEM");
writer.println(CruiseControlConfigurationParameters.METRICS_REPORTER_SSL_KEYSTORE_CERTIFICATE_CHAIN + "=" + String.format(PLACEHOLDER_SECRET_TEMPLATE_KUBE_CONFIG_PROVIDER, reconciliation.namespace(), node.podName(), node.podName() + ".crt"));
Contributor

I wonder whether the code would read better if we had a method, e.g.

Suggested change
writer.println(CruiseControlConfigurationParameters.METRICS_REPORTER_SSL_KEYSTORE_CERTIFICATE_CHAIN + "=" + String.format(PLACEHOLDER_SECRET_TEMPLATE_KUBE_CONFIG_PROVIDER, reconciliation.namespace(), node.podName(), node.podName() + ".crt"));
writer.println(CruiseControlConfigurationParameters.METRICS_REPORTER_SSL_KEYSTORE_CERTIFICATE_CHAIN + "=" + secretConfigProvider(reconciliation.namespace(), node.podName(), node.podName() + ".crt"));

I can't remember what we've done for the other similar changes though, so maybe this is something we could look at as a follow up PR
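The suggested helper could be sketched roughly as follows. The ${secrets:<namespace>/<secret>:<key>} placeholder template here assumes the standard KubernetesSecretConfigProvider syntax with a provider alias of "secrets", which may not match the alias or constant actually used in this PR:

```java
// Sketch of the helper suggested above; the placeholder template and the
// "secrets" alias are assumptions based on the KubernetesSecretConfigProvider
// syntax, not code taken from this PR.
public class SecretPlaceholderSketch {
    static String secretConfigProvider(String namespace, String secretName, String key) {
        return String.format("${secrets:%s/%s:%s}", namespace, secretName, key);
    }

    public static void main(String[] args) {
        System.out.println(secretConfigProvider("myproject", "my-cluster-kafka-0", "my-cluster-kafka-0.crt"));
    }
}
```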

Contributor Author

I think that's a good idea, but yeah, maybe I can do it in a follow-up PR.

/**
* Class with various utility methods for generating KeyStore and TrustStore for KafkaAgent
*/
public class KafkaAgentUtils {
Contributor

A lot of these methods are duplicates of ones we have elsewhere, but I assume you put them here so we didn't have to pull in any other Strimzi modules and create cyclic dependencies?

Contributor Author

Yes, that was the idea. I didn't think it was a good idea to pull other Strimzi modules into the agent running in the Kafka process.
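As an illustration of the kind of utility described here, building a JKS truststore from a PEM bundle might look like this. This is a sketch with assumed names, not the PR's actual KafkaAgentUtils code:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.security.KeyStore;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;

// Sketch of building an in-memory JKS truststore from a PEM bundle,
// the kind of helper the KafkaAgentUtils comment above describes.
public class JksFromPemSketch {
    private static final String BEGIN = "-----BEGIN CERTIFICATE-----";
    private static final String END = "-----END CERTIFICATE-----";

    // Split a PEM bundle into individual certificate blocks.
    static List<String> splitPemCertificates(String bundle) {
        List<String> certs = new ArrayList<>();
        int from = 0;
        while (true) {
            int begin = bundle.indexOf(BEGIN, from);
            if (begin < 0) break;
            int end = bundle.indexOf(END, begin);
            if (end < 0) break;
            certs.add(bundle.substring(begin, end + END.length()));
            from = end + END.length();
        }
        return certs;
    }

    // Parse each PEM block and add it to a fresh JKS truststore.
    static KeyStore jksTrustStore(String pemBundle) throws Exception {
        CertificateFactory factory = CertificateFactory.getInstance("X.509");
        KeyStore store = KeyStore.getInstance("JKS");
        store.load(null, null); // initialise an empty keystore
        int i = 0;
        for (String pem : splitPemCertificates(pemBundle)) {
            X509Certificate cert = (X509Certificate) factory.generateCertificate(
                new ByteArrayInputStream(pem.getBytes(StandardCharsets.US_ASCII)));
            store.setCertificateEntry("ca-" + i++, cert);
        }
        return store;
    }

    public static void main(String[] args) {
        String bundle = BEGIN + "\nAAA\n" + END + "\n" + BEGIN + "\nBBB\n" + END + "\n";
        System.out.println(splitPemCertificates(bundle).size());
    }
}
```

Keeping these methods local to the agent module, as the comment explains, avoids pulling operator modules (and their transitive dependencies) into the Kafka process.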

@ppatierno
Member

@tinaselenge I restarted failed regression tests, not sure if they were related to the PR but there were quite a few. Let's see the next run.

@im-konge
Member

> @tinaselenge I restarted failed regression tests, not sure if they were related to the PR but there were quite a few. Let's see the next run.

They failed even for the previous runs, so I guess they are related to the PR.

@tinaselenge
Contributor Author

Yes, they are definitely related, as they failed locally for me as well. I fixed the OAuth-related failures but am still trying to fix some failures in ListenersST, which tests listeners with custom certificates. I will update the PR once I have it passing locally.

@katheris
Contributor

/azp run regression


Azure Pipelines successfully started running 1 pipeline(s).

@scholzj scholzj modified the milestones: 0.47.0, 0.48.0 Jul 10, 2025
@tinaselenge tinaselenge force-pushed the use-pem-kafka branch 2 times, most recently from bfede7c to 8ae072a Compare July 23, 2025 14:28
@tinaselenge
Contributor Author

@strimzi/maintainers can someone please kick off the regression tests? I believe I have fixed them now. Thank you.

@im-konge
Member

/azp run regression


Azure Pipelines successfully started running 1 pipeline(s).

@tinaselenge
Contributor Author

There is one remaining system test failure:

testKafkaWithVersion Kafka version: KafkaVersion{version='3.9.0', metadataVersion='3.9', isDefault=false, isSupported=true}.version()

testKafkaWithVersion Kafka version: KafkaVersion{version='3.9.1', metadataVersion='3.9', isDefault=false, isSupported=true}.version()

They fail because the changes I made to KafkaAgent are missing from kafka-agent-3 (the agent for Kafka 3.x versions). To fix the test, I would have to duplicate all the changes in kafka-agent-3. However, Kafka 4.1.0 is currently in the process of being released and will soon be supported in the next Strimzi release. This means the next Strimzi version will not support Kafka 3.9.x, so kafka-agent-3 will be removed. Given that this will likely happen in the next 2-3 weeks, I decided not to fix this test and to wait for kafka-agent-3 to be removed before merging this PR.

While we wait for that to happen, I think this PR can be reviewed again since all the review comments were addressed and other system tests were fixed.

@ppatierno
Member

@tinaselenge could you please solve the conflicts we have now :-(

@tinaselenge
Contributor Author

@ppatierno done :)
