-
Notifications
You must be signed in to change notification settings - Fork 9.1k
Allow network to keep working after renewing certs with the same key #5268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Allow network to keep working after renewing certs with the same key #5268
Conversation
…pair in BFT Signed-off-by: David VIEJO <[email protected]>
- Updated the `VerifyCertificate` function in `PredicateDialer` to use the provided `verifyFunc` instead of a hardcoded function. - Removed the logging and verification function in `StandardDialer`, setting it to `nil` for simplicity. This change enhances flexibility in certificate verification while cleaning up unnecessary code. Signed-off-by: David VIEJO <[email protected]>
Integration tests are missing, and probably unit testing. But I want to make sure this is the right approach. I don't like having to modify logic in the common/cluster logic, such as: But I don't see any other solution, since if I change membership to be the public key instead of a certificate, we also need to modify the logic of common/cluster. |
Signed-off-by: David VIEJO <[email protected]>
Signed-off-by: David VIEJO <[email protected]>
Signed-off-by: David VIEJO <[email protected]>
Signed-off-by: David VIEJO <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see comments.
Comments on production code are mostly nits, but the biggest problem is the comment about the integration test - it changes TLS certificates instead of enrollment certificates, so it doesn't prove the feature works, unfortunately.
// First decode PEM if it's PEM encoded | ||
block, _ := pem.Decode(certBytes) | ||
var der []byte | ||
if block != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only one of these if
branches should be the expected branch the execution follows, and the other if
branch should be treated as an error, wouldn't you agree?
Besides, it's incorrect to assume that if the block
is nil then it's because the function has been passed a DER and not a PEM.
Let's find out which one of the two branches is the right one and treat the other as an error.
That being said, if you follow the code of IsChannelMember
I think you'll agree with me that we always pass a DER into this function.
|
||
myPublicKey, err := extractPublicKeyFromCert(bl.Bytes) | ||
if err != nil { | ||
c.Logger.Debugf("Failed to extract public key from own certificate: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this log entry should be at least a warning, as this is an abnormal situation that points at a channel misconfiguration.
for _, consenter := range oc.Consenters() { | ||
santizedCert, err := crypto.SanitizeX509Cert(consenter.Identity) | ||
if err != nil { | ||
c.Logger.Debugf("Failed to sanitize consenter %d identity: %v", consenter.Id, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a warning as well
// Extract public key using the same approach as IsConsenterOfChannel | ||
bl, _ := pem.Decode(santizedCert) | ||
if bl == nil { | ||
c.Logger.Debugf("Consenter %d: failed to decode PEM for identity", consenter.Id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be a warning as well.
|
||
publicKey, err := extractPublicKeyFromCert(bl.Bytes) | ||
if err != nil { | ||
c.Logger.Debugf("Consenter %d: failed to extract public key from cert: %v", consenter.Id, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be a warning as well
@@ -175,6 +201,31 @@ func (s *ClusterService) VerifyAuthRequest(stream orderer.ClusterNodeService_Ste | |||
return authReq, nil | |||
} | |||
|
|||
func compareCertPublicKeys(cert1, cert2 []byte) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function should probably be moved to util.go
.
network.GenerateConfigTree() | ||
network.Bootstrap() | ||
|
||
network.EventuallyTimeout *= 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are we trying to accomplish here? 1
is the identity element for integer multiplication.
var ordererRunners []*ginkgomon.Runner | ||
for _, orderer := range network.Orderers { | ||
runner := network.OrdererRunner(orderer) | ||
runner.Command.Env = append(runner.Command.Env, "FABRIC_LOGGING_SPEC=orderer.common.cluster=debug:orderer.consensus.smartbft=debug:policies.ImplicitOrderer=debug") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is for debugging, do we really need it?
runner.Command.Env = append(runner.Command.Env, "FABRIC_LOGGING_SPEC=orderer.common.cluster=debug:orderer.consensus.smartbft=debug:policies.ImplicitOrderer=debug")
|
||
Eventually(peerProcesses.Ready(), network.EventuallyTimeout).Should(BeClosed()) | ||
peer := network.Peer("Org1", "peer0") | ||
_ = peer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this _ = peer
is redundant.
} | ||
ordererDomain := network.Organization(orderers[0].Organization).Domain | ||
ordererTLSCAKeyPath := filepath.Join(network.RootDir, "crypto", "ordererOrganizations", | ||
ordererDomain, "tlsca", "priv_sk") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The certificate tree we are supposed to modify is not the TLS certificate tree but the enrollment certificate tree.
So, this test changes the TLS certificates, but BFT works with enrollment certificates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just updated the test to renew both, TLS and enrollment certificates.
- Renamed `renewOrdererCertificates` to `renewOrdererTLSCertificates` for clarity. - Introduced `renewOrdererEnrollmentCertificates` to handle enrollment certificate renewal. - Moved public key extraction logic to `util.go` and updated references in `consenter.go` and `clusterservice.go` to use the new method. - Improved logging for certificate processing errors. This refactor enhances code clarity and maintains consistency in certificate management across the codebase. Signed-off-by: David VIEJO <[email protected]>
- Refactored the `compareCertPublicKeys` function to `CompareCertPublicKeys` for improved clarity and consistency across the codebase. - Updated references in `clusterservice.go` and `util.go` to utilize the new function. - Added comprehensive unit tests for `CompareCertPublicKeys`, covering scenarios with certificates having the same public keys but different bytes, as well as handling errors gracefully for malformed certificates. These changes improve the robustness of certificate handling and verification in the Cluster service. Signed-off-by: David VIEJO <[email protected]>
- Modified the logging specification for orderer runners in `smartbft_test.go` to focus on SmartBFT and gRPC debug logs, enhancing the clarity of test outputs. Signed-off-by: David VIEJO <[email protected]>
// The certificate was made 1 minute ago (1 hour doesn't work since cert will be before original CA cert NotBefore time) | ||
cert.NotBefore = time.Now().Add((-1) * time.Minute) | ||
// As well as the CA certificate | ||
caCert.NotBefore = time.Now().Add((-1) * time.Minute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to change the CA cert? you only need to expire the orderer cert.
Signed-off-by: David VIEJO <[email protected]>
… tests - Implemented chaincode deployment and multiple invocation queries in the end-to-end SmartBFT configuration test. - Removed unused certificate renewal functions to streamline the code. These enhancements improve the testing coverage for SmartBFT and ensure proper chaincode functionality. Signed-off-by: David VIEJO <[email protected]>
Signed-off-by: dviejokfs <[email protected]>
…ss to streamline code and improve clarity. Signed-off-by: David VIEJO <[email protected]>
@pfi79 could you do a revision after the recent changes? |
Please wait a couple of days, @dviejokfs , I will review again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parking this PR, I will revisit it in a couple of days.
@@ -785,3 +933,60 @@ func TestClusterRequestAsString(t *testing.T) { | |||
}) | |||
} | |||
} | |||
|
|||
// generateCertWithSameKey creates a new certificate using the provided signer (private key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function isn't used, you can remove it.
// renewOrdererEnrollmentCertificates renews the signcert for each orderer with a given expirationTime | ||
// and re-writes it to the orderer's signcerts directory, matching the actual crypto structure. | ||
func renewOrdererEnrollmentCertificates(network *nwo.Network, notAfter time.Time, orderers ...*nwo.Orderer) { | ||
if len(orderers) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need this if
, a for loop over an empty slice is a no-op
certBytes, err := x509.CreateCertificate(rand.Reader, newCert, caCert, cert.PublicKey, caKey) | ||
Expect(err).NotTo(HaveOccurred()) | ||
|
||
renewedCertPEM = pem.EncodeToMemory(&pem.Block{Bytes: certBytes, Type: "CERTIFICATE"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just return em.EncodeToMemory(&pem.Block{Bytes: certBytes, Type: "CERTIFICATE"})
|
||
// renewCertificate generates a new certificate with the same public key and subject as the original, | ||
// but with a new NotAfter (expiration time). Only the expiration is changed. | ||
func renewCertificate(certPEM, caCertPEM, caKeyPEM []byte, notAfter time.Time) (renewedCertPEM []byte) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for the renewedCertPEM
just keep the []byte
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few nits but not critical, you can make a follow-up PR to take care of them.
LGTM, nice work and thanks!
I'll hold off from merging for a week to let @pfi79 or any other maintainer to take a second look.
By("Renewing the enrollment certificates of the orderers") | ||
renewOrdererEnrollmentCertificates(network, time.Now().Add(time.Hour), network.Orderers...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to admit I am confused by this. The enrollment certificates are in the config block consenter map. They can only be changed by a config TX, correct? Are you suggesting that now orderers ignore the certificate and only use a public key from it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the aim of the PR is to change the binding from TLS Cert hash to TLS public key, and yet the code shows that it is the logic of enrollment certs in the cluster service that changes. Can you explain how this feature is supposed to work in practice? Is the issue about renewal of TLS certs, enrollment certs, or both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting that now orderers ignore the certificate and only use a public key from it?
Yeah that's the point of the PR.
I thought the aim of the PR is to change the binding from TLS Cert hash to TLS public key, and yet the code shows that it is the logic of enrollment certs in the cluster service that changes. Can you explain how this feature is supposed to work in practice? Is the issue about renewal of TLS certs, enrollment certs, or both?
Yeah that's a good point, the PR description indeed is not what the code is about.
@dviejokfs you should change the PR description...
updateBatchSize(network, peer, orderer, channel, func(batchSize *ordererProtos.BatchSize) { | ||
batchSize.AbsoluteMaxBytes = 1000000 | ||
batchSize.MaxMessageCount = 300 | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is updateBatchSize
relevant to this case? it is a config change, but why is it relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think he's just trying to make a test transaction
equal, err := CompareCertPublicKeys(toIdentity, s.NodeIdentity) | ||
if err != nil { | ||
return nil, errors.Wrap(err, "failed to compare cert public keys") | ||
} | ||
if !equal { | ||
s.Logger.Debugf("node id mismatch for node %d, toIdentity: %s, s.NodeIdentity: %s", authReq.FromId, string(toIdentity), string(s.NodeIdentity)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that the node sending the request knows the node accepting the request by an enrollment cert that is stale; i.e. only the public key is the same. Why is this legitimate? I thought the problem was with TLS certs, not enrollment certs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disregard the TLS certificate thing, I don't know why he wrote that.
This means that the node sending the request knows the node accepting the request by an enrollment cert that is stale; i.e. only the public key is the same. Why is this legitimate?
When you authenticate to a remote node, you send also its identity identifier.
If the identity identifier is wrong then the node receiving the connection attempt will reject the connection attempt.
Here, we just modify the code to only look at the public key.
I'm not sure why you think this is illegitimate. This has no cryptographic significance, it's just a way to detect channel misconfiguration of a node and also make the connection message non transferable.
} | ||
|
||
func CompareCertPublicKeys(cert1, cert2 []byte) (bool, error) { | ||
// Extract public key using the same approach as IsConsenterOfChannel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that IsConsenterOfChannel with BFT compares enrollment certs, it is the raft version of IsConsenterOfChannel that compares public keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what he's doing here, he's changing the implementation to fit the Raft style where we only care about the public key, not the entire cert. So that if you need to rotate your enrollment certificate, you won't need to do a channel config.
// This test demonstrates the scenario described in the PR comment: | ||
// - bytes.Equal(firstCert, secondCert) returns false (different certificate bytes) | ||
// - But the public keys are the same, so compareCertPublicKeys would return true | ||
// - This ensures that certificate renewal/reissuance with same key pair works correctly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renewal means extending the notAfter right? this ignores every aspect of the cert including key usage, name, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, I think ideally he would just copy the certificate itself and just change the notAfter/notBefore, but I thought the gist of the test is clear so I let it go.
if bytes.Equal(myPublicKey, publicKey) { | ||
c.Logger.Debugf("Found matching public key for consenter %d", consenter.Id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this change related to what is stated in the problem statement?
This means we ignore enrollment certs as identity and only care about the public key. In the description it says the problem is the renewal of TLS certs, which, in BFT, are not part of node identity.
The code here is a significant change to how fabric orderer BFT handles consenters identity, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem statement is:
IE: "when TLS certs are renewed". There is nothing there about enrollment certs.
if bytes.Equal(c.Comm.NodeIdentity, santizedCert) { | ||
return cst.Id, nil | ||
|
||
if crypto.CertificatesWithSamePublicKey(thisNodeCertAsDER, certAsDER) == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this change related to what is stated in the problem statement?
This means we ignore enrollment certs as identity and only care about the public key. In the description it says the problem is the renewal of TLS certs, which, in BFT, are not part of node identity.
The code here is a significant change to how fabric orderer BFT handles consenters identity, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code here is a significant change to how fabric orderer BFT handles consenters identity, isn't it?
I guess significant is relative, are you trying to say such a change requires an RFC? Because I don't think it does, there is precedent for doing this without an RFC - like we did here in Raft.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think we need a deeper discussion on this. I started reading the description thinking the problem is the binding of TLS Cert hash.... I expected to see changes in the delivery.go and areas around it, but found a completely different feature altogether. I think such a change deserves (at least) an issue and some attention by crypto minded people.
@ale-linux @adecaro @C0rWin
This is a BFT system, not Raft, so allowing orderers to operate without valid enrollment certs defined properly in the config block seems to me like a pretty big change. I am not comfortable with such a change entering without at least an issue discussing it properly. Yes, rotating certs is somewhat disruptive and annoying, but there is a reason behind it.
Note the current implementation completely ignores everything in the cert but the key.... and it is allowed on all nodes at once... In addition this does not extend to the follower... or the BFT synchronizer if I am not mistaken...
In addition, how does this solve the problem of TLS cert binding? this problem seems to be unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think we need a deeper discussion on this. I started reading the description thinking the problem is the binding of TLS Cert hash....
This issue is not about TLS. In BFT we authenticate using enrollment certificates. The original code uses the enrollment certificate as-is and compares it in its raw byte form. The change proposed here is to just take into account the public key and not the entire certificate, in order that if you happen to rotate the enrollment certificate and keep the public key, then you wouldn't need a config update. This is how Raft works except with regarding to TLS certificates instead of enrollment certificates.
I expected to see changes in the delivery.go and areas around it, but found a completely different feature altogether.
This is not for block pulling but for the intra-orderer cluster communication for BFT. It has nothing to do with the deliver API.
This is a BFT system, not Raft, so allowing orderers to operate without valid enrollment certs defined properly in the config block seems to me like a pretty big change.
The original code doesn't require the certificate to be valid at all, so there is no change in that aspect.
Having a valid certificate chain is required when verifying a leaf certificate you are given and that you have not seen before.
But here, the certificates (or rather, public keys) are already known to you, since they're in the channel config that you hold. If I meet with you face to face and give you a certificate and say "hey Yoav this is my certificate" and later someone sends you a message and a signature that is verified by the former certificate, how do you know it's from me? You know it's from me because we have met before and I gave you my certificate. Here we have the exact same situation. You have a certificate in the channel config and someone sends you a message and a signature verifiable by that certificate from the channel config. Why do you need the certificate to be "valid"? The validity is implicit by it being in the channel config...
I am not comfortable with such a change entering without at least an issue discussing it properly. Yes, rotating certs is somewhat disruptive and annoying, but there is a reason behind it.
Yes I understand the reason behind it, that's why I am also fine with this change. It's disruptive and annoying and therefore for BFT orderers to be usable, @dviejokfs decided to make this change to improve the user experience. He actually (unlike both of us) deploys Fabric in production so this requirement of having to do a channel config for every orderer node was inconvenient enough to make a code contribution.
Note the current implementation completely ignores everything in the cert but the key.... and it is allowed on all nodes at once... In addition this does not extend to the follower... or the BFT synchronizer if I am not mistaken...
You're not mistaken but this is completely irrelevant. Why would it extend to followers or to sync?
In addition, how does this solve the problem of TLS cert binding? this problem seems to be unchanged.
I am not sure what you mean. You'll have to be more specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code doesn't require the certificate to be valid at all, so there is no change in that aspect.
Having a valid certificate chain is required when verifying a leaf certificate you are given and that you have not seen before.
This is the problem I see here. The config update process should validate new certificates we are given as identities (or at least vigilant admins should not allow such invalid certificates and avoid endorsing them). However here there is no facility to verify that the certificate given in the context is even valid. So this essentially by-passes the endorsement process of orderer admins.
That is, certificates in the config block may be expired, but everything continues to work just because an orderer renewed it in its local file system and rebooted?
I think that this goes against security best practices in which an expired cert is a red flag. I think that if an organization sees a certificate of another organization becomes expired in the channel config it is a red flag.
It is like connecting to a web service and getting a warning "the certificate is expired, do you want to continue or go back to safety?" In the code proposed here there is no option to go back to safety...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code doesn't require the certificate to be valid at all, so there is no change in that aspect.
Having a valid certificate chain is required when verifying a leaf certificate you are given and that you have not seen before.This is the problem I see here. The config update process should validate new certificates we are given as identities (or at least vigilant admins should not allow such invalid certificates and avoid endorsing them). However here there is no facility to verify that the certificate given in the context is even valid. So this essentially by-passes the endorsement process of orderer admins.
Whether the config update process should or should not do sanity checks to the input is completely irrelevant to this PR. This PR does not change anything in this aspect. All that is done here is to ignore the other fields of the certificate and looking only at the public key, when mapping the authentication message sent over the wire to the consenter.
This doesn't bypass the endorsement process of orderer admins because the only way to get anything into the channel config anyway is via a config update, which this PR does not change.
That is, certificates in the config block may be expired, but everything continues to work just because an orderer renewed it in its local file system and rebooted?
The enrollment certificate of the BFT orderer in the channel config is used for two things:
- Establishing an authenticated communication channel
- Signing blocks and protocol messages
Since (1) happens over a TLS connection, the TLS certificate and TLS certification chain also play a part in establishing an authentication communication channel. If the TLS certificate expires, then you won't be able to connect.
In the code previous to this PR, if the enrollment certificate of the orderer expires, then as long as it's in the channel config, you can still authenticate a remote BFT orderer as long as its TLS certificate is still valid.
Therefore, if this is a problem, then it's a problem in the existing Fabric code, and this PR doesn't introduce a new problem. Without this PR, if we add expiry checks to enrollment certificates (which I reiterate, we currently do not have, in the current code) then we would have to do a config update for each renewal of the validity period of a certificate which is a painful thing. With this PR, it would be less painful to the users if we introduce validity checks to the enrollment certificates which we currently do not have in the current code.
Therefore, without this PR, if you have changed the enrollment certificate but kept the previous public key, you do not gain any security but you do gain the pain to do the channel config update. With this PR you at least don't get the pain.
I think that this goes against security best practices in which an expired cert is a red flag. I think that if an organization sees a certificate of another organization becomes expired in the channel config it is a red flag.
It is like connecting to a web service and getting a warning "the certificate is expired, do you want to continue or go back to safety?" In the code proposed here there is no option to go back to safety...
Notice that the PR only deals with certificates that they have been renewed but the public key remains the same (and therefore so does the private key).
The entire concept of validity checks for certificates is there to prevent leaked private keys from being used after the private keys have been regenerated.
If the private key has been leaked and the certificate of a web server changed but its public key is the same, an attacker that has the leaked private key can still impersonate the web server because it's not a problem getting the renewed certificate (it's public and is presented to you whenever you connect to the real web server).
I'm not against adding support for enrollment certificate expiration checks in the authenticated communication establishment path, but if you have a leaked enrollment key then you essentially can sign blocks on behalf of that orderer, right? So... the only way to protect against a key leaking is changing the public keys in the channel config anyway!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's missing here is the fact that the set of enrollment certificates is visible also to the peers, and they use it to validate blocks. It is also visible to the clients. The users of the system are not just admins who may find it "painful" to do a config change.
Looking at a system which, in its source of truth - the config block in the blockchain - are expired certificates, a user would lose faith and trust in such a system. I see no way of getting around the necessity to keep the enrollment certs fresh and prevent them from expiring. How fabric does that right now is beside the issue. But going in the opposite direction, I think is against the interest of the systema as a whole.
A better approach to handle the administrator pain is to automate the cert renewal process. Another approach is to streamline the config change and make it easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's missing here is the fact that the set of enrollment certificates is visible also to the peers, and they use it to validate blocks. It is also visible to the clients. The users of the system are not just admins who may find it "painful" to do a config change.
Well, when peers validate blocks they don't take into account that the enrollment certificates of ordereres and peers may have expired. There is actually a bypass in the MSP that does that.
Looking at a system which, in its source of truth - the config block in the blockchain - are expired certificates, a user would lose faith and trust in such a system. I see no way of getting around the necessity to keep the enrollment certs fresh and prevent them from expiring. How fabric does that right now is beside the issue. But going in the opposite direction, I think is against the interest of the systema as a whole.
A better approach to handle the administrator pain is too automate the cert renewal process. Another approach is to streamline the config change and make it easier.
The Raft orderer lets you keep TLS certificates of consenters in the config block expired, as long as the certificates they hold are still valid and have public keys matching to the config block.
I made that change years ago, based on requests of users who actually deploy Fabric.
Now there is a request of a user who deploys Fabric in the real world (who was skilled enough to make a PR and contribute code).
It is not up to you (or me) to tell users when they should or should not lose faith and trust in the system they themselves deploy.
I have explained already that there is zero security benefit in keeping the same public key and renewing the certificate's validity period. Therefore, you should (logically) either be in favor of banning certificate renewal with the same public key entirely (which I think should not be up to us but to the users who deploy Fabric) or not block this PR from being merged, because you understand that if the same public key is retained in the enrollment certificate, the rest of the fields of the enrollment certificate are meaningless from a security perspective, so we can just ignore them when identifying which node send an authentication request.
Thank you, @tock-ibm and @yacovm, for your comments and discussion. The issue right now is that Raft and BFT has different logic on renewing the certificates of the consenters. On the other side, in BFT, TLS certs from the channel config don't seem to be used (correct me if I'm wrong), since I have a network with expired TLS certs, and BFT keeps working. At the end, the problem is that if I have 10 networks in BFT, I want to be able to renew the certificates without having to do 10 * x orderers config updates. I need this for my clients who are using BFT; one of them is a member of LFDT. |
That's impossible. The TLS certificate of the node needs to be valid, otherwise no one would be able to connect to it. Not orderers and not peers, not clients, etc. |
I mean TLS certs in the channel config are expired, not the certs used by the orderer. These are renewed using the previous keypair. |
David, @dviejokfs First of all, thank you for your effort and engagement. I appreciate your time and skill trying to make Fabric better! Indeed Raft and BFT differ in how things work. In Raft, the TLS certificate pinning in the config block is the basis of authenticating orderer nodes with each other. As @yacovm mentioned, in Raft, there were indeed some shortcuts made when renewing said certificates when the public key is the same. In BFT, although the *_tls_cert field still appear in the consenters map, it is unused (it is ignored and may be left empty, @yacovm correct me if I am wrong), and the
In BFT the enrollment cert is used both to sign blocks, and to establish authentication between orderer nodes. Peers, as well as third party auditors, watch the stream of blocks coming out of the ordering service. These blocks are signed by orderers. My position is that they can expect that the super majority of orderers signing the blocks to have valid certificates (at least 2f+1 of them). What the proposed change will do, is allow the ordering service to continue and emit signed blocks even when the signing certificates of the entire cluster - as reported in the config block - had expired. The peers and the auditors don't know about the "self renewals with same key" going on under the cover. All they see is the last config block. This is a downgrade of the fabric security model, in my opinion. It cannot be done without warning to all other network owners out there, that have fabric 3.x BFT networks deployed and expect it to work in a certain way. In addition, I speculate that a large part of them will want to keep the default strict model. In light of the lively discussion here, I propose to move this line of work to an RFC, and wait for the evaluation of additional maintainers and stake holders like @C0rWin @ale-linux @bestbeforetoday @andrew-coleman @denyeart @manish-sethi etc. Again, @dviejokfs I am highly appreciative of your contribution, but I think this issue deserve further evaluation. |
Let's continue with your version. 3-5 years have passed and the certificate has become invalid. And I connect a new peer to the channel, which starts downloading blocks starting from 0. The peer does not know anything about the fact that the certificates have already been updated. And the certificates that signed the blocks are already expired. What to do? |
In BFT, the peer actually does know the full sequence of enrollment cert renewals, because the config block is in the block chain. The block validation policy is rebuilt dynamically on every config block fetched, so the signatures on the block are evaluated according to the last config block that had preceded it. This way an auditor, even 5 years past the event, can evaluate the veracity of the blockchain. This code is already here in the BFT block puller and BFT synchronizer. |
That's not true. Which part of the PR makes you think that is the case? I don't think you understand the code changes made in the PR. |
The integration test where all certs are self renewed with same key? Just like what @dviejokfs reported is done with Raft - certs are renewed locally with same key, but not in the config block; right? I clarify: |
Yes, I know. But what does the expiration of the certificate have to do with it? The peer will be required to accept the block, even if it is signed with an expired certificate. |
You have a point in the sense that the fabric blockchain does not carry an explicit true sense of time. However, in real time, a peer can tell if the current ordering service network has expired certs defined in the last config block, correct? The peer is not required to check the cert expiry, But it can if it wants to. A blockchain explorer can. In my opinion the job of the channel admins is to keep the certs fresh in the config block. If you want to add a feature in which this is not required, maybe make it optional? |
I'm sorry, but I disagree with you. |
I went through the existing code again to study it further. You were saying: "I'm not against adding support for enrollment certificate expiration checks in the authenticated communication establishment path, but if you have a leaked enrollment key then you essentially can sign blocks on behalf of that orderer, right? So... the only way to protect against a key leaking is changing the public keys in the channel config anyway!" I actually think it is a good idea, the TLS handshake does it on the TLS certs, so why not here? Another point you made: "I have explained already that there is zero security benefit in keeping the same public key and renewing the certificate's validity period." You actually convinced me that keeping the same public key and renewing the certificate's validity has negative security impact, as it prolongs the vulnerability period of the private key, and is usually frowned upon by many security experts. "Therefore, you should (logically) either be in favor of banning certificate renewal with the same public key entirely (which I think should not be up to us but to the users who deploy Fabric) or ...." I am not in favor of banning it - there is a third option - document that decision by network admins in the blockchain, and have clients decide whether they want to do business with this chain. That is - do it in a config change. This way it is out in the open and not hidden in the local file system of each ordering node. And last, I have a question for @yacovm @dviejokfs: Again, I am highly appreciative of your work, @dviejokfs & @yacovm, and thanks for your patience. |
Thanks for the comments @yacovm and @tock-ibm I see valid points for both. The problem is not for a network that's self managed to modify the certificates. I have been doing that for a long time and it works. However to update the consenter certificates we need the Orderer Admin Policy to be met. The problem is in the operational aspect. Imagine we have 5 orderer organizations with 1 orderer each one. The amount of work to renew the orderers it's so much. This is a short-term solution; a long-term solution would be that organizations could change their certificates once they have consenter nodes, in the same way that a peer org can change their anchor peers without needing the Application Policy Admin to be met. |
You can still operate with expired enrollment certificates in the config block (or outside of it, actually) even now, before this change. What this change does, is simply makes it possible to continue operating if you have rotated your enrollment certificate (but didn't change the public key). The integration test doesn't even make the certificates expired, it just changes their validity time to make the certificate bytes change, that's all. |
With anchor peers, an organization can add any number of anchor peers it wants. We cannot allow an organization to freely add orderers to the channel config as it wants, as it can do that repeatedly until it controls a third or more orderer nodes and then BFT consistency can be compromised if the organization is malicious. The root cause of this problem is that the SmartBFT treats every node as its own party and has no notion of nodes belonging to parties. So if you have two nodes from the same organization, SmartBFT will consider them as two different votes in the protocol, even though the nodes have the same administrator. |
That's why I said not to add/remove consenter (in this case admin orderer policyt must be required) But each orderer org could modify THEIR consenter certificates so that networks with a lot of orderer organizations that are truly decentralized don't have a maintenance overhead every year. |
I am not sure what you mean regarding the TLS handshake.
But in the end of the day it is up to the organization that runs the node to decide the key rotation schedule of its nodes. We don't actually use any properties of the x509 standard for orderer enrollment certificates, so they should really be treated as public keys. Besides, enrollment certificates can be stored in HSM or in similar software solutions that prevent the private key from existing on disk, so an organization should be able to opt out of key rotation for its enrollment certificates if it wishes to.
This proposal has nothing to do with this PR. The current state of the code even without this PR makes it that enrollment certificate expiration in the config block is not taken into account. This PR doesn't change anything in that aspect. If you want to make a PR that makes Fabric even more complex to use, be my guest.
Even outside of Fabric, in every system with x509 based authentication, if a node has the private key of a certificate, then if the certificate the node has is expired, there are two mutually exclusive scenarios:
Having the private key but not a valid corresponding certificate should be treated as having a private key and a valid corresponding certificate, because a certificate is not a secret . And as me and @pfi79 explained already, certificate validity does not play a part in on-chain validation. So if you have stolen a private key, you can forge signatures of the orderer on past blocks, and certificate validity doesn't help here.
I think you're misunderstanding what CFT means. CFT doesn't mean an attacker doesn't try to impersonate a node. CFT just means the protocol assumes that a node that has the credentials (making it part of the system), doesn't deviate from the consensus protocol. But the authentication mechanisms still need to be in place, otherwise malicious actors outside of the system can impersonate nodes from within the system.
I might be wrong, but I don't think the config transaction framework lets you have a sub-tree of an organization that allows you only modify the certificates but not add new entries. At this point, the channel config framework is regarded as legacy code left to us by an ancient civilization and no one has the appetite to change it. |
Are you sure we don't use ANY of the features? the SmartBFT verifier or block validation policy will not verify the signature via the certs (from the config block of course), which also includes verifying the certification chain all the way to the CA? My question above was what happens in the auth flow here if the connecting client passes a cert that has the same pub key but is not signed by the correct CA. I have not received an answer... @dviejokfs this should be easy to check, just run the same integration test, but when you renew the enrollment certs, do it with another CA that is not in the config block (Only for enrollment!). I expect things to fail. |
Both in the authentication flow, and when validating block signatures, the certification chain from the certificate to the CA does not play any part. The reason is that the config block is the source of truth here. So, in other words, even in the existing code before this PR, if I am a malicious admin and I can modify the code of my orderer as I see fit, and I possess a private key of an enrollment certificate that its certificate is in the channel config, no one is going to check my certificate both for protocol messages and for block signatures. The reason is that the channel config is used as the source of truth for which certificates are valid, not the CA. The model where you verify a certificate's certification chain when you check its signature was invented to be used for leaf certificates that you do not trust and need to verify them somehow. But if you have the certificate in the config block, then you trust it because it's in the config block. If someone tells you it has a signature that is valid under the certificate from the config block there is no point in asking "oh yeah? then show me your certificate!" because the certificate is right there... in the config block. In the current code, if you take an orderer with a private key and a corresponding certificate that is in the channel config, and modify its certificate to be signed by a foreign CA but keep the same public key, the orderer will not start because it wouldn't find its certificate in the consenter set. However if you modify its code to start and use this certificate, then any other orderer that runs correct code will fully cooperate with this orderer and all peers will be able to validate blocks that are signed with signatures that one of them are from that orderer. |
A few follow-up questions:
|
The local MSP will fail if you initialize it with an expired certificate.
Nothing...
Only if you run a node "by the book". However as I said before, an adversary can always impersonate an orderer it has its private key. |
Type of change
Improvement
Description
This PR allows enrollment certificates of BFT orderers to be renewed without having to do a channel configuration update, as long as the renewed certificates have the same public key.
Related issues
No related issues
Release Note
Added support for seamless enrollment certificate renewal in orderer clusters when public keys remain unchanged. This prevents connection failures and NOT_FOUND errors during certificate rotation operations.