Kubernetes components communicate with each other over mutual TLS (mTLS).
Certificates are hard to manage, and generating them automatically is one of the key features of Cloud Foundry Container Runtime (CFCR).
There is one catch, though. The default certificate duration in CFCR is one year, so if you deployed your cluster a year ago, it is time to rotate the CA certificates.
Here is the typical way CA certificates are rotated. First, you generate a new CA certificate and add it to the chain of trust. Then you generate new certificates signed by the new CA and start using them. After that, you can remove the old CA certificate from the trusted chain.
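The reason for going through three steps rather than swapping the CA in one go can be sketched in a few lines of Python. This is only an illustrative model, not CFCR code: the `Phase` class and both function names are made up for this example, and the CA names are just labels. The check encodes one assumption: while a deploy is rolling, some components still present certificates signed in the previous phase, so both signers must be trusted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One deploy in the rotation: which CAs are trusted, and which CA signs."""
    name: str
    trusted_cas: tuple
    signing_ca: str


def rotation_phases(old_ca, new_ca):
    """Return the three phases of a two-CA rotation, in deploy order."""
    return [
        Phase("add new CA to trust", (old_ca, new_ca), old_ca),
        Phase("re-issue certificates", (old_ca, new_ca), new_ca),
        Phase("drop old CA", (new_ca,), new_ca),
    ]


def is_safe(phases):
    """A plan is safe if every phase trusts both the current signer and the
    signer of the previous phase (whose certificates may still be in use)."""
    previous_signer = phases[0].signing_ca
    for phase in phases:
        if phase.signing_ca not in phase.trusted_cas:
            return False
        if previous_signer not in phase.trusted_cas:
            return False
        previous_signer = phase.signing_ca
    return True
```

With this model, `is_safe(rotation_phases("kubo_ca", "kubo_ca_2019"))` holds, while a plan that jumps straight from trusting only the old CA to trusting only the new one does not.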
So, how do you do this in CFCR?
First, you need to know how CA certificate values are set.
The obvious answer is the property with the name. For example, the kubelet job uses the property to set the API certificate. In the manifest, a certificate can be passed as a single high-level property or split into three properties. In the release, usually only the high-level property is referenced.
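To make the "single property versus three properties" distinction concrete, here is a small Python sketch that expands a whole-variable reference into its three keys. It assumes the standard keys of a BOSH certificate variable (`ca`, `certificate`, `private_key`); the variable name `example_tls` and the function name are hypothetical. The function does not know which values are certificates, so it should only be applied to properties you already know hold a certificate.

```python
import re

# Keys of a BOSH certificate-type variable.
CERT_KEYS = ("ca", "certificate", "private_key")

# Matches a whole-value reference such as "((some_name))", with no ".key" suffix.
WHOLE_REF = re.compile(r"^\(\(([A-Za-z0-9_\-/]+)\)\)$")


def expand_cert_reference(value):
    """Turn "((name))" into {"ca": "((name.ca))", ...}; leave other values alone."""
    if not isinstance(value, str):
        return value
    match = WHOLE_REF.match(value)
    if match is None:
        return value
    name = match.group(1)
    return {key: f"(({name}.{key}))" for key in CERT_KEYS}
```

For example, `expand_cert_reference("((example_tls))")` yields a mapping with `((example_tls.ca))`, `((example_tls.certificate))`, and `((example_tls.private_key))`, which is the form you need before the `ca` entry can be changed on its own.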
Another way to pass properties is via links. The kube-apiserver is the main component that all other components communicate with. Almost two years ago we decided to expose it as a link so that other components can consume all the required data automatically. The etcd release uses links the same way.
Now that you know all the ways certificates are passed, we can proceed to the minimum-downtime upgrade.
How to do this for CFCR
- Modify the manifest to add a new CA certificate. For example, you can duplicate these rows and add the certificate with the name kubo_ca_2019.
- Find all the certificates signed by the given CA. Then find every place in the manifest that references that CA and add the second CA alongside it.
For example, this line should look like
Also, since a certificate in the manifest is just an object with three properties, some places hide these properties and reference the variable directly. You will have to expand those variables and add individual properties for each key.
- Find all the links and reconfigure them manually, adding the additional CA certificate where it is required.
- Deploy CFCR with the new manifest. This will restart all the VMs and might cause workload downtime.
- Now change the CA for all the certificates in the manifest. To do this, change the CA value of each certificate to the new one and enable variable convergence in the manifest.
- Deploy CFCR with the new manifest. This will restart all the jobs and might lead to workload downtime.
- Now you can delete the old CA from the manifest. Optionally, you can also rename the certificate in CredHub back to its original name (kubo_ca_2019 to kubo_ca) and revert the manifest to its original form.
- The last redeploy will remove the old trusted certificates and will restart all jobs as well.
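The "add the second CA" step from the list above can be sketched as a small transformation over a parsed manifest. This is a rough illustration under stated assumptions, not the exact edit used for CFCR: it assumes CA references have already been expanded to the `((name.ca))` form, and that placing two references back to back in one value produces a bundle with both PEM blocks. The function name and the sample variable names in the usage note are hypothetical; `kubo_ca_2019` is the new CA name used in the steps.

```python
import re

# Matches a value that is exactly a CA reference such as "((name.ca))".
CA_REF = re.compile(r"^\(\([A-Za-z0-9_\-/]+\.ca\)\)$")


def add_trusted_ca(node, new_ca="kubo_ca_2019"):
    """Return a copy of a parsed manifest in which every CA reference
    is followed by a reference to the new CA certificate."""
    if isinstance(node, dict):
        return {key: add_trusted_ca(value, new_ca) for key, value in node.items()}
    if isinstance(node, list):
        return [add_trusted_ca(item, new_ca) for item in node]
    if isinstance(node, str) and CA_REF.match(node):
        # Assumption: two PEM blocks back to back form a bundle with both CAs.
        return f"{node}(({new_ca}.certificate))"
    return node
```

Applied to `{"ca": "((example_tls.ca))", "certificate": "((example_tls.certificate))"}`, only the `ca` entry changes, becoming `((example_tls.ca))((kubo_ca_2019.certificate))`. Values passed through links still have to be reviewed by hand, as described in the steps.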
Ideally, this process should be automated and happen every month. Alternatively, you can rotate the certificates step by step with each stemcell upgrade. This way you will not incur additional downtime.