Node upgrades are a critical aspect of maintaining a healthy OpenShift cluster. Whether it’s applying security patches, updating underlying dependencies, or simply scaling up resources, the process must be executed with precision to avoid disruptions to running workloads.
OpenShift (Kubernetes) node upgrades typically involve draining nodes, evacuating workloads, and then performing the upgrade, which can lead to downtime and service interruptions. This is particularly challenging in production environments, where any disruption can have cascading effects on business operations.
Instead of upgrading all nodes simultaneously, a subset of nodes can be selected and upgraded first.
In this post, I will walk you through how to upgrade OCP nodes predictably while some of the workloads keep running happily. This method can be useful when only a particular set of nodes needs to be upgraded. Please note that with this type of OCP upgrade, node restarts are still inevitable. Nevertheless, it can be useful when you need to postpone upgrading some of the nodes, and the applications running on them, for some time.
The key feature that gives us this capability in OCP is the Machine Config Pool.
What is a Machine Config Pool?
What a Role Binding is to a User/Group, a Machine Config Pool (MCP) is to Nodes: it associates Nodes with Machine Configs. For more information about Machine Config Pools, you can follow the post.
OCP Nodes in Sandbox2
To demonstrate a canary-like OCP node upgrade, the following cluster will be used. The current version of the OCP Sandbox2 cluster is 4.12.10, and it will be upgraded to OCP 4.14.10.
Note: To find the upgrade path, you can use the Red Hat OCP Update Graph application.
According to the application, the OCP versions that we need to follow are listed below.
4.12.10 (current version) -> 4.12.47 -> 4.13.29 -> 4.14.10
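Before mirroring any images, a path like this can be sanity-checked with a few lines of shell. The helper below is only a sketch: check_update_path is a hypothetical name, and it checks just one rule, namely that no hop changes the major version or skips a minor version. It does not replace the Update Graph, which is still the source of truth for which z-stream releases are connected.

```shell
# Hypothetical helper: verify that an update path never skips a minor version.
# Arguments are x.y.z versions in the order they will be applied.
check_update_path() (
  prev_major="" prev_minor=""
  for v in "$@"; do
    major=${v%%.*}      # "4" from "4.12.10"
    rest=${v#*.}        # "12.10"
    minor=${rest%%.*}   # "12"
    if [ -n "$prev_minor" ]; then
      # Reject a change of major version, a jump of more than one minor,
      # or a step back to an older minor.
      if [ "$major" != "$prev_major" ] ||
         [ $((minor - prev_minor)) -gt 1 ] ||
         [ "$minor" -lt "$prev_minor" ]; then
        echo "invalid hop to $v" >&2
        exit 1
      fi
    fi
    prev_major=$major prev_minor=$minor
  done
  echo "path ok"
)
```

Called with the releases applied in the update steps of this post, check_update_path 4.12.10 4.12.47 4.13.29 4.14.10 accepts the path, while a direct jump such as check_update_path 4.12.10 4.14.10 is rejected.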
$ oc get nodes
NAME STATUS ROLES AGE VERSION
sandbox2-dmzworker0 Ready dmzworker,worker 11d v1.25.16+5c97f5b
sandbox2-dmzworker1 Ready dmzworker,worker 11d v1.25.16+5c97f5b
sandbox2-infra0 Ready infra,worker 35d v1.25.16+5c97f5b
sandbox2-infra1 Ready infra,worker 35d v1.25.16+5c97f5b
sandbox2-master0 Ready control-plane,master 35d v1.25.16+5c97f5b
sandbox2-master1 Ready control-plane,master 35d v1.25.16+5c97f5b
sandbox2-master2 Ready control-plane,master 35d v1.25.16+5c97f5b
sandbox2-worker0 Ready worker 35d v1.25.16+5c97f5b
sandbox2-worker1 Ready worker 35d v1.25.16+5c97f5b
You can see the Node <-> MCP relation of the sandbox2 cluster in the following table.
Node Name | MCP Name |
sandbox2-worker0 | worker |
sandbox2-worker1 | worker |
sandbox2-master0 | master |
sandbox2-master1 | master |
sandbox2-master2 | master |
sandbox2-dmzworker0 | dmzworker |
sandbox2-dmzworker1 | dmzworker |
sandbox2-infra0 | infra |
sandbox2-infra1 | infra |
Machine Config Pools in the Sandbox2 Cluster
$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
dmzworker rendered-dmzworker-4b77d58a5da76d5a8126d186460a28a6 True False False 2 2 2 0 11d
infra rendered-infra-4b77d58a5da76d5a8126d186460a28a6 True False False 2 2 2 0 35d
master rendered-master-ddad513671757f22d78d41940ab0255c True False False 3 3 3 0 35d
worker rendered-worker-4b77d58a5da76d5a8126d186460a28a6 True False False 2 2 2 0 35d
Before the update/upgrade of the cluster, additional MCPs must be created and some of the nodes must be joined to these new MCPs.
Create Additional MCPs
In this section, three additional MCPs will be created. Some of the nodes will be added to these new MCPs in a later step.
infra-canary
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  maxUnavailable: 1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/infra-canary, operator: Exists}
  paused: false
dmzworker-canary
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: dmzworker-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,dmzworker]}
  maxUnavailable: 1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/dmzworker-canary, operator: Exists}
  paused: false
worker-canary
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker]}
  maxUnavailable: 1
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-canary: ""
  paused: false
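The three manifests above follow one pattern, so they can also be generated. The function below is a minimal sketch of that pattern, not something the cluster requires: canary_mcp_yaml is a hypothetical helper name, and it always uses an Exists expression in the nodeSelector, whereas the worker-canary manifest above uses matchLabels with an empty value.

```shell
# Hypothetical helper: print a <role>-canary MachineConfigPool manifest.
# $1 = base role (worker, infra, dmzworker)
canary_mcp_yaml() (
  role=$1
  # Custom pools consume the worker MachineConfigs plus those of their own base role.
  if [ "$role" = "worker" ]; then
    roles="worker"
  else
    roles="worker,$role"
  fi
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: ${role}-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [${roles}]}
  maxUnavailable: 1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/${role}-canary, operator: Exists}
  paused: false
EOF
)
```

A manifest can then be reviewed on screen with canary_mcp_yaml infra, or piped straight into the cluster with canary_mcp_yaml infra | oc apply -f -.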
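Update Existing MCPs
The existing custom MCPs (dmzworker and infra) will be updated with a “matchExpressions” selector, so that nodes carrying a canary label no longer match their original MCP. The worker MCP is left as it is: a node that matches both the worker pool and one custom pool is managed by the custom pool.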
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: dmzworker
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,dmzworker]}
  maxUnavailable: 1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/dmzworker-canary, operator: DoesNotExist}
    matchLabels:
      node-role.kubernetes.io/dmzworker: ""
  paused: false
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  maxUnavailable: 1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/infra-canary, operator: DoesNotExist}
    matchLabels:
      node-role.kubernetes.io/infra: ""
  paused: false
Placing Nodes in the New MCPs
To place the desired nodes in their new MCPs, an additional label needs to be added to those nodes. In this post, the odd-numbered nodes will be moved to the new MCPs.
$ oc label node sandbox2-infra1 node-role.kubernetes.io/infra-canary=""
$ oc label node sandbox2-dmzworker1 node-role.kubernetes.io/dmzworker-canary=""
$ oc label node sandbox2-worker1 node-role.kubernetes.io/worker-canary=""
After adding the labels to the nodes, the Node <-> MCP membership should be as in the table below.
Node Name | MCP Name |
sandbox2-worker0 | worker |
sandbox2-worker1 | worker-canary |
sandbox2-master0 | master |
sandbox2-master1 | master |
sandbox2-master2 | master |
sandbox2-dmzworker0 | dmzworker |
sandbox2-dmzworker1 | dmzworker-canary |
sandbox2-infra0 | infra |
sandbox2-infra1 | infra-canary |
Note: No special configuration is applied to the control-plane nodes, as only one of them will be unavailable at a time during the upgrade of the OCP cluster.
$ oc get nodes
NAME STATUS ROLES AGE VERSION
sandbox2-dmzworker0 Ready dmzworker,worker 3d4h v1.26.12+9ed7eae
sandbox2-dmzworker1 Ready dmzworker,dmzworker-canary,worker 3d4h v1.25.7+eab9cc9
sandbox2-infra0 Ready infra,worker 3d4h v1.26.12+9ed7eae
sandbox2-infra1 Ready infra,infra-canary,worker 3d4h v1.25.7+eab9cc9
sandbox2-master0 Ready control-plane,master 3d5h v1.26.12+9ed7eae
sandbox2-master1 Ready control-plane,master 3d5h v1.26.12+9ed7eae
sandbox2-master2 Ready control-plane,master 3d5h v1.26.12+9ed7eae
sandbox2-worker0 Ready worker 3d5h v1.26.12+9ed7eae
sandbox2-worker1 Ready worker,worker-canary 3d5h v1.25.7+eab9cc9
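The ROLES column of this output is enough to rebuild the Node <-> MCP table. The sketch below assumes the naming convention of this cluster only (a canary role wins, then any custom role, then worker), and expected_pools is a hypothetical helper name. It reads the output of oc get nodes --no-headers on standard input, so it never talks to the cluster itself.

```shell
# Hypothetical helper: map each node to the MCP it should belong to,
# based only on the ROLES column (third field) of `oc get nodes --no-headers`.
expected_pools() {
  awk '{
    n = split($3, roles, ",")
    pool = "worker"; canary = 0
    for (i = 1; i <= n; i++) {
      if (roles[i] ~ /-canary$/) {
        # A canary role always decides the pool.
        pool = roles[i]; canary = 1
      } else if (!canary && roles[i] != "worker" && roles[i] != "control-plane") {
        # Otherwise a custom role (infra, dmzworker) or master is used.
        pool = roles[i]
      }
    }
    print $1, pool
  }'
}
```

Running oc get nodes --no-headers | expected_pools prints one "node pool" pair per line, which can be compared with the table above and with the MACHINECOUNT column of oc get mcp.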
$ oc get mcp -o='jsonpath={range .items[*]}{.metadata.name} "------------->"{.spec.paused}{"\n"}'
dmzworker "------------->"false
dmzworker-canary "------------->"false
infra "------------->"false
infra-canary "------------->"false
master "------------->"false
worker "------------->"false
worker-canary "------------->"false
Pause <node-role>-canary MCPs
This is the last and most important step before the update/upgrade of the cluster. In this step, paused is set to true on the <node-role>-canary MCPs. With that, no cluster update/upgrade will be rolled out to the nodes in these pools. Consequently, workloads running on these nodes will not be affected.
Note: Do not forget to check for deprecated/removed features before the OpenShift upgrade; otherwise, running applications can be affected by removed API versions/features.
With the following setting (paused: true), nodes that are members of the canary MCPs will not be touched by any update/upgrade.
$ oc patch mcp/worker-canary --patch '{"spec":{"paused":true}}' --type=merge
$ oc patch mcp/infra-canary --patch '{"spec":{"paused":true}}' --type=merge
$ oc patch mcp/dmzworker-canary --patch '{"spec":{"paused":true}}' --type=merge
$ oc get mcp -o='jsonpath={range .items[*]}{.metadata.name} "------------->"{.spec.paused}{"\n"}'
dmzworker "------------->"false
dmzworker-canary "------------->"true
infra "------------->"false
infra-canary "------------->"true
master "------------->"false
worker "------------->"false
worker-canary "------------->"true
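Since a forgotten pool means an unplanned reboot, the three patch commands can be wrapped in one small function. This is a sketch under my own assumptions: set_mcp_paused and the OC variable are hypothetical names, and OC exists only so that the function can be dry-run with OC=echo before it touches the cluster.

```shell
# OC is an assumption of this sketch: set OC=echo to print the commands
# instead of running them.
OC="${OC:-oc}"

# Hypothetical helper: set spec.paused on one or more MachineConfigPools.
# $1 = true|false, remaining arguments = pool names
set_mcp_paused() (
  paused=$1
  shift
  case "$paused" in
    true|false) ;;
    *) echo "usage: set_mcp_paused true|false POOL..." >&2; exit 2 ;;
  esac
  for pool in "$@"; do
    "$OC" patch "mcp/$pool" --type=merge --patch "{\"spec\":{\"paused\":$paused}}"
  done
)
```

For example, set_mcp_paused true worker-canary infra-canary dmzworker-canary pauses all three canary pools, and the same call with false resumes them later.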
RHOCP Update (4.12.10 to 4.12.47)
Before the update/upgrade of OpenShift, the images of the required OpenShift versions are mirrored to the container registry. You can follow the guide on how to mirror images for a disconnected update/upgrade of OpenShift here.
$ oc apply -f scripts/signatures/rhocp-release-4.12.47.json
$ oc adm upgrade --allow-explicit-upgrade --to-image registry.local.io/openshift/openshift-release-dev@sha256:fcc9920ba10ebb02c69bdd9cd597273260eeec1b22e9ef9986a47f4874a21253
RHOCP Upgrade (4.12.47 to 4.13.29)
$ oc apply -f scripts/signatures/rhocp-release-4.13.29.json
$ oc adm upgrade --allow-explicit-upgrade --to-image registry.local.io/openshift/openshift-release-dev@sha256:9c4a4471bb93ab11d255925535ff719742cafa8ae06d622b870133787a72abc3
$ oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.12-kube-1.26-api-removals-in-4.13":"true"}}' --type=merge
RHOCP Upgrade (4.13.29 to 4.14.10)
$ oc apply -f scripts/signatures/rhocp-release-4.14.10.json
$ oc adm upgrade --allow-explicit-upgrade --to-image registry.local.io/openshift/openshift-release-dev@sha256:03cc63c0c48b2416889e9ee53f2efc2c940323c15f08384b439c00de8e66e8aa
$ oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.13-kube-1.27-api-removals-in-4.14":"true"}}' --type=merge
As you can see, all nodes that were members of the existing MCPs have been updated/upgraded, but the nodes that are members of the <node-role>-canary MCPs are still on the same version.
$ oc get nodes
NAME STATUS ROLES AGE VERSION
sandbox2-dmzworker0 Ready dmzworker,worker 3d5h v1.27.9+e36e183
sandbox2-dmzworker1 Ready dmzworker,dmzworker-canary,worker 3d5h v1.25.7+eab9cc9
sandbox2-infra0 Ready infra,worker 3d5h v1.27.9+e36e183
sandbox2-infra1 Ready infra,infra-canary,worker 3d5h v1.25.7+eab9cc9
sandbox2-master0 Ready control-plane,master 3d6h v1.27.9+e36e183
sandbox2-master1 Ready control-plane,master 3d6h v1.27.9+e36e183
sandbox2-master2 Ready control-plane,master 3d6h v1.27.9+e36e183
sandbox2-worker0 Ready worker 3d6h v1.27.9+e36e183
sandbox2-worker1 Ready worker,worker-canary 3d6h v1.25.7+eab9cc9
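On a larger cluster it is easier to let a script pick out the nodes that were left behind. The following sketch (nodes_behind is a hypothetical name) reads oc get nodes --no-headers on standard input, takes the version of the master nodes as the target, and prints every node whose version differs from it. It assumes all master nodes already report the same version.

```shell
# Hypothetical helper: list nodes whose kubelet version differs from the
# version reported by the master nodes.
# Input: output of `oc get nodes --no-headers` (NAME STATUS ROLES AGE VERSION).
nodes_behind() {
  awk '
    { name[NR] = $1; ver[NR] = $NF }                  # remember every node
    $3 ~ /(^|,)master(,|$)/ { target = $NF }          # control-plane version
    END {
      for (i = 1; i <= NR; i++)
        if (ver[i] != target) print name[i], ver[i]
    }
  '
}
```

Usage is oc get nodes --no-headers | nodes_behind. In this run the canary nodes end up two kubelet minor versions behind the control plane, so it is worth checking the supported version skew before postponing them any further.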
Upgrade Nodes in <node-role>-canary MCP
There are two ways to upgrade the paused nodes (the nodes which have not been touched yet):
set paused: false on the new MCPs (<node-role>-canary), OR remove the node-role.kubernetes.io/<node-role>-canary label from the node.
Upgrade Nodes in infra-canary MCP (First Method)
Setting paused: false upgrades all nodes that are members of the “infra-canary” MCP.
$ oc patch mcp/infra-canary --patch '{"spec":{"paused":false}}' --type=merge
$ oc get nodes
NAME STATUS ROLES AGE VERSION
sandbox2-dmzworker0 Ready dmzworker,worker 3d6h v1.27.9+e36e183
sandbox2-dmzworker1 Ready dmzworker,dmzworker-canary,worker 3d6h v1.25.7+eab9cc9
sandbox2-infra0 Ready infra,worker 3d6h v1.27.9+e36e183
sandbox2-infra1 Ready infra,infra-canary,worker 3d6h v1.27.9+e36e183
sandbox2-master0 Ready control-plane,master 3d7h v1.27.9+e36e183
sandbox2-master1 Ready control-plane,master 3d7h v1.27.9+e36e183
sandbox2-master2 Ready control-plane,master 3d7h v1.27.9+e36e183
sandbox2-worker0 Ready worker 3d6h v1.27.9+e36e183
sandbox2-worker1 Ready worker,worker-canary 3d6h v1.25.7+eab9cc9
Upgrade Nodes in dmzworker-canary MCP (Second Method)
In this method, the node-role.kubernetes.io/dmzworker-canary label is removed from the node, and sandbox2-dmzworker1 rejoins the dmzworker MCP.
$ oc label node sandbox2-dmzworker1 node-role.kubernetes.io/dmzworker-canary-
node/sandbox2-dmzworker1 unlabeled
$ oc get mcp dmzworker-canary
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
dmzworker-canary rendered-dmzworker-canary-fce74906989201f3653152f435f3a9e3 True False False 0 0 0 0 3d3h
$ oc get mcp dmzworker
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
dmzworker rendered-dmzworker-fce74906989201f3653152f435f3a9e3 False True False 2 1 1 0 3d6h
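The two outputs above show what to look for: dmzworker-canary is empty, and dmzworker has one of its two machines still updating. A small filter can report this for all pools at once. The function below is a sketch (mcp_pending is a hypothetical name) that relies on the column order shown above and reads oc get mcp --no-headers on standard input.

```shell
# Hypothetical helper: print pools that still have machines to update.
# Input columns (from `oc get mcp --no-headers`):
# NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT
# UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
mcp_pending() {
  # $6 = MACHINECOUNT, $8 = UPDATEDMACHINECOUNT
  awk '$8 < $6 { printf "%s %d/%d updated\n", $1, $8, $6 }'
}
```

Running oc get mcp --no-headers | mcp_pending repeatedly (for example in a loop with sleep) shows the rollout converging; once it prints nothing, every pool has all of its machines on the new rendered config. Paused pools with outstanding updates are listed as well, which is a useful reminder that they still need to be resumed.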
Upgrade Nodes in worker-canary MCP (First Method)
$ oc patch mcp/worker-canary --patch '{"spec":{"paused":false}}' --type=merge