manintheit.org


OCP Upgrade with Canary Rollout Strategy

Node upgrades are a critical aspect of maintaining a healthy OpenShift cluster. Whether it’s applying security patches, updating underlying dependencies, or simply scaling up resources, the process must be executed with precision to avoid disruptions to running workloads.

OpenShift (Kubernetes) node upgrades usually involve draining nodes, evacuating workloads, and performing the upgrade, which can lead to downtime and service interruptions. This can be particularly challenging in production environments where any disruption can have cascading effects on business operations.

Instead of upgrading all nodes simultaneously, a subset of nodes can be selected and upgraded first.

In this post, I will walk you through how to upgrade OCP nodes predictably while some of the workloads keep running happily. This method is useful when only a particular set of nodes should be upgraded. Please note that node restarts are still inevitable with this type of OCP upgrade. Nevertheless, it can be useful when you need to postpone the upgrade of some nodes, and of the applications running on them, for some time.

The key feature that gives us this capability in OCP is the Machine Config Pool (MCP).

What is Machine Config Pool?

A Machine Config Pool is to nodes what a Role Binding is to a user or group: it associates nodes with Machine Configs. For more information about Machine Config Pools, you can follow the post.

OCP Nodes in Sandbox2

To demonstrate a canary-like OCP node upgrade, the following cluster will be used. The OCP Sandbox2 cluster is currently at version 4.12.10 and will be upgraded to OCP 4.14.10.

Note: To find the upgrade path, you can use the Red Hat OCP Update Graph application.

According to the application, the OCP versions that we need to step through are as follows.

4.12.10 (current version) -> 4.12.47 -> 4.13.29 -> 4.14.10

$ oc get nodes
NAME                  STATUS   ROLES                  AGE   VERSION
sandbox2-dmzworker0   Ready    dmzworker,worker       11d   v1.25.16+5c97f5b
sandbox2-dmzworker1   Ready    dmzworker,worker       11d   v1.25.16+5c97f5b
sandbox2-infra0       Ready    infra,worker           35d   v1.25.16+5c97f5b
sandbox2-infra1       Ready    infra,worker           35d   v1.25.16+5c97f5b
sandbox2-master0      Ready    control-plane,master   35d   v1.25.16+5c97f5b
sandbox2-master1      Ready    control-plane,master   35d   v1.25.16+5c97f5b
sandbox2-master2      Ready    control-plane,master   35d   v1.25.16+5c97f5b
sandbox2-worker0      Ready    worker                 35d   v1.25.16+5c97f5b
sandbox2-worker1      Ready    worker                 35d   v1.25.16+5c97f5b

The following table shows the Node <-> MCP relation in the sandbox2 cluster.

Node Name             MCP Name
sandbox2-worker0      worker
sandbox2-worker1      worker
sandbox2-master0      master
sandbox2-master1      master
sandbox2-master2      master
sandbox2-dmzworker0   dmzworker
sandbox2-dmzworker1   dmzworker
sandbox2-infra0       infra
sandbox2-infra1       infra

Machine Config Pools in the Sandbox2 Cluster

$ oc get mcp
NAME        CONFIG                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
dmzworker   rendered-dmzworker-4b77d58a5da76d5a8126d186460a28a6   True      False      False      2              2                   2                     0                      11d
infra       rendered-infra-4b77d58a5da76d5a8126d186460a28a6       True      False      False      2              2                   2                     0                      35d
master      rendered-master-ddad513671757f22d78d41940ab0255c      True      False      False      3              3                   3                     0                      35d
worker      rendered-worker-4b77d58a5da76d5a8126d186460a28a6      True      False      False      2              2                   2                     0                      35d
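The CONFIG column above also offers a way to cross-check pool membership from the node side: each node records the rendered config it runs in the machineconfiguration.openshift.io/currentConfig annotation. The helper below is a small sketch of my own (the function name pool_from_rendered is hypothetical) that turns such a rendered config name back into a pool name. It assumes the name always has the form rendered-<pool>-<hash>, as in the output above.

```shell
# Derive the pool name from a rendered MachineConfig name such as
# rendered-worker-4b77d58a5da76d5a8126d186460a28a6.
# Assumption: the name is always "rendered-<pool>-<hash>" and the hash
# itself contains no hyphen.
pool_from_rendered() {
  name=${1#rendered-}          # drop the "rendered-" prefix
  printf '%s\n' "${name%-*}"   # drop the trailing "-<hash>"
}
```

It could be fed with the annotation of a node, for example: pool_from_rendered "$(oc get node sandbox2-worker0 -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}')".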

Before the update/upgrade of the cluster, additional MCPs must be created and some of the nodes must be moved to these new MCPs.

Create Additional MCP

In this section, three additional MCPs will be created, and some of the nodes will be added to them.

infra-canary

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  maxUnavailable: 1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/infra-canary, operator: Exists}
  paused: false

dmzworker-canary

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: dmzworker-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,dmzworker]}
  maxUnavailable: 1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/dmzworker-canary, operator: Exists}
  paused: false

worker-canary

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker]}
  maxUnavailable: 1
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-canary: ""
  paused: false
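Since the three manifests differ only in the role name, they can also be generated. The following is a minimal sketch, not part of the original procedure: canary_mcp is a hypothetical helper that prints a manifest following the same pattern as above. It uses matchExpressions with Exists for every role, which should select the same nodes as the matchLabels form used for worker-canary.

```shell
# Print a MachineConfigPool manifest for the canary pool of a given role.
# canary_mcp is an illustrative helper, not an oc feature.
canary_mcp() {
  role=$1
  # worker-canary only consumes worker configs; the other pools also
  # consume the configs of their own role.
  if [ "$role" = worker ]; then roles="worker"; else roles="worker,$role"; fi
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: ${role}-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [${roles}]}
  maxUnavailable: 1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/${role}-canary, operator: Exists}
  paused: false
EOF
}
```

A pool could then be created with, for example, canary_mcp infra | oc apply -f -.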

Update existing MCP

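The existing MCPs are updated with a “matchExpressions” selector so that nodes carrying a canary label are detached from their original MCPs. This is needed because a node must not match two custom pools at the same time. The default worker pool needs no such change, because a node that matches both worker and one custom pool is assigned to the custom pool.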

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: dmzworker
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,dmzworker]}
  maxUnavailable: 1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/dmzworker-canary, operator: DoesNotExist}
    matchLabels:
      node-role.kubernetes.io/dmzworker: ""
  paused: false

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  maxUnavailable: 1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/infra-canary, operator: DoesNotExist}
    matchLabels:
      node-role.kubernetes.io/infra: ""
  paused: false
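If you prefer patching the existing pools over re-applying full manifests, the same exclusion can be expressed as a merge patch. This is a sketch under assumptions: exclude_canary_patch is a hypothetical helper that only builds the patch body. Note that a merge patch replaces the whole matchExpressions list, so it only fits pools that have no other expressions in their nodeSelector; matchLabels is left untouched.

```shell
# Build a JSON merge patch that makes the pool of the given role ignore
# nodes labelled node-role.kubernetes.io/<role>-canary.
# Caveat: the existing nodeSelector.matchExpressions list is overwritten.
exclude_canary_patch() {
  printf '{"spec":{"nodeSelector":{"matchExpressions":[{"key":"node-role.kubernetes.io/%s-canary","operator":"DoesNotExist"}]}}}\n' "$1"
}
```

It could be applied with, for example, oc patch mcp/infra --type=merge --patch "$(exclude_canary_patch infra)".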

Placing nodes to new MCPs

To place the desired nodes in their new MCPs, an additional label needs to be added to each of them. In this post, the odd-numbered nodes will be moved to the new MCPs.

$ oc label node sandbox2-infra1 node-role.kubernetes.io/infra-canary=""

$ oc label node sandbox2-dmzworker1 node-role.kubernetes.io/dmzworker-canary=""

$ oc label node sandbox2-worker1 node-role.kubernetes.io/worker-canary=""
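With more than a handful of nodes, typing these commands by hand gets tedious. The sketch below generates them instead. It rests on assumptions specific to this cluster: node names look like <cluster>-<role><number>, the role can be read from the name, and control-plane nodes contain "master" in their name. canary_label_cmds is a hypothetical helper and only prints the commands, so they can be reviewed before anything is run.

```shell
# Read node names on stdin (plain or in "node/<name>" form) and print the
# canary label command for every non-master node with an odd trailing digit.
# Assumption: names follow the "<cluster>-<role><number>" scheme.
canary_label_cmds() {
  while read -r node; do
    node=${node#node/}       # accept "oc get nodes -o name" output
    case $node in
      *master*) continue ;;  # control-plane nodes are left alone
      *[13579]) ;;           # odd trailing digit: canary candidate
      *) continue ;;
    esac
    role=${node#*-}          # sandbox2-infra1 -> infra1
    role=${role%%[0-9]*}     # infra1 -> infra
    printf 'oc label node %s node-role.kubernetes.io/%s-canary=""\n' "$node" "$role"
  done
}
```

For example, oc get nodes -o name | canary_label_cmds prints the commands for review, and the output can be piped to sh once it looks right.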

After the labels have been added, the Node <-> MCP membership should look like the table below.

Node Name             MCP Name
sandbox2-worker0      worker
sandbox2-worker1      worker-canary
sandbox2-master0      master
sandbox2-master1      master
sandbox2-master2      master
sandbox2-dmzworker0   dmzworker
sandbox2-dmzworker1   dmzworker-canary
sandbox2-infra0       infra
sandbox2-infra1       infra-canary

Note: No special configuration is applied to the control-plane nodes, as only one of them will be unavailable at a time during the upgrade of the OCP cluster.

$ oc get nodes
NAME                  STATUS   ROLES                          AGE    VERSION
sandbox2-dmzworker0   Ready    dmzworker,worker               3d4h   v1.26.12+9ed7eae
sandbox2-dmzworker1   Ready    dmzworker,dmzworker-canary,worker   3d4h   v1.25.7+eab9cc9
sandbox2-infra0       Ready    infra,worker                   3d4h   v1.26.12+9ed7eae
sandbox2-infra1       Ready    infra,infra-canary,worker           3d4h   v1.25.7+eab9cc9
sandbox2-master0      Ready    control-plane,master           3d5h   v1.26.12+9ed7eae
sandbox2-master1      Ready    control-plane,master           3d5h   v1.26.12+9ed7eae
sandbox2-master2      Ready    control-plane,master           3d5h   v1.26.12+9ed7eae
sandbox2-worker0      Ready    worker                         3d5h   v1.26.12+9ed7eae
sandbox2-worker1      Ready    worker,worker-canary                3d5h   v1.25.7+eab9cc9

$ oc get mcp  -o='jsonpath={range .items[*]}{.metadata.name} "------------->"{.spec.paused}{"\n"}'
dmzworker "------------->"false
dmzworker-canary "------------->"false
infra "------------->"false
infra-canary "------------->"false
master "------------->"false
worker "------------->"false
worker-canary "------------->"false

Pause <node-role>-canary MCPs

This is the last and most important step before the update/upgrade of the cluster. In this step, paused is set to true on the <node-role>-canary MCPs. As a result, no cluster update/upgrade takes place on these nodes, and the workloads running on them are not affected.

Note: Do not forget to check for deprecated or removed features before the OpenShift upgrade; otherwise, running applications can be affected by removed API versions or features.

With the following setting (paused: true), no update/upgrade is applied to the nodes that are members of the <node-role>-canary MCPs.

$ oc patch mcp/worker-canary --patch '{"spec":{"paused":true}}' --type=merge
$ oc patch mcp/infra-canary --patch '{"spec":{"paused":true}}' --type=merge
$ oc patch mcp/dmzworker-canary  --patch '{"spec":{"paused":true}}' --type=merge


$ oc get mcp  -o='jsonpath={range .items[*]}{.metadata.name} "------------->"{.spec.paused}{"\n"}'
dmzworker "------------->"false
dmzworker-canary "------------->"true
infra "------------->"false
infra-canary "------------->"true
master "------------->"false
worker "------------->"false
worker-canary "------------->"true
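Because a forgotten pause defeats the whole exercise, it may be worth turning the visual check above into a pre-flight test. The following is a minimal sketch: unpaused_canaries is a hypothetical helper that expects one "name paused" pair per line and treats anything other than true as not paused.

```shell
# Read "<pool-name> <paused>" pairs on stdin, print every *-canary pool
# that is not paused, and return non-zero if at least one was found.
unpaused_canaries() {
  awk '$1 ~ /-canary$/ && $2 != "true" { print $1; bad = 1 } END { exit bad }'
}
```

The input could come from oc get mcp -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.paused}{"\n"}{end}' piped into unpaused_canaries; an empty output and a zero exit status mean that all canary pools are paused.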

RHOCP Update (4.12.10 to 4.12.47)

Before the update/upgrade of OpenShift, the images of the required OpenShift versions are mirrored to the container registry. You can follow the guide on mirroring images for a disconnected update/upgrade of OpenShift here.

$ oc apply -f scripts/signatures/rhocp-release-4.12.47.json

$ oc adm upgrade --allow-explicit-upgrade --to-image registry.local.io/openshift/openshift-release-dev@sha256:fcc9920ba10ebb02c69bdd9cd597273260eeec1b22e9ef9986a47f4874a21253
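Each hop has to finish before the next one is started. As a sketch of how that could be checked in a script, the hypothetical helper update_completed below reads the state and version of the newest ClusterVersion history entry and succeeds only when that entry is Completed at the expected version.

```shell
# Read "<state> <version>" on stdin and succeed only if the state is
# Completed and the version equals the one passed as first argument.
update_completed() {
  want=$1
  # Tolerate input without a trailing newline.
  read -r state version || [ -n "$state" ] || return 1
  [ "$state" = "Completed" ] && [ "$version" = "$want" ]
}
```

The input could be produced with oc get clusterversion version -o jsonpath='{.status.history[0].state} {.status.history[0].version}{"\n"}' and piped into update_completed 4.12.47.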

RHOCP Upgrade (4.12.47 to 4.13.29)

$ oc apply -f scripts/signatures/rhocp-release-4.13.29.json

$ oc adm upgrade --allow-explicit-upgrade --to-image registry.local.io/openshift/openshift-release-dev@sha256:9c4a4471bb93ab11d255925535ff719742cafa8ae06d622b870133787a72abc3

$ oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.12-kube-1.26-api-removals-in-4.13":"true"}}' --type=merge

RHOCP Upgrade (4.13.29 to 4.14.10)

$ oc apply -f scripts/signatures/rhocp-release-4.14.10.json

$ oc adm upgrade --allow-explicit-upgrade --to-image registry.local.io/openshift/openshift-release-dev@sha256:03cc63c0c48b2416889e9ee53f2efc2c940323c15f08384b439c00de8e66e8aa

$ oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.13-kube-1.27-api-removals-in-4.14":"true"}}' --type=merge

As you can see, all nodes that were members of the existing MCPs have been updated/upgraded, but the nodes in the <node-role>-canary MCPs are still on the old version.

$ oc get nodes
NAME                  STATUS   ROLES                          AGE    VERSION
sandbox2-dmzworker0   Ready    dmzworker,worker               3d5h   v1.27.9+e36e183
sandbox2-dmzworker1   Ready    dmzworker,dmzworker-canary,worker   3d5h   v1.25.7+eab9cc9
sandbox2-infra0       Ready    infra,worker                   3d5h   v1.27.9+e36e183
sandbox2-infra1       Ready    infra,infra-canary,worker           3d5h   v1.25.7+eab9cc9
sandbox2-master0      Ready    control-plane,master           3d6h   v1.27.9+e36e183
sandbox2-master1      Ready    control-plane,master           3d6h   v1.27.9+e36e183
sandbox2-master2      Ready    control-plane,master           3d6h   v1.27.9+e36e183
sandbox2-worker0      Ready    worker                         3d6h   v1.27.9+e36e183
sandbox2-worker1      Ready    worker,worker-canary               3d6h   v1.25.7+eab9cc9
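On a larger cluster, spotting the nodes that were held back is easier with a filter than by eye. A small sketch, assuming the VERSION column is the last column of the output as shown above; nodes_not_at is a hypothetical helper name.

```shell
# Read "oc get nodes --no-headers" output on stdin and print the names of
# the nodes whose VERSION column (the last one) differs from the expected
# kubelet version given as first argument.
nodes_not_at() {
  awk -v want="$1" '$NF != want { print $1 }'
}
```

For the cluster above, oc get nodes --no-headers | nodes_not_at v1.27.9+e36e183 would list the three canary nodes.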

Upgrade Nodes in <node-role>-canary MCP

There are two ways to upgrade the paused nodes (the nodes that have not been touched yet):

Set paused: false on the new <node-role>-canary MCP, or remove the node-role.kubernetes.io/<node-role>-canary label from the node.

Upgrade Nodes in infra-canary MCP (First Method)

Setting paused: false upgrades all nodes that are members of the “infra-canary” MCP.

$ oc patch mcp/infra-canary --patch '{"spec":{"paused":false}}' --type=merge
$ oc get nodes
NAME                  STATUS   ROLES                          AGE    VERSION
sandbox2-dmzworker0   Ready    dmzworker,worker               3d6h   v1.27.9+e36e183
sandbox2-dmzworker1   Ready    dmzworker,dmzworker-canary,worker   3d6h   v1.25.7+eab9cc9
sandbox2-infra0       Ready    infra,worker                   3d6h   v1.27.9+e36e183
sandbox2-infra1       Ready    infra,infra-canary,worker           3d6h   v1.27.9+e36e183
sandbox2-master0      Ready    control-plane,master           3d7h   v1.27.9+e36e183
sandbox2-master1      Ready    control-plane,master           3d7h   v1.27.9+e36e183
sandbox2-master2      Ready    control-plane,master           3d7h   v1.27.9+e36e183
sandbox2-worker0      Ready    worker                         3d6h   v1.27.9+e36e183
sandbox2-worker1      Ready    worker,worker-canary                3d6h   v1.25.7+eab9cc9

Upgrade Nodes in dmzworker-canary MCP (Second Method)

In this method, the node-role.kubernetes.io/dmzworker-canary label is removed, and the node sandbox2-dmzworker1 rejoins the dmzworker MCP.

$ oc label node sandbox2-dmzworker1 node-role.kubernetes.io/dmzworker-canary-
node/sandbox2-dmzworker1 unlabeled

$ oc get mcp dmzworker-canary
NAME          CONFIG                                                  UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
dmzworker-canary   rendered-dmzworker-canary-fce74906989201f3653152f435f3a9e3   True      False      False      0              0                   0                     0                      3d3h
$ oc get mcp dmzworker
NAME        CONFIG                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
dmzworker   rendered-dmzworker-fce74906989201f3653152f435f3a9e3   False     True       False      2              1                   1                     0                      3d6h
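In the output above, the dmzworker pool is still rolling out (UPDATED is False, UPDATING is True). To wait for the pool in a script, the three status columns can be evaluated. This is a sketch with a hypothetical helper name, mcp_settled, and it assumes the column order shown above.

```shell
# Read one "oc get mcp <name> --no-headers" line on stdin and succeed when
# the pool reports UPDATED=True, UPDATING=False and DEGRADED=False
# (columns 3 to 5 in the output shown above).
mcp_settled() {
  awk '{ ok = ($3 == "True" && $4 == "False" && $5 == "False") } END { exit !ok }'
}
```

A simple loop such as until oc get mcp dmzworker --no-headers | mcp_settled; do sleep 30; done would block until the rollout is finished. oc wait mcp/dmzworker --for=condition=Updated --timeout=30m is an alternative that needs no helper.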

Upgrade Nodes in worker-canary MCP (First Method)

$ oc patch mcp/worker-canary --patch '{"spec":{"paused":false}}' --type=merge

