Cordoning Nodes and Draining Pods
Follow this SOP in the following scenarios:
- Maintenance is scheduled to be carried out on an OpenShift node.
Steps
- Mark the nodes as unschedulable. Note that this loop cordons every node returned by oc get nodes:
nodes=$(oc get nodes -o name | sed -E "s/node\///")
echo $nodes
for node in $nodes; do oc adm cordon "$node"; done
node/<node> cordoned
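The loop above cordons every node in the cluster. When only some nodes are under maintenance, a narrower variant may be preferable. The following is a minimal sketch, assuming the node names are passed as arguments; the function name cordon_nodes is hypothetical and not part of oc.

```shell
# Hypothetical helper: cordon only the nodes passed as arguments.
cordon_nodes() {
  local node
  for node in "$@"; do
    # Accept either "node/<name>" or "<name>" by stripping the prefix.
    oc adm cordon "${node#node/}" || return 1
  done
}

# Example usage (node names are placeholders):
# cordon_nodes worker-0 worker-1
```

The function stops at the first node that fails to cordon, so a typo in a node name is noticed before the maintenance continues.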
- Check that the node status is NotReady,SchedulingDisabled:
oc get node <node1>
NAME STATUS ROLES AGE VERSION
<node1> NotReady,SchedulingDisabled worker 1d v1.18.3
Note: The node might not switch to NotReady immediately, as there may be many pods still running. Until then, a cordoned node reports Ready,SchedulingDisabled.
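The status check above can also be scripted. The following is a minimal sketch, assuming the default column layout of oc get node (NAME, STATUS, ROLES, AGE, VERSION); the function names node_status and is_cordoned are hypothetical.

```shell
# Hypothetical helper: print the STATUS column for a single node.
node_status() {
  oc get node "$1" --no-headers | awk '{print $2}'
}

# Succeeds if the node's status includes SchedulingDisabled,
# whether the node is currently Ready or NotReady.
is_cordoned() {
  case "$(node_status "$1")" in
    *SchedulingDisabled*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage (node name is a placeholder):
# is_cordoned worker-0 && echo "worker-0 is cordoned"
```

Matching on SchedulingDisabled rather than the full status string keeps the check valid in both the Ready and NotReady cases.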
- Evacuate the pods from the worker nodes using the following method. This command drains node <node1>, deletes pods even if they use local data (emptyDir volumes), ignores daemonsets, and gives pods a period of 60 seconds to terminate gracefully.
oc adm drain <node1> --delete-local-data=true --ignore-daemonsets=true --grace-period=60
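After the drain completes, it can be useful to confirm what is still running on the node. The following is a minimal sketch using the spec.nodeName field selector; the function name pods_on_node is hypothetical. Because the drain ignores daemonsets, DaemonSet-managed pods are expected to remain in the listing.

```shell
# Hypothetical helper: list pods in all namespaces that are
# still scheduled on the given node.
pods_on_node() {
  oc get pods --all-namespaces --field-selector "spec.nodeName=$1" -o wide
}

# Example usage (node name is a placeholder):
# pods_on_node worker-0
```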
- Perform the scheduled maintenance on the node. Do whatever is required in the scheduled maintenance window.
- Once the node is ready to be added back into the cluster, uncordon it. This marks the node as schedulable once more. Note that this loop uncordons every node returned by oc get nodes:
nodes=$(oc get nodes -o name | sed -E "s/node\///")
echo $nodes
for node in $nodes; do oc adm uncordon "$node"; done
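To confirm that the uncordon step took effect, the node list can be filtered for any node that still reports SchedulingDisabled. The following is a minimal sketch, assuming the default column layout of oc get nodes; the function name cordoned_nodes is hypothetical.

```shell
# Hypothetical helper: print the names of nodes that are still cordoned.
cordoned_nodes() {
  oc get nodes --no-headers | awk '$2 ~ /SchedulingDisabled/ {print $1}'
}

# Example usage: warn if any node was left unschedulable.
# remaining=$(cordoned_nodes)
# [ -z "$remaining" ] || echo "Still cordoned: $remaining"
```

An empty result means every node is schedulable again.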