Occasionally, a single node in your AKS cluster might start to misbehave. Usually, the self-healing capabilities of Kubernetes should detect that and replace the node. Still, from time to time, you might decide to restart a single node manually.

As AKS node pools are organized in Azure Virtual Machine Scale Sets, restarting a node is as easy as restarting a Virtual Machine in a Scale Set.
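
As a minimal sketch, a plain restart could be wrapped like this. The function names are hypothetical helpers that only wrap existing az CLI commands; the resource group, Scale Set name, and instance ID in the example call are the sample values used throughout this post, so substitute your own.

```shell
# Hypothetical helpers; they only wrap existing az CLI commands.

# List the instances of a Scale Set to find the instance ID of the node.
# $1 = node resource group, $2 = Scale Set name
list_node_instances() {
  az vmss list-instances -g "$1" -n "$2" -o table
}

# Restart a single instance of the Scale Set.
# $1 = node resource group, $2 = Scale Set name, $3 = instance ID
restart_node_instance() {
  az vmss restart -g "$1" -n "$2" --instance-ids "$3"
}

# Example call:
# restart_node_instance MC_container-demos_rothieaks_westeurope aks-agentpool-94218126-vmss 2
```

Listing the instances first is a cheap way to double-check that the instance ID really belongs to the node you want to restart.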

What if a node can't be restarted?

In rare cases, a node gets stuck and can't be restarted normally. You can then try to deallocate the node and start it again with the following commands:

# De-allocate the VM
az vmss deallocate -g MC_container-demos_rothieaks_westeurope -n aks-agentpool-94218126-vmss --instance-ids 2

# Start the deallocated VM again
az vmss start -g MC_container-demos_rothieaks_westeurope -n aks-agentpool-94218126-vmss --instance-ids 2
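
The commands above need the instance ID of the node. If you only know the Kubernetes node name, the following sketch may help. It assumes that the node name ends in a six-character base-36 encoding of the Scale Set instance ID (for example, a name ending in vmss000002 would map to instance 2); verify this against the output of az vmss list-instances before relying on it. The function name is hypothetical, and the snippet needs bash.

```shell
#!/usr/bin/env bash
# Assumption: the node name ends in a six-character base-36 encoding of the
# Scale Set instance ID. Requires bash (substring expansion, base#number).
node_to_instance_id() {
  local suffix="${1: -6}"      # last six characters of the node name
  echo "$((36#$suffix))"       # interpret them as a base-36 number
}

# Example call:
# node_to_instance_id aks-agentpool-94218126-vmss000002
```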

If that does not solve your problem, the last resort is to delete the VM from the Scale Set. Azure interprets removing a VM from a Scale Set as scaling down the cluster, so it won't add a new VM to the Scale Set after you have removed one.

# Remove VM from node pool scale set
az vmss delete-instances -g MC_container-demos_rothieaks_westeurope -n aks-agentpool-94218126-vmss --instance-ids 2

# Scale the AKS cluster back to its original size
az aks scale -n rothieaks -g container-demos --nodepool-name agentpool -c 3
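
After scaling back, you probably want to confirm that the replacement node has joined the cluster. A minimal sketch, assuming kubectl already points at the cluster; the function name and the default timeout of 300 seconds are arbitrary choices made for illustration:

```shell
# Hypothetical helper: block until every registered node reports Ready,
# or give up after the timeout ($1, defaults to 300s).
wait_for_nodes_ready() {
  kubectl wait --for=condition=Ready nodes --all --timeout="${1:-300s}"
}

# Example call:
# wait_for_nodes_ready 600s
```

Note that this only waits for nodes that are already registered, so it can return early if the new node has not shown up yet. A quick kubectl get nodes afterwards shows whether the node count matches what you expect.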

Auto-restart nodes for updates

If you just want the cluster nodes to restart automatically when they need a reboot after an update, take a look at the Kured (KUbernetes REboot Daemon) project. Kured checks for the /var/run/reboot-required file on each node and reboots the node if this file is present.
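
To illustrate the idea, the check boils down to testing for a sentinel file. The following is only a sketch of that test, not Kured's actual implementation; the function name is hypothetical, and the optional path argument exists only to make the check easy to try out.

```shell
# Hypothetical helper mirroring the sentinel-file check described above.
# $1 = path to check (defaults to the file Kured looks for)
reboot_required() {
  [ -f "${1:-/var/run/reboot-required}" ]
}

# Example: report the state of the machine you are logged in to.
# if reboot_required; then echo "reboot required"; else echo "no reboot needed"; fi
```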

Installing Kured is as easy as applying its DaemonSet with the following command:

kubectl apply -f https://github.com/weaveworks/kured/releases/download/1.2.0/kured-1.2.0-dockerhub.yaml
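
Once the manifest is applied, you may want to check that the DaemonSet has rolled out to all nodes. This sketch assumes that the manifest creates a DaemonSet named kured in the kube-system namespace; if your version of the manifest uses different names, adjust them accordingly. The function name is hypothetical.

```shell
# Assumption: the manifest creates a DaemonSet named "kured" in kube-system.
# $1 = namespace (defaults to kube-system)
check_kured_rollout() {
  kubectl rollout status daemonset/kured -n "${1:-kube-system}"
}

# Example call:
# check_kured_rollout
```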