When creating an Azure Kubernetes Service (AKS) cluster in Terraform using the official azurerm_kubernetes_cluster resource, we need to define a default_node_pool block. Unfortunately, changing properties like vm_size in this block forces the cluster to be re-created. Let's find out why, and how we can work around that without modifying the Terraform state.

What is the default node pool?

It is important to understand that AKS itself does not have the concept of a "Default Node Pool"; it only exists in Terraform. Let's explore why!

First, it's important to understand that AKS needs at least one "System Node Pool" (a node pool with Mode=System) at all times. Terraform manages additional node pools in a dedicated azurerm_kubernetes_cluster_node_pool resource. The problem with that concept is that Terraform resolves resource dependencies, which results in creating the cluster first and then attaching the node pools to it. But remember, a cluster can't be created without at least one node pool. So we would run into a chicken-and-egg problem, where Terraform couldn't figure out whether to create the cluster or the node pools first.

This is why Terraform added the default_node_pool block. It represents a System Node Pool that gets created together with the cluster, to meet the requirement from above.
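For comparison, this is how an additional node pool is typically attached to the cluster as a separate resource. The following is a minimal sketch; the resource name, size, and count are illustrative and not taken from the article:

```hcl
# An additional node pool references the cluster by ID, so Terraform
# must create the cluster (together with its default_node_pool) first
# and can only then attach further pools to it.
resource "azurerm_kubernetes_cluster_node_pool" "user" {
  name                  = "user"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.default.id
  vm_size               = "Standard_DS2_v2"
  node_count            = 2
  mode                  = "User" # set "System" for pools that may host system pods
}
```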

Why do we have to re-create the cluster when changing default node pool properties?

In Azure, you can't change most node pool properties in place. If you need a different node pool configuration, you need to add a new node pool, migrate the workloads from the old node pool over to the new one, and then delete the old node pool. This is something Terraform can't do for us. It also can't delete and re-create the default node pool, because deleting it would potentially violate the rule that AKS needs at least one "System Node Pool" at all times.

So every change in the default_node_pool block results in a full re-creation of the cluster. But wait, aren't there any other options?

How can we modify the default node pool without re-creating the cluster?

If you can't afford to lose the whole cluster just because of a change to the "Default Node Pool", we can change the node pool manually as described above and trick Terraform into not noticing the change.

Let's assume you start with a default_node_pool like this and want to change the vm_size from Standard_DS2_v2 to Standard_B2s without letting Terraform re-create the whole cluster.

resource "azurerm_kubernetes_cluster" "default" {
  # ...

  default_node_pool {
    name       = "default"
    vm_size    = "Standard_DS2_v2"
  }
}
AKS definition in Terraform before the change

For that, let's take a look at how Terraform keeps track of its "Default Node Pool". By looking into the state, we can see that the azurerm_kubernetes_cluster resource has a default_node_pool property which identifies the corresponding node pool only by its name. There is no Azure Resource ID involved.

"default_node_pool": [
  {
    "availability_zones": [],
    "enable_auto_scaling": false,
    "enable_node_public_ip": false,
    "max_count": 0,
    "max_pods": 110,
    "min_count": 0,
    "name": "default",
    "node_count": 3,
    "node_labels": {},
    "node_taints": [],
    "orchestrator_version": "1.18.8",
    "os_disk_size_gb": 128,
    "tags": {},
    "type": "VirtualMachineScaleSets",
    "vm_size": "Standard_DS2_v2",
    "vnet_subnet_id": ""
  }
]
Snippet from terraform.tfstate file

As long as there is a node pool assigned to that AKS cluster with the correct name and the mode set to "System", Terraform will use it as its default_node_pool.
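You can check the names and modes of the node pools currently attached to the cluster with the Azure CLI (the cluster and resource group names match the sample commands below; this requires an Azure subscription and an existing cluster):

```shell
az aks nodepool list \
  --cluster-name aks \
  --resource-group myrg \
  --query "[].{name:name, mode:mode}" \
  --output table
```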

Earlier, we learned that to modify a node pool in Azure, we have to recreate it. At the same time, we always need at least one "System Node Pool". So let's start by creating a temporary "System Node Pool" and migrating all workloads from the old "Default Node Pool" over to it. We don't use Terraform for this, but the Azure CLI, because we don't want Terraform to keep track of these operations.

az aks nodepool add \
  --cluster-name aks \
  --resource-group myrg \
  --name temp \
  --mode "System"
Create a temporary System Node Pool via CLI

Once the new "System Node Pool" has been created, we can delete the old "Default Node Pool". This will migrate all workloads that need to run on "System Node Pools" over to the new one.

Warning: This will cause applications and workloads running on the "Default Node Pool" to stop and migrate over to another node before they get restarted.
az aks nodepool delete \
  --cluster-name aks \
  --resource-group myrg \
  --name default
Delete the original "Default Node Pool" via CLI

Now we can create a new node pool with the updated properties that we want to use as the "Default Node Pool" in Terraform. For this sample, we assume that we want to change the VM size to Standard_B2s.

az aks nodepool add \
  --cluster-name aks \
  --resource-group myrg \
  --name default \
  --mode "System" \
  --node-vm-size "Standard_B2s"
Create a new "Default Node Pool" via CLI

We can now delete the temporary "System Node Pool" to migrate all the workloads back to the new "Default Node Pool".

az aks nodepool delete \
  --cluster-name aks \
  --resource-group myrg \
  --name temp
Delete the temporary "System Node Pool" via CLI

As a last step, we need to update the default_node_pool property of our azurerm_kubernetes_cluster resource in our Terraform file to reflect the new vm_size.

resource "azurerm_kubernetes_cluster" "default" {
  # ...

  default_node_pool {
    name       = "default"
    vm_size    = "Standard_B2s"
  }
}
AKS definition in Terraform after the change

When running terraform plan now, Terraform doesn't notice any changes. There should be nothing to do from the Terraform side, and you can continue to work with the cluster as usual.
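If everything lines up, the plan comes back clean, looking something like this (the exact wording varies by Terraform version):

```shell
$ terraform plan
...
No changes. Your infrastructure matches the configuration.
```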
