Optimizing Network Traffic across Availability Zones in Kubernetes
Spanning Kubernetes clusters across multiple availability zones is common when optimizing for resiliency, but it brings additional challenges, such as reduced network performance and higher costs, when workloads need to communicate with each other across zones.
Here are some tweaks to optimize network traffic in those scenarios.
By itself, Kubernetes does not include zone-aware networking.
- Kubernetes Documentation
Keep inter-workload communication in the same zone if possible
When two or more Pods communicate with each other, they typically do so by calling a Service resource that balances incoming traffic across the replicas behind it. By default, a Service routes each request to a random instance of the workload it points at, not necessarily the closest one or one in the same zone. As a result, traffic between Pods can cross zone boundaries (and cause additional costs) even though a replica in the caller's own zone was available.
To make a Service "zone aware" when balancing traffic, it can be extended with Topology Keys (topology-aware traffic routing). These instruct the Service to prefer Pods on the same node, then Pods in the same zone, and to send traffic across zones only if no closer endpoint is available.
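As an illustration, a Service using Topology Keys might look roughly like the sketch below. The name `backend`, the `app: backend` label, and the port numbers are hypothetical placeholders. The `topologyKeys` field was an alpha feature behind the `ServiceTopology` feature gate and has been removed in newer Kubernetes releases, so this only applies to older clusters where that gate is enabled.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend                       # hypothetical Service name
spec:
  selector:
    app: backend                      # hypothetical Pod label
  ports:
    - port: 80
      targetPort: 8080
  # Evaluated in order: the first key with a matching endpoint wins.
  topologyKeys:
    - "kubernetes.io/hostname"        # 1. prefer an endpoint on the same node
    - "topology.kubernetes.io/zone"   # 2. otherwise one in the same zone
    - "*"                             # 3. otherwise any endpoint (may cross zones)
```

Leaving out the final `"*"` entry would mean traffic is dropped when no endpoint exists on the same node or in the same zone, so the catch-all is what makes cross-zone traffic a fallback rather than a failure.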
Hint: Topology Keys have been deprecated since Kubernetes v1.21 and are being replaced by Topology Aware Hints, which offer similar functionality.
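A minimal sketch of the replacement approach, assuming a cluster version that supports Topology Aware Hints: the feature is switched on per Service through an annotation. The Service name, label, and ports are hypothetical. Newer Kubernetes releases use the annotation `service.kubernetes.io/topology-mode: Auto` instead, so check which one your cluster version expects.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend                       # hypothetical Service name
  annotations:
    # Asks the control plane to add zone hints to the EndpointSlices
    # so that kube-proxy prefers endpoints in the caller's zone.
    service.kubernetes.io/topology-aware-hints: auto
spec:
  selector:
    app: backend                      # hypothetical Pod label
  ports:
    - port: 80
      targetPort: 8080
```

Unlike Topology Keys, hints work at the zone level only and do not express a same-node preference. The control plane may also decide not to assign hints, for example when there are too few endpoints to distribute across zones, in which case traffic is balanced cluster-wide as before.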
Place workloads that often communicate with each other close together
Often, some workloads in your cluster communicate with each other more frequently than with others. It is usually a good idea to keep instances of those services close to each other, meaning on the same node or in the same zone.
In a zone-redundant setup, it is still recommended to spread multiple replicas of a workload across zones. We should therefore ensure that groups of workloads that communicate frequently with each other are located together in the same zone, and that each such group also exists in additional zones.
Inter-pod affinity can help to co-locate a Pod in a zone where another Pod matching specific criteria is already scheduled.
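The following sketch shows what such an affinity rule could look like for a hypothetical `frontend` Deployment that frequently calls Pods labeled `app: backend`. All names, labels, and the image reference are placeholders. It uses the "preferred" variant so that Pods can still be scheduled when no zone with a matching Pod has capacity.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend                      # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      affinity:
        podAffinity:
          # Soft rule: the scheduler favors, but does not require,
          # zones that already run a Pod labeled app=backend.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: backend      # hypothetical label of the callee
                topologyKey: topology.kubernetes.io/zone
      containers:
        - name: frontend
          image: example.com/frontend:1.0   # placeholder image
```

Switching `topologyKey` to `kubernetes.io/hostname` would aim for the same node instead of the same zone. Using `requiredDuringSchedulingIgnoredDuringExecution` turns the preference into a hard constraint, at the cost of Pods staying pending when it cannot be satisfied.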