Workload Diversity Guide

Service disruptions in Kubernetes are more likely when multiple pods from the same workload run on the same infrastructure component, such as a single node or a single instance type within one Availability Zone.

CloudPilot AI mitigates this risk by automatically spreading a workload’s pods across different nodes and instance types, which improves both resilience and availability. It provides two diversification mechanisms:

Node Diversity: Automatically distributes pods across multiple nodes to prevent node-level failures from affecting entire workloads
Instance Type Diversity: Schedules pods on different instance types to mitigate risks associated with instance-type-specific issues, particularly spot instance interruptions

Both mechanisms work through Kubernetes-native scheduling and require minimal changes to existing workloads. You can adjust the configuration to suit your operational requirements.

Key Features

Node Diversity

If your workload has two or more replicas, CloudPilot AI provisions at least two nodes and distributes the pods across them. This feature is enabled by default.

Here is an example:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: karpenter.sh/capacity-type
                    operator: Exists
      containers:
      - image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
        name: inflate
        resources:
          requests:
            cpu: 250m
            memory: 250Mi

Apply this manifest and check which node each pod landed on. The pods run on different nodes:


# kubectl get po -lapp=inflate -owide
NAME                       READY   STATUS    RESTARTS      AGE   IP            NODE                                        NOMINATED NODE   READINESS GATES
inflate-658b958b9f-57qvj   1/1     Running   0             10s   10.0.42.207   ip-10-0-33-24.us-east-2.compute.internal    <none>           <none>
inflate-658b958b9f-ljb7h   1/1     Running   0             10s   10.0.24.64    ip-10-0-21-104.us-east-2.compute.internal   <none>           <none>

Instance Type Diversity

Spot instances of the same type in the same Availability Zone tend to be interrupted at the same time. To limit the impact of such interruptions, CloudPilot AI can distribute a workload’s pods (for example, those of a Deployment or StatefulSet) across different instance types. To enable this, apply the label workload.cloudpilot.ai/diversity-instancetype=true to the workload.

Here is an example:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
  labels:
    workload.cloudpilot.ai/diversity-instancetype: "true"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: karpenter.sh/capacity-type
                    operator: Exists
      containers:
      - image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
        name: inflate
        resources:
          requests:
            cpu: 250m
            memory: 250Mi

Apply this manifest and look up the instance type of the node behind each pod. The pods run on different instance types:


# kubectl get pods -l app=inflate --no-headers -o custom-columns="NAME:.metadata.name,NODE:.spec.nodeName" | while read -r name node; do if [ -n "$node" ]; then instance_type=$(kubectl get node "$node" -o jsonpath='{.metadata.labels.node\.kubernetes\.io/instance-type}' 2>/dev/null); echo "$name -> $node -> $instance_type"; fi; done
inflate-658b958b9f-5v6t -> ip-10-0-3-179.us-east-2.compute.internal -> t2.medium
inflate-658b958b9f-9ock -> ip-10-0-20-81.us-east-2.compute.internal -> t3.large
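
The manifest above sets the label when the workload is created. For a workload that already exists, the same label can be added with kubectl label. The wrapper below is a minimal sketch: the function name enable_instance_type_diversity and the default namespace are assumptions made for illustration, and only the label key and value come from this guide.

```shell
# Hypothetical helper: add the instance-type diversity label to an
# existing workload (for example a Deployment or StatefulSet).
# Usage: enable_instance_type_diversity <kind> <name> [namespace]
enable_instance_type_diversity() {
  # --overwrite lets the command succeed even if the label is already set.
  kubectl label "$1" "$2" \
    --namespace "${3:-default}" \
    --overwrite \
    workload.cloudpilot.ai/diversity-instancetype=true
}
```

For example, enable_instance_type_diversity deployment inflate labels the Deployment used in the example. Given the controller behavior described under How It Works, pods that share an instance type may then be evicted and rescheduled.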

How It Works

Overview

The solution places the control logic in a dedicated controller. The controller works with a webhook component that patches each pod, signaling the scheduler to avoid placing pods from the same workload on the same node or instance type.

Controller: The controller runs every 30 seconds and evicts pods from a workload if multiple pods are scheduled on the same instance type. This behavior is enabled when the owning Deployment or StatefulSet has the label workload.cloudpilot.ai/diversity-instancetype=true.
Webhook: The webhook uses an informer to monitor the workloads’ pod distribution and patches the appropriate affinity/anti-affinity rules to control the scheduling result.
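
The two components can be illustrated with a pair of small shell helpers. This is a sketch under stated assumptions, not CloudPilot AI's implementation: the function names eviction_candidates and show_pod_affinity are hypothetical, and the rule "keep the first pod seen per instance type" is an assumption, since this guide does not say which pods the controller chooses to evict. The second helper only reads .spec.affinity from a pod, which is where any affinity or anti-affinity rules patched in by the webhook would appear.

```shell
# Hypothetical sketch of the controller's check. Reads "<pod> <instance-type>"
# pairs on stdin and prints every pod whose instance type was already used by
# an earlier pod in the input, i.e. the pods a diversity pass might evict.
eviction_candidates() {
  awk 'NF >= 2 { if (seen[$2]++) print $1 }'
}

# Print the affinity rules currently set on a pod, including any that a
# webhook added at admission time.
# Usage: show_pod_affinity <pod> [namespace]
show_pod_affinity() {
  kubectl get pod "$1" --namespace "${2:-default}" -o jsonpath='{.spec.affinity}'
}
```

To feed eviction_candidates from the "name -> node -> instance type" lines shown in the Instance Type Diversity example, pipe them through awk '{print $1, $5}' first. For the two pods in that example, which run on t2.medium and t3.large, the helper prints nothing because no instance type is shared.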

For configuration guidance or advanced use cases, please contact the CloudPilot AI support team.