Skip to Content
GuideRebalance ConfigurationNode Template Configuration

Node Template Configuration

This guide explains how to configure node templates in CloudPilot AI. Node templates define the properties of nodes that CloudPilot AI creates and manages. Each template consists of two parts:

  • Node Class Configuration — Defines cloud provider–specific settings and node-level behavior, such as VM instance tags, disk size, networking, and image selection.
  • Node Pool Configuration — Specifies constraints and scheduling rules for node creation and pod placement, including node labels, taints, and allowed instance types.

These configurations ensure that node provisioning aligns with your cluster’s architecture, compliance requirements, and workload characteristics.

AWS Provider Node Class Configuration

Below is an example configuration for defining a Node Class for AWS:

kubelet: podsPerCore: 2 maxPods: 20 systemReserved: cpu: 100m memory: 100Mi ephemeral-storage: 1Gi kubeReserved: cpu: 200m memory: 100Mi ephemeral-storage: 3Gi evictionHard: memory.available: 5% nodefs.available: 10% nodefs.inodesFree: 10% evictionSoft: memory.available: 500Mi nodefs.available: 15% nodefs.inodesFree: 15% evictionSoftGracePeriod: memory.available: 1m nodefs.available: 1m30s nodefs.inodesFree: 2m evictionMaxPodGracePeriod: 60 imageGCHighThresholdPercent: 85 imageGCLowThresholdPercent: 80 cpuCFSQuota: true clusterDNS: ["10.0.1.100"] # Required, discovers subnets to attach to instances # Each term in the array of subnetSelectorTerms is ORed together # Within a single term, all conditions are ANDed subnetSelectorTerms: # Select on any subnet that has the "karpenter.sh/discovery: ${CLUSTER_NAME}" # AND the "environment: test" tag OR any subnet with ID "subnet-09fa4a0a8f233a921" - tags: karpenter.sh/discovery: "${CLUSTER_NAME}" environment: test - id: subnet-09fa4a0a8f233a921 # Required, discovers security groups to attach to instances # Each term in the array of securityGroupSelectorTerms is ORed together # Within a single term, all conditions are ANDed securityGroupSelectorTerms: # Select on any security group that has both the "karpenter.sh/discovery: ${CLUSTER_NAME}" tag # AND the "environment: test" tag OR any security group with the "my-security-group" name # OR any security group with ID "sg-063d7acfb4b06c82c" - tags: karpenter.sh/discovery: "${CLUSTER_NAME}" environment: test - name: my-security-group - id: sg-063d7acfb4b06c82c # Optional, IAM role to use for the node identity. # The "role" field is immutable after EC2NodeClass creation. This may change in the # future, but this restriction is currently in place today to ensure that Karpenter # avoids leaking managed instance profiles in your account. # Must specify one of "role" or "instanceProfile" for Karpenter to launch nodes role: "KarpenterNodeRole-${CLUSTER_NAME}" # Optional, IAM instance profile to use for the node identity. # Must specify one of "role" or "instanceProfile" for Karpenter to launch nodes instanceProfile: "KarpenterNodeInstanceProfile-${CLUSTER_NAME}" # Each term in the array of amiSelectorTerms is ORed together # Within a single term, all conditions are ANDed amiSelectorTerms: # Select on any AMI that has both the `karpenter.sh/discovery: ${CLUSTER_NAME}` # AND `environment: test` tags OR any AMI with the name `my-ami` OR an AMI with # ID `ami-123` - tags: karpenter.sh/discovery: "${CLUSTER_NAME}" environment: test - name: my-ami - id: ami-123 # Select EKS optimized AL2023 AMIs with version `v20240703`. This term is mutually # exclusive and can't be specified with other terms. # - alias: al2023@v20240703 # Optional, propagates tags to underlying EC2 resources tags: team: team-a app: team-a-app # Optional, configures IMDS for the instance metadataOptions: httpEndpoint: enabled httpProtocolIPv6: disabled httpPutResponseHopLimit: 1 # This is changed to disable IMDS access from containers not on the host network httpTokens: required # Optional, configures storage devices for the instance blockDeviceMappings: - deviceName: /dev/xvda ebs: volumeSize: 100Gi volumeType: gp3 iops: 10000 encrypted: true kmsKeyID: "1234abcd-12ab-34cd-56ef-1234567890ab" deleteOnTermination: true throughput: 125 snapshotID: snap-0123456789 # Optional, use instance-store volumes for node ephemeral-storage instanceStorePolicy: RAID0 # Optional, overrides autogenerated userdata with a merge semantic userData: | echo "Hello world" # Optional, configures detailed monitoring for the instance detailedMonitoring: true # Optional, configures if the instance should be launched with an associated public IP address. # If not specified, the default value depends on the subnet's public IP auto-assign setting. associatePublicIPAddress: true

AWS Provider Node Pool Configuration

Below is an example configuration for a Node Pool targeting AWS:

# Template section that describes how to template out NodeClaim resources that Karpenter will provision # Karpenter will consider this template to be the minimum requirements needed to provision a Node using this NodePool # It will overlay this NodePool with Pods that need to schedule to further constrain the NodeClaims # Karpenter will provision to launch new Nodes for the cluster template: metadata: # Labels are arbitrary key-values that are applied to all nodes labels: billing-team: my-team # Annotations are arbitrary key-values that are applied to all nodes annotations: example.com/owner: "my-team" spec: # References the Cloud Provider's NodeClass resource, see your cloud provider specific documentation nodeClassRef: group: karpenter.k8s.aws # Updated since only a single version will be served kind: EC2NodeClass name: default # Provisioned nodes will have these taints # Taints may prevent pods from scheduling if they are not tolerated by the pod. taints: - key: example.com/special-taint effect: NoSchedule # Provisioned nodes will have these taints, but pods do not need to tolerate these taints to be provisioned by this # NodePool. These taints are expected to be temporary and some other entity (e.g. a DaemonSet) is responsible for # removing the taint after it has finished initializing the node. startupTaints: - key: example.com/another-taint effect: NoSchedule # The amount of time a Node can live on the cluster before being removed # Avoiding long-running Nodes helps to reduce security vulnerabilities as well as to reduce the chance of issues that can plague des with long uptimes such as file fragmentation or memory leaks from system processes # You can choose to disable expiration entirely by setting the string value 'Never' here # Note: changing this value in the nodepool will drift the nodeclaims. expireAfter: 720h | Never # The amount of time that a node can be draining before it's forcibly deleted. A node begins draining when a delete call is made ainst it, starting # its finalization flow. Pods with TerminationGracePeriodSeconds will be deleted preemptively before this terminationGracePeriod ds to give as much time to cleanup as possible. # If your pod's terminationGracePeriodSeconds is larger than this terminationGracePeriod, Karpenter may forcibly delete the pod # before it has its full terminationGracePeriod to cleanup. # Note: changing this value in the nodepool will drift the nodeclaims. terminationGracePeriod: 48h # Requirements that constrain the parameters of provisioned nodes. # These requirements are combined with pod.spec.topologySpreadConstraints, pod.spec.affinity.nodeAffinity, d.spec.affinity.podAffinity, and pod.spec.nodeSelector rules. # Operators { In, NotIn, Exists, DoesNotExist, Gt, and Lt } are supported. # https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#operators requirements: - key: "karpenter.k8s.aws/instance-category" operator: In values: ["c", "m", "r"] # minValues here enforces the scheduler to consider at least that number of unique instance-category to schedule the pods. # This field is ALPHA and can be dropped or replaced at any time minValues: 2 - key: "karpenter.k8s.aws/instance-family" operator: In values: ["m5","m5d","c5","c5d","c4","r4"] minValues: 5 - key: "karpenter.k8s.aws/instance-cpu" operator: In values: ["4", "8", "16", "32"] - key: "karpenter.k8s.aws/instance-hypervisor" operator: In values: ["nitro"] - key: "karpenter.k8s.aws/instance-generation" operator: Gt values: ["2"] - key: "topology.kubernetes.io/zone" operator: In values: ["us-west-2a", "us-west-2b"] - key: "kubernetes.io/arch" operator: In values: ["arm64", "amd64"] - key: "karpenter.sh/capacity-type" operator: In values: ["spot", "on-demand"] # Disruption section which describes the ways in which Karpenter can disrupt and replace Nodes # Configuration in this section constrains how aggressive Karpenter can be with performing operations # like rolling Nodes due to them hitting their maximum lifetime (expiry) or scaling down nodes to reduce cluster cost disruption: # Describes which types of Nodes Karpenter should consider for consolidation # If using 'WhenEmptyOrUnderutilized', Karpenter will consider all nodes for consolidation and attempt to remove or replace Nodes en it discovers that the Node is empty or underutilized and could be changed to reduce cost # If using `WhenEmpty`, Karpenter will only consider nodes for consolidation that contain no workload pods consolidationPolicy: WhenEmptyOrUnderutilized | WhenEmpty # The amount of time Karpenter should wait to consolidate a node after a pod has been added or removed from the node. # You can choose to disable consolidation entirely by setting the string value 'Never' here consolidateAfter: 1m | Never # Added to allow additional control over consolidation aggressiveness # Budgets control the speed Karpenter can scale down nodes. # Karpenter will respect the minimum of the currently active budgets, and will round up # when considering percentages. Duration and Schedule must be set together. budgets: - nodes: 10% # On Weekdays during business hours, don't do any deprovisioning. - schedule: "0 9 * * mon-fri" duration: 8h nodes: "0" # Resource limits constrain the total size of the pool. # Limits prevent Karpenter from creating new instances once the limit is exceeded. limits: cpu: "1000" memory: 1000Gi # Priority given to the NodePool when the scheduler considers which NodePool # to select. Higher weights indicate higher priority when comparing NodePools. # Specifying no weight is equivalent to specifying a weight of 0. weight: 10
Last updated on