
Installing with Terraform

This guide explains how to use the CloudPilot AI Terraform Provider to manage Amazon EKS clusters with automated cost optimization. The provider enables seamless integration of CloudPilot AI’s node autoscaling and workload autoscaling capabilities into your infrastructure-as-code workflow.

Overview

The CloudPilot AI Terraform Provider offers two core resources:

  • cloudpilotai_eks_cluster — Automate agent installation, workload rebalancing, and node autoscaling with custom NodeClass/NodePool configurations.
  • cloudpilotai_workload_autoscaler — Right-size workload resource requests and limits through recommendation and autoscaling policies, with proactive optimization support.

Both resources also have corresponding data sources for read-only queries of cluster state and autoscaler configuration.

Prerequisites

Before using the Terraform Provider, ensure you have:

  • Terraform — Version 1.0 or later
  • AWS CLI — Configured with credentials that have EKS cluster management permissions
    • Even if you supply a kubeconfig path, the provider still uses AWS CLI credentials for all kubectl calls; make sure the CLI can authenticate to the target cluster account.
    • Multi-account setups: switch to the profile that matches your EKS cluster (e.g. AWS_PROFILE=prod-account) before running Terraform commands.
  • kubectl — For cluster operations and component management
  • Helm — For deploying CloudPilot AI components to the cluster
  • CloudPilot AI API Key — See Get API Keys for setup instructions

If you don’t have an EKS cluster yet, refer to the example cluster setup.

Provider Configuration

The provider supports two authentication methods — provide either api_key or api_key_profile:

terraform {
  required_providers {
    cloudpilotai = {
      source = "cloudpilot-ai/cloudpilotai"
    }
  }
}

provider "cloudpilotai" {
  api_key = var.cloudpilot_api_key

  # Or authenticate via a file containing the key:
  # api_key_profile = "/path/to/api-key-file"

  # Optional. Defaults to https://api.cloudpilot.ai
  # api_endpoint = "https://api.cloudpilot.ai"
}
| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| api_key | String (Sensitive) | No* | CloudPilot AI API key |
| api_key_profile | String | No* | Path to a file containing the API key |
| api_endpoint | String | No | API endpoint. Defaults to https://api.cloudpilot.ai |

* One of api_key or api_key_profile must be provided.
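
The provider block above reads the key from var.cloudpilot_api_key. A matching variable declaration (standard Terraform; the variable name is simply the one used in this guide) keeps the key out of plan and apply output:

```hcl
# variables.tf — declares the API key variable referenced by the provider block
variable "cloudpilot_api_key" {
  description = "CloudPilot AI API key"
  type        = string
  sensitive   = true # redacts the value in plan/apply output
}
```

Supply the value via the TF_VAR_cloudpilot_api_key environment variable or a .tfvars file rather than hard-coding it.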

Quick Start

  1. Configure AWS credentials:

    aws configure
    aws sts get-caller-identity
  2. Create main.tf:

    terraform {
      required_providers {
        cloudpilotai = {
          source = "cloudpilot-ai/cloudpilotai"
        }
      }
    }

    provider "cloudpilotai" {
      api_key = "sk-xxx"
    }

    resource "cloudpilotai_eks_cluster" "example" {
      cluster_name        = "my-eks-cluster"
      region              = "us-west-2"
      restore_node_number = 2
    }
  3. Initialize and apply:

    terraform init
    terraform plan
    terraform apply

Example Configurations

The following examples are available in the GitHub repository:

| Example | Description |
| --- | --- |
| 0_minimal | Node Autoscaler + Workload Autoscaler with minimal configuration |
| 1_details | Full-featured configuration with all options, templates, and data sources |
| 2_read-only_access | Agent-only installation for monitoring without optimization changes |
| 3_basic_rebalance | Basic cost optimization with workload rebalancing |
| 4_nodeclass_nodepool_rebalance | Custom NodeClass and NodePool with instance filtering and disruption controls |

Resource: cloudpilotai_eks_cluster

Manages an EKS cluster’s integration with CloudPilot AI, including agent installation, workload rebalancing, and node autoscaling.

Basic Usage

resource "cloudpilotai_eks_cluster" "example" {
  cluster_name        = "my-eks-cluster"
  region              = "us-west-2"
  restore_node_number = 2
  enable_rebalance    = true
}

Required Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| cluster_name | String | Name of the EKS cluster |
| region | String | AWS region where the cluster runs |
| restore_node_number | Number | Number of nodes to restore when the resource is destroyed. Set to 0 to skip restore. |

Optional Attributes

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| kubeconfig | String | Auto-generated | Path to kubeconfig file. If omitted, the provider generates one via AWS CLI. |
| only_install_agent | Boolean | false | Only install the monitoring agent without optimization components |
| enable_rebalance | Boolean | false | Enable workload rebalancing to optimize node utilization |
| disable_workload_uploading | Boolean | false | Disable uploading workload information to CloudPilot AI |
| enable_upgrade_agent | Boolean | false | Upgrade the CloudPilot AI agent on next apply |
| enable_upgrade_rebalance_component | Boolean | false | Upgrade the rebalance component on next apply |
| enable_upload_config | Boolean | true | Upload NodePool/NodeClass configuration to CloudPilot AI |
| enable_diversity_instance_type | Boolean | false | Use diverse instance types for improved fault tolerance |
| workload_templates | List | [] | Reusable workload templates (see below) |
| workloads | List | [] | Workload rebalance configurations |
| nodeclass_templates | List | [] | Reusable NodeClass templates |
| nodeclasses | List | [] | NodeClass configurations |
| nodepool_templates | List | [] | Reusable NodePool templates |
| nodepools | List | [] | NodePool configurations |

Read-Only Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| cluster_id | String | CloudPilot AI cluster ID |
| account_id | String | AWS account ID |
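
These computed values can be consumed elsewhere in the configuration, for example to expose the cluster ID as an output or feed it into other resources:

```hcl
output "cloudpilot_cluster_id" {
  value = cloudpilotai_eks_cluster.example.cluster_id
}
```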

Workload Configuration

Workloads control which deployments participate in rebalancing and spot instance optimization. You can define reusable templates and then reference them in individual workloads:

resource "cloudpilotai_eks_cluster" "example" {
  # ... required attributes ...

  workload_templates = [
    {
      template_name         = "spot-friendly"
      rebalance_able        = true
      spot_friendly         = true
      min_non_spot_replicas = 1
    }
  ]

  workloads = [
    {
      name          = "my-app"
      type          = "deployment"
      namespace     = "default"
      template_name = "spot-friendly"
    }
  ]
}
| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| name | String | | Workload name (required for workloads) |
| type | String | | Workload type, e.g. deployment (required for workloads) |
| namespace | String | | Kubernetes namespace (required for workloads) |
| template_name | String | | Name of the template to inherit settings from |
| rebalance_able | Boolean | true | Whether the workload can be rebalanced |
| spot_friendly | Boolean | true | Whether the workload is suitable for spot instances |
| min_non_spot_replicas | Number | 0 | Minimum number of replicas that must run on on-demand instances |
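
As a sketch, a workload can reference a template and tighten a single setting, assuming per-workload attributes take precedence over template values (the workload name and values below are illustrative, not from the examples above):

```hcl
workloads = [
  {
    name                  = "checkout-api" # hypothetical workload
    type                  = "deployment"
    namespace             = "default"
    template_name         = "spot-friendly"
    min_non_spot_replicas = 2 # assumed to override the template's value
  }
]
```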

NodeClass Configuration

NodeClasses define the instance-level properties for provisioned nodes. The system default NodeClass name is cloudpilot.

resource "cloudpilotai_eks_cluster" "example" {
  # ... required attributes ...

  nodeclasses = [
    {
      name = "cloudpilot"
      instance_tags = {
        "cloudpilot.ai/managed" = "true"
      }
      system_disk_size_gib        = 20
      extra_cpu_allocation_mcore  = 0
      extra_memory_allocation_mib = 0
    }
  ]
}
| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| name | String | | NodeClass name (required for nodeclasses) |
| template_name | String | | Template name (required for nodeclass_templates) |
| origin_nodeclass_json | String | | Raw NodeClass JSON; overrides all other settings if set |
| instance_tags | Map(String) | {"cloudpilot.ai/managed" = "true"} | Tags applied to each provisioned node |
| system_disk_size_gib | Number | 20 | System disk size in GiB |
| extra_cpu_allocation_mcore | Number | 0 | Extra CPU allocation in millicores for burstable pods |
| extra_memory_allocation_mib | Number | 0 | Extra memory allocation in MiB for burstable pods |

NodePool Configuration

NodePools control how nodes are provisioned, including instance types, capacity types, and disruption policies. The system default NodePool name is cloudpilot-general.

resource "cloudpilotai_eks_cluster" "example" {
  # ... required attributes ...

  nodepools = [
    {
      name                  = "cloudpilot-general"
      nodeclass             = "cloudpilot"
      enable                = true
      provision_priority    = 2
      instance_arch         = ["amd64"]
      capacity_type         = ["spot", "on-demand"]
      instance_cpu_max      = 17
      instance_memory_max   = 32769
      node_disruption_limit = "2"
      node_disruption_delay = "60m"
    }
  ]
}
| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| name | String | | NodePool name (required for nodepools) |
| template_name | String | | Template name (required for nodepool_templates) |
| origin_nodepool_json | String | | Raw NodePool JSON; overrides all other settings if set |
| enable | Boolean | true | Whether the NodePool is active |
| nodeclass | String | | Associated NodeClass name |
| enable_gpu | Boolean | false | Enable GPU instances |
| provision_priority | Number | 1 | Scheduling priority (higher = higher priority) |
| instance_family | List(String) | all | Instance families to use (e.g. ["t3", "m5"]) |
| instance_arch | List(String) | ["amd64", "arm64"] | CPU architectures |
| capacity_type | List(String) | ["on-demand", "spot"] | Capacity types |
| zone | List(String) | all | Availability zones |
| instance_cpu_min | Number | 0 | Minimum CPU cores per node (0 = no limit) |
| instance_cpu_max | Number | 17 | Maximum CPU cores per node (0 = no limit) |
| instance_memory_min | Number | 0 | Minimum memory in MiB per node (0 = no limit) |
| instance_memory_max | Number | 32769 | Maximum memory in MiB per node (0 = no limit) |
| node_disruption_limit | String | "2" | Max nodes that can be terminated at once (number or percentage) |
| node_disruption_delay | String | "60m" | Wait time before terminating underutilized nodes |
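
These attributes compose freely. For instance, a hypothetical spot-only arm64 pool restricted to Graviton instance families might look like the following sketch (the pool name and family list are illustrative, not defaults):

```hcl
nodepools = [
  {
    name            = "cloudpilot-arm-spot"   # hypothetical pool name
    nodeclass       = "cloudpilot"
    instance_arch   = ["arm64"]
    capacity_type   = ["spot"]
    instance_family = ["m6g", "c6g", "r6g"]   # Graviton families, illustrative
  }
]
```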

Resource: cloudpilotai_workload_autoscaler

Manages workload resource optimization through recommendation policies, autoscaling policies, and proactive optimization. This resource is cloud-provider independent and works with any Kubernetes cluster managed by CloudPilot AI.

Basic Usage

resource "cloudpilotai_workload_autoscaler" "example" {
  cluster_id = cloudpilotai_eks_cluster.example.cluster_id
  kubeconfig = cloudpilotai_eks_cluster.example.kubeconfig

  recommendation_policies = []
  autoscaling_policies    = []

  enable_proactive = [
    {
      namespaces = ["default"]
    }
  ]
}

Required Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| cluster_id | String | CloudPilot AI cluster ID (can reference cloudpilotai_eks_cluster.*.cluster_id) |
| kubeconfig | String | Path to kubeconfig file for kubectl/helm operations |

Optional Attributes

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| storage_class | String | cluster default | StorageClass for VictoriaMetrics persistent volume |
| enable_node_agent | Boolean | true | Deploy the Node Agent DaemonSet for per-node metrics |
| recommendation_policies | List | [] | Recommendation policies for resource right-sizing |
| autoscaling_policies | List | [] | Autoscaling policies for automated resource updates |
| enable_proactive | List | [] | Filters to enable proactive optimization on matching workloads |
| disable_proactive | List | [] | Filters to disable proactive optimization on matching workloads |

Recommendation Policies

Recommendation policies define how CloudPilot AI analyzes historical metrics to produce right-sizing recommendations.

recommendation_policies = [
  {
    name                  = "balanced"
    strategy_type         = "percentile"
    percentile_cpu        = 95
    percentile_memory     = 99
    history_window_cpu    = "24h"
    history_window_memory = "48h"
    evaluation_period     = "1m"
    buffer_cpu            = "10%"
    buffer_memory         = "20%"
    request_min_cpu       = "25%"
    request_min_memory    = "30%"
  }
]
| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| name | String | | Policy name (required) |
| strategy_type | String | "percentile" | Analysis strategy |
| percentile_cpu | Number | 95 | CPU usage percentile (50–100) |
| percentile_memory | Number | 95 | Memory usage percentile (50–100) |
| history_window_cpu | String | | CPU metrics lookback window (required, e.g. "24h") |
| history_window_memory | String | | Memory metrics lookback window (required, e.g. "48h") |
| evaluation_period | String | | Evaluation interval (required, e.g. "1m") |
| buffer_cpu | String | "" | CPU safety buffer (percentage like "10%" or absolute like "100m") |
| buffer_memory | String | "" | Memory safety buffer |
| request_min_cpu | String | "" | Minimum CPU request floor |
| request_min_memory | String | "" | Minimum memory request floor |
| request_max_cpu | String | "" | Maximum CPU request ceiling |
| request_max_memory | String | "" | Maximum memory request ceiling |
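
The ceiling attributes bound how far a recommendation can grow a request. A sketch of a policy that caps requests, assuming the max fields accept Kubernetes-style quantity strings (an assumption; the format is not confirmed above):

```hcl
recommendation_policies = [
  {
    name                  = "capped" # hypothetical policy name
    history_window_cpu    = "72h"
    history_window_memory = "72h"
    evaluation_period     = "5m"
    request_max_cpu       = "2"   # assumed Kubernetes quantity format
    request_max_memory    = "4Gi" # assumed Kubernetes quantity format
  }
]
```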

Autoscaling Policies

Autoscaling policies control how recommendations are applied to workloads, including targeting, scheduling, and update behavior.

autoscaling_policies = [
  {
    name                       = "production"
    enable                     = true
    recommendation_policy_name = "balanced"
    priority                   = 10
    update_resources           = ["cpu", "memory"]
    drift_threshold_cpu        = "5%"
    drift_threshold_memory     = "5%"
    on_policy_removal          = "recreate"

    target_refs = [
      {
        api_version = "apps/v1"
        kind        = "Deployment"
        name        = ""
        namespace   = "production"
      }
    ]

    update_schedules = [
      {
        name     = "default"
        schedule = ""
        duration = ""
        mode     = "recreate"
      }
    ]

    limit_policies = [
      {
        resource     = "cpu"
        remove_limit = true
      },
      {
        resource      = "memory"
        auto_headroom = "2"
      }
    ]
  }
]

Core Attributes

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| name | String | | Policy name (required) |
| enable | Boolean | true | Whether the policy is active |
| recommendation_policy_name | String | | Name of the recommendation policy to use (required) |
| priority | Number | 0 | Policy priority (higher takes precedence for overlapping targets) |
| update_resources | List(String) | [] | Resources to optimize (e.g. ["cpu", "memory"]) |
| drift_threshold_cpu | String | "" | Minimum CPU drift before updating |
| drift_threshold_memory | String | "" | Minimum memory drift before updating |
| on_policy_removal | String | "off" | Behavior when policy is removed: off, recreate, or inplace |

Target Refs

| Attribute | Type | Description |
| --- | --- | --- |
| api_version | String | Kubernetes API version (e.g. "apps/v1") (required) |
| kind | String | Workload kind: Deployment or StatefulSet (required) |
| name | String | Workload name. Empty string matches all. |
| namespace | String | Namespace. Empty string matches all. |

Update Schedules

| Attribute | Type | Description |
| --- | --- | --- |
| name | String | Schedule name (required) |
| schedule | String | Cron expression for the update window |
| duration | String | Window duration |
| mode | String | Update mode: oncreate, recreate, inplace, or off (required) |
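
A schedule that confines disruptive updates to a nightly window could be sketched as follows, assuming schedule takes a standard five-field cron expression and duration a Go-style duration string (both formats are assumptions):

```hcl
update_schedules = [
  {
    name     = "nightly"
    schedule = "0 2 * * *" # assumed standard cron: daily at 02:00
    duration = "2h"        # assumed Go-style duration
    mode     = "recreate"
  }
]
```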

Limit Policies

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| resource | String | | Resource name: cpu or memory (required) |
| remove_limit | Boolean | false | Remove the resource limit entirely |
| keep_limit | Boolean | false | Keep the original resource limit unchanged |
| multiplier | String | "" | Set limit as a multiplier of the request |
| auto_headroom | String | "" | Automatically calculate headroom multiplier |

Startup Boost

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| startup_boost_enabled | Boolean | false | Temporarily increase resources during container startup |
| startup_boost_min_boost_duration | String | "" | Minimum boost duration |
| startup_boost_min_ready_duration | String | "" | Minimum ready duration |
| startup_boost_multiplier_cpu | String | "" | CPU multiplier during startup |
| startup_boost_multiplier_memory | String | "" | Memory multiplier during startup |
| in_place_fallback_default_policy | String | "" | Fallback when in-place update fails: recreate or hold |

Proactive Optimization Filters

Use enable_proactive and disable_proactive to control which workloads receive proactive resource updates based on namespace, kind, or other criteria:

enable_proactive = [
  {
    namespaces = ["production", "staging"]
  }
]

disable_proactive = [
  {
    namespaces = ["kube-system"]
  }
]
| Attribute | Type | Description |
| --- | --- | --- |
| workload_name | String | Filter by workload name (substring match) |
| namespaces | List(String) | Filter by namespaces |
| workload_kinds | List(String) | Filter by workload kinds |
| autoscaling_policy_names | List(String) | Filter by autoscaling policy names |
| recommendation_policy_names | List(String) | Filter by recommendation policy names |
| workload_state | String | Filter by workload state |
| optimization_states | List(String) | Filter by optimization states |
| runtime_languages | List(String) | Filter by runtime languages |
| optimized | Boolean | Filter by optimization status |
| disable_proactive_update | Boolean | Whether to disable proactive update |
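
Multiple fields within a single filter presumably narrow the match together. A sketch combining a namespace filter with a kind filter (the AND semantics are an assumption, not confirmed above):

```hcl
enable_proactive = [
  {
    namespaces     = ["production"]
    workload_kinds = ["Deployment"] # assumed to AND with the namespace filter
  }
]
```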

Data Sources

cloudpilotai_eks_cluster

Read-only query of a registered EKS cluster’s state.

data "cloudpilotai_eks_cluster" "example" {
  cluster_name = "my-eks-cluster"
  region       = "us-west-2"
}

output "cluster_status" {
  value = data.cloudpilotai_eks_cluster.example.status
}
| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| cluster_name | String | Yes | EKS cluster name |
| region | String | Yes | AWS region |
| account_id | String | No | AWS account ID (inferred from AWS CLI if omitted) |

Computed attributes: cluster_id, cloud_provider, status (online/offline/demo), agent_version, rebalance_enable.
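
Computed attributes can drive further outputs or conditional logic, for example:

```hcl
output "agent_version" {
  value = data.cloudpilotai_eks_cluster.example.agent_version
}

output "is_online" {
  value = data.cloudpilotai_eks_cluster.example.status == "online"
}
```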

cloudpilotai_workload_autoscaler

Read-only query of the Workload Autoscaler configuration.

data "cloudpilotai_workload_autoscaler" "example" {
  cluster_id = cloudpilotai_eks_cluster.example.cluster_id
}

output "wa_installed" {
  value = data.cloudpilotai_workload_autoscaler.example.installed
}
| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| cluster_id | String | Yes | CloudPilot AI cluster ID |

Computed attributes: enabled, installed.

Complete Example

The following example demonstrates a production setup with both Node Autoscaler and Workload Autoscaler:

terraform {
  required_providers {
    cloudpilotai = {
      source = "cloudpilot-ai/cloudpilotai"
    }
  }
}

provider "cloudpilotai" {
  api_key = var.cloudpilot_api_key
}

resource "cloudpilotai_eks_cluster" "production" {
  cluster_name        = "production-cluster"
  region              = "us-west-2"
  restore_node_number = 3
  enable_rebalance    = true

  nodeclasses = [
    {
      name                 = "cloudpilot"
      system_disk_size_gib = 30
      instance_tags = {
        "cloudpilot.ai/managed" = "true"
      }
    }
  ]

  nodepools = [
    {
      name                  = "cloudpilot-general"
      nodeclass             = "cloudpilot"
      instance_arch         = ["amd64"]
      capacity_type         = ["spot", "on-demand"]
      instance_cpu_max      = 17
      instance_memory_max   = 32769
      node_disruption_limit = "2"
      node_disruption_delay = "60m"
    }
  ]
}

resource "cloudpilotai_workload_autoscaler" "production" {
  cluster_id = cloudpilotai_eks_cluster.production.cluster_id
  kubeconfig = cloudpilotai_eks_cluster.production.kubeconfig

  recommendation_policies = [
    {
      name                  = "balanced"
      percentile_cpu        = 95
      percentile_memory     = 99
      history_window_cpu    = "24h"
      history_window_memory = "48h"
      evaluation_period     = "1m"
      buffer_cpu            = "10%"
      buffer_memory         = "20%"
    }
  ]

  autoscaling_policies = [
    {
      name                       = "auto-optimize"
      recommendation_policy_name = "balanced"
      priority                   = 10
      update_resources           = ["cpu", "memory"]
      drift_threshold_cpu        = "5%"
      drift_threshold_memory     = "5%"
      on_policy_removal          = "recreate"

      target_refs = [
        {
          api_version = "apps/v1"
          kind        = "Deployment"
          namespace   = "production"
        }
      ]

      update_schedules = [
        {
          name = "default"
          mode = "recreate"
        }
      ]
    }
  ]

  enable_proactive = [
    {
      namespaces = ["production"]
    }
  ]

  disable_proactive = [
    {
      namespaces = ["kube-system"]
    }
  ]
}
