Installing with Terraform
This guide explains how to use the CloudPilot AI Terraform Provider to manage Amazon EKS clusters with automated cost optimization. The provider enables seamless integration of CloudPilot AI’s node autoscaling and workload autoscaling capabilities into your infrastructure-as-code workflow.
Overview
The CloudPilot AI Terraform Provider offers two core resources:
- cloudpilotai_eks_cluster — Automates agent installation, workload rebalancing, and node autoscaling with custom NodeClass/NodePool configurations.
- cloudpilotai_workload_autoscaler — Right-sizes workload resource requests and limits through recommendation and autoscaling policies, with proactive optimization support.
Both resources also have corresponding data sources for read-only queries of cluster state and autoscaler configuration.
Prerequisites
Before using the Terraform Provider, ensure you have:
- Terraform — Version 1.0 or later
- AWS CLI — Configured with credentials that have EKS cluster management permissions
  - Even if you supply a kubeconfig path, the provider still uses AWS CLI credentials for all kubectl calls; make sure the CLI can authenticate to the target cluster account.
  - Multi-account setups: Switch to the profile that matches your EKS cluster (e.g. AWS_PROFILE=prod-account) before running Terraform commands.
- kubectl — For cluster operations and component management
- Helm — For deploying CloudPilot AI components to the cluster
- CloudPilot AI API Key — See Get API Keys for setup instructions
If you don’t have an EKS cluster yet, refer to the example cluster setup.
Provider Configuration
The provider supports two authentication methods — provide either api_key or api_key_profile:
terraform {
required_providers {
cloudpilotai = {
source = "cloudpilot-ai/cloudpilotai"
}
}
}
provider "cloudpilotai" {
api_key = var.cloudpilot_api_key
# Or authenticate via a file containing the key:
# api_key_profile = "/path/to/api-key-file"
# Optional. Defaults to https://api.cloudpilot.ai
# api_endpoint = "https://api.cloudpilot.ai"
}| Attribute | Type | Required | Description |
|---|---|---|---|
api_key | String (Sensitive) | No* | CloudPilot AI API key |
api_key_profile | String | No* | Path to a file containing the API key |
api_endpoint | String | No | API endpoint. Defaults to https://api.cloudpilot.ai |
* One of api_key or api_key_profile must be provided.
Quick Start
1. Configure AWS credentials:

   aws configure
   aws sts get-caller-identity

2. Create main.tf:

   terraform {
     required_providers {
       cloudpilotai = {
         source = "cloudpilot-ai/cloudpilotai"
       }
     }
   }

   provider "cloudpilotai" {
     api_key = "sk-xxx"
   }

   resource "cloudpilotai_eks_cluster" "example" {
     cluster_name        = "my-eks-cluster"
     region              = "us-west-2"
     restore_node_number = 2
   }

3. Initialize and apply:

   terraform init
   terraform plan
   terraform apply
Example Configurations
The following examples are available in the GitHub repository:
| Example | Description |
|---|---|
0_minimal | Node Autoscaler + Workload Autoscaler with minimal configuration |
1_details | Full-featured configuration with all options, templates, and data sources |
2_read-only_access | Agent-only installation for monitoring without optimization changes |
3_basic_rebalance | Basic cost optimization with workload rebalancing |
4_nodeclass_nodepool_rebalance | Custom NodeClass and NodePool with instance filtering and disruption controls |
Resource: cloudpilotai_eks_cluster
Manages an EKS cluster’s integration with CloudPilot AI, including agent installation, workload rebalancing, and node autoscaling.
Basic Usage
resource "cloudpilotai_eks_cluster" "example" {
cluster_name = "my-eks-cluster"
region = "us-west-2"
restore_node_number = 2
enable_rebalance = true
}

Required Attributes
| Attribute | Type | Description |
|---|---|---|
cluster_name | String | Name of the EKS cluster |
region | String | AWS region where the cluster runs |
restore_node_number | Number | Number of nodes to restore when the resource is destroyed. Set to 0 to skip restore. |
Optional Attributes
| Attribute | Type | Default | Description |
|---|---|---|---|
kubeconfig | String | Auto-generated | Path to kubeconfig file. If omitted, the provider generates one via AWS CLI. |
only_install_agent | Boolean | false | Only install the monitoring agent without optimization components |
enable_rebalance | Boolean | false | Enable workload rebalancing to optimize node utilization |
disable_workload_uploading | Boolean | false | Disable uploading workload information to CloudPilot AI |
enable_upgrade_agent | Boolean | false | Upgrade the CloudPilot AI agent on next apply |
enable_upgrade_rebalance_component | Boolean | false | Upgrade the rebalance component on next apply |
enable_upload_config | Boolean | true | Upload NodePool/NodeClass configuration to CloudPilot AI |
enable_diversity_instance_type | Boolean | false | Use diverse instance types for improved fault tolerance |
workload_templates | List | [] | Reusable workload templates (see below) |
workloads | List | [] | Workload rebalance configurations |
nodeclass_templates | List | [] | Reusable NodeClass templates |
nodeclasses | List | [] | NodeClass configurations |
nodepool_templates | List | [] | Reusable NodePool templates |
nodepools | List | [] | NodePool configurations |
Read-Only Attributes
| Attribute | Type | Description |
|---|---|---|
cluster_id | String | CloudPilot AI cluster ID |
account_id | String | AWS account ID |
Workload Configuration
Workloads control which deployments participate in rebalancing and spot instance optimization. You can define reusable templates and then reference them in individual workloads:
resource "cloudpilotai_eks_cluster" "example" {
# ... required attributes ...
workload_templates = [
{
template_name = "spot-friendly"
rebalance_able = true
spot_friendly = true
min_non_spot_replicas = 1
}
]
workloads = [
{
name = "my-app"
type = "deployment"
namespace = "default"
template_name = "spot-friendly"
}
]
}

| Attribute | Type | Default | Description |
|---|---|---|---|
name | String | — | Workload name (required for workloads) |
type | String | — | Workload type, e.g. deployment (required for workloads) |
namespace | String | — | Kubernetes namespace (required for workloads) |
template_name | String | — | Name of the template to inherit settings from |
rebalance_able | Boolean | true | Whether the workload can be rebalanced |
spot_friendly | Boolean | true | Whether the workload is suitable for spot instances |
min_non_spot_replicas | Number | 0 | Minimum number of replicas that must run on on-demand instances |
NodeClass Configuration
NodeClasses define the instance-level properties for provisioned nodes. The system default NodeClass name is cloudpilot.
resource "cloudpilotai_eks_cluster" "example" {
# ... required attributes ...
nodeclasses = [
{
name = "cloudpilot"
instance_tags = { "cloudpilot.ai/managed" = "true" }
system_disk_size_gib = 20
extra_cpu_allocation_mcore = 0
extra_memory_allocation_mib = 0
}
]
}

| Attribute | Type | Default | Description |
|---|---|---|---|
name | String | — | NodeClass name (required for nodeclasses) |
template_name | String | — | Template name (required for nodeclass_templates) |
origin_nodeclass_json | String | — | Raw NodeClass JSON; overrides all other settings if set |
instance_tags | Map(String) | {"cloudpilot.ai/managed" = "true"} | Tags applied to each provisioned node |
system_disk_size_gib | Number | 20 | System disk size in GiB |
extra_cpu_allocation_mcore | Number | 0 | Extra CPU allocation in millicores for burstable pods |
extra_memory_allocation_mib | Number | 0 | Extra memory allocation in MiB for burstable pods |
NodePool Configuration
NodePools control how nodes are provisioned, including instance types, capacity types, and disruption policies. The system default NodePool name is cloudpilot-general.
resource "cloudpilotai_eks_cluster" "example" {
# ... required attributes ...
nodepools = [
{
name = "cloudpilot-general"
nodeclass = "cloudpilot"
enable = true
provision_priority = 2
instance_arch = ["amd64"]
capacity_type = ["spot", "on-demand"]
instance_cpu_max = 17
instance_memory_max = 32769
node_disruption_limit = "2"
node_disruption_delay = "60m"
}
]
}

| Attribute | Type | Default | Description |
|---|---|---|---|
name | String | — | NodePool name (required for nodepools) |
template_name | String | — | Template name (required for nodepool_templates) |
origin_nodepool_json | String | — | Raw NodePool JSON; overrides all other settings if set |
enable | Boolean | true | Whether the NodePool is active |
nodeclass | String | — | Associated NodeClass name |
enable_gpu | Boolean | false | Enable GPU instances |
provision_priority | Number | 1 | Scheduling priority (higher = higher priority) |
instance_family | List(String) | all | Instance families to use (e.g. ["t3", "m5"]) |
instance_arch | List(String) | ["amd64", "arm64"] | CPU architectures |
capacity_type | List(String) | ["on-demand", "spot"] | Capacity types |
zone | List(String) | all | Availability zones |
instance_cpu_min | Number | 0 | Minimum CPU cores per node (0 = no limit) |
instance_cpu_max | Number | 17 | Maximum CPU cores per node (0 = no limit) |
instance_memory_min | Number | 0 | Minimum memory in MiB per node (0 = no limit) |
instance_memory_max | Number | 32769 | Maximum memory in MiB per node (0 = no limit) |
node_disruption_limit | String | "2" | Max nodes that can be terminated at once (number or percentage) |
node_disruption_delay | String | "60m" | Wait time before terminating underutilized nodes |
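NodePool templates follow the same pattern as workload templates: define shared settings once under nodepool_templates, then inherit them via template_name. A minimal sketch, assuming the template name spot-first and its attribute values are illustrative rather than system defaults:

```hcl
resource "cloudpilotai_eks_cluster" "example" {
  # ... required attributes ...

  nodepool_templates = [
    {
      template_name         = "spot-first" # illustrative name
      capacity_type         = ["spot", "on-demand"]
      instance_arch         = ["amd64"]
      node_disruption_limit = "2"
    }
  ]

  nodepools = [
    {
      name          = "cloudpilot-general"
      nodeclass     = "cloudpilot"
      template_name = "spot-first" # inherits the template's settings
    }
  ]
}
```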
Resource: cloudpilotai_workload_autoscaler
Manages workload resource optimization through recommendation policies, autoscaling policies, and proactive optimization. This resource is cloud-provider independent and works with any Kubernetes cluster managed by CloudPilot AI.
Basic Usage
resource "cloudpilotai_workload_autoscaler" "example" {
cluster_id = cloudpilotai_eks_cluster.example.cluster_id
kubeconfig = cloudpilotai_eks_cluster.example.kubeconfig
recommendation_policies = []
autoscaling_policies = []
enable_proactive = [
{
namespaces = ["default"]
}
]
}

Required Attributes
| Attribute | Type | Description |
|---|---|---|
cluster_id | String | CloudPilot AI cluster ID (can reference cloudpilotai_eks_cluster.*.cluster_id) |
kubeconfig | String | Path to kubeconfig file for kubectl/helm operations |
Optional Attributes
| Attribute | Type | Default | Description |
|---|---|---|---|
storage_class | String | cluster default | StorageClass for VictoriaMetrics persistent volume |
enable_node_agent | Boolean | true | Deploy the Node Agent DaemonSet for per-node metrics |
recommendation_policies | List | [] | Recommendation policies for resource right-sizing |
autoscaling_policies | List | [] | Autoscaling policies for automated resource updates |
enable_proactive | List | [] | Filters to enable proactive optimization on matching workloads |
disable_proactive | List | [] | Filters to disable proactive optimization on matching workloads |
Recommendation Policies
Recommendation policies define how CloudPilot AI analyzes historical metrics to produce right-sizing recommendations.
recommendation_policies = [
{
name = "balanced"
strategy_type = "percentile"
percentile_cpu = 95
percentile_memory = 99
history_window_cpu = "24h"
history_window_memory = "48h"
evaluation_period = "1m"
buffer_cpu = "10%"
buffer_memory = "20%"
request_min_cpu = "25%"
request_min_memory = "30%"
}
]

| Attribute | Type | Default | Description |
|---|---|---|---|
name | String | — | Policy name (required) |
strategy_type | String | "percentile" | Analysis strategy |
percentile_cpu | Number | 95 | CPU usage percentile (50–100) |
percentile_memory | Number | 95 | Memory usage percentile (50–100) |
history_window_cpu | String | — | CPU metrics lookback window (required, e.g. "24h") |
history_window_memory | String | — | Memory metrics lookback window (required, e.g. "48h") |
evaluation_period | String | — | Evaluation interval (required, e.g. "1m") |
buffer_cpu | String | "" | CPU safety buffer (percentage like "10%" or absolute like "100m") |
buffer_memory | String | "" | Memory safety buffer |
request_min_cpu | String | "" | Minimum CPU request floor |
request_min_memory | String | "" | Minimum memory request floor |
request_max_cpu | String | "" | Maximum CPU request ceiling |
request_max_memory | String | "" | Maximum memory request ceiling |
Autoscaling Policies
Autoscaling policies control how recommendations are applied to workloads, including targeting, scheduling, and update behavior.
autoscaling_policies = [
{
name = "production"
enable = true
recommendation_policy_name = "balanced"
priority = 10
update_resources = ["cpu", "memory"]
drift_threshold_cpu = "5%"
drift_threshold_memory = "5%"
on_policy_removal = "recreate"
target_refs = [
{
api_version = "apps/v1"
kind = "Deployment"
name = ""
namespace = "production"
}
]
update_schedules = [
{
name = "default"
schedule = ""
duration = ""
mode = "recreate"
}
]
limit_policies = [
{
resource = "cpu"
remove_limit = true
},
{
resource = "memory"
auto_headroom = "2"
}
]
}
]

Core Attributes
| Attribute | Type | Default | Description |
|---|---|---|---|
name | String | — | Policy name (required) |
enable | Boolean | true | Whether the policy is active |
recommendation_policy_name | String | — | Name of the recommendation policy to use (required) |
priority | Number | 0 | Policy priority (higher takes precedence for overlapping targets) |
update_resources | List(String) | [] | Resources to optimize (e.g. ["cpu", "memory"]) |
drift_threshold_cpu | String | "" | Minimum CPU drift before updating |
drift_threshold_memory | String | "" | Minimum memory drift before updating |
on_policy_removal | String | "off" | Behavior when policy is removed: off, recreate, or inplace |
Target Refs
| Attribute | Type | Description |
|---|---|---|
api_version | String | Kubernetes API version (e.g. "apps/v1") (required) |
kind | String | Workload kind: Deployment or StatefulSet (required) |
name | String | Workload name. Empty string matches all. |
namespace | String | Namespace. Empty string matches all. |
Update Schedules
| Attribute | Type | Description |
|---|---|---|
name | String | Schedule name (required) |
schedule | String | Cron expression for the update window |
duration | String | Window duration |
mode | String | Update mode: oncreate, recreate, inplace, or off (required) |
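As a sketch of how these attributes might combine, the schedule below confines recreate-mode updates to a nightly window; the cron expression and duration values are illustrative:

```hcl
update_schedules = [
  {
    name     = "nightly-window" # illustrative name
    schedule = "0 2 * * *"      # open the window at 02:00 daily
    duration = "2h"             # keep it open for two hours
    mode     = "recreate"
  }
]
```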
Limit Policies
| Attribute | Type | Default | Description |
|---|---|---|---|
resource | String | — | Resource name: cpu or memory (required) |
remove_limit | Boolean | false | Remove the resource limit entirely |
keep_limit | Boolean | false | Keep the original resource limit unchanged |
multiplier | String | "" | Set limit as a multiplier of the request |
auto_headroom | String | "" | Automatically calculate headroom multiplier |
Startup Boost
| Attribute | Type | Default | Description |
|---|---|---|---|
startup_boost_enabled | Boolean | false | Temporarily increase resources during container startup |
startup_boost_min_boost_duration | String | "" | Minimum boost duration |
startup_boost_min_ready_duration | String | "" | Minimum ready duration |
startup_boost_multiplier_cpu | String | "" | CPU multiplier during startup |
startup_boost_multiplier_memory | String | "" | Memory multiplier during startup |
in_place_fallback_default_policy | String | "" | Fallback when in-place update fails: recreate or hold |
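A hedged sketch of startup boost within an autoscaling policy; the policy name and multiplier values are illustrative, not recommended defaults:

```hcl
autoscaling_policies = [
  {
    name                       = "boosted" # illustrative name
    recommendation_policy_name = "balanced"

    # Temporarily raise resources while containers start up
    startup_boost_enabled            = true
    startup_boost_min_boost_duration = "2m"
    startup_boost_min_ready_duration = "30s"
    startup_boost_multiplier_cpu     = "2"
    startup_boost_multiplier_memory  = "1.5"

    # If an in-place update fails, hold rather than recreate
    in_place_fallback_default_policy = "hold"
  }
]
```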
Proactive Optimization Filters
Use enable_proactive and disable_proactive to control which workloads receive proactive resource updates based on namespace, kind, or other criteria:
enable_proactive = [
{
namespaces = ["production", "staging"]
}
]
disable_proactive = [
{
namespaces = ["kube-system"]
}
]

| Attribute | Type | Description |
|---|---|---|
workload_name | String | Filter by workload name (substring match) |
namespaces | List(String) | Filter by namespaces |
workload_kinds | List(String) | Filter by workload kinds |
autoscaling_policy_names | List(String) | Filter by autoscaling policy names |
recommendation_policy_names | List(String) | Filter by recommendation policy names |
workload_state | String | Filter by workload state |
optimization_states | List(String) | Filter by optimization states |
runtime_languages | List(String) | Filter by runtime languages |
optimized | Boolean | Filter by optimization status |
disable_proactive_update | Boolean | Whether to disable proactive update |
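Filter attributes within one block combine to narrow the match. For example, a sketch that enables proactive updates only for Deployments in production that have not yet been optimized (attribute values are illustrative):

```hcl
enable_proactive = [
  {
    namespaces     = ["production"]
    workload_kinds = ["Deployment"]
    optimized      = false # only workloads not yet optimized
  }
]
```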
Data Sources
cloudpilotai_eks_cluster
Read-only query of a registered EKS cluster’s state.
data "cloudpilotai_eks_cluster" "example" {
cluster_name = "my-eks-cluster"
region = "us-west-2"
}
output "cluster_status" {
value = data.cloudpilotai_eks_cluster.example.status
}

| Attribute | Type | Required | Description |
|---|---|---|---|
cluster_name | String | Yes | EKS cluster name |
region | String | Yes | AWS region |
account_id | String | No | AWS account ID (inferred from AWS CLI if omitted) |
Computed attributes: cluster_id, cloud_provider, status (online/offline/demo), agent_version, rebalance_enable.
cloudpilotai_workload_autoscaler
Read-only query of the Workload Autoscaler configuration.
data "cloudpilotai_workload_autoscaler" "example" {
cluster_id = cloudpilotai_eks_cluster.example.cluster_id
}
output "wa_installed" {
value = data.cloudpilotai_workload_autoscaler.example.installed
}

| Attribute | Type | Required | Description |
|---|---|---|---|
cluster_id | String | Yes | CloudPilot AI cluster ID |
Computed attributes: enabled, installed.
Complete Example
The following example demonstrates a production setup with both Node Autoscaler and Workload Autoscaler:
terraform {
required_providers {
cloudpilotai = {
source = "cloudpilot-ai/cloudpilotai"
}
}
}
provider "cloudpilotai" {
api_key = var.cloudpilot_api_key
}
resource "cloudpilotai_eks_cluster" "production" {
cluster_name = "production-cluster"
region = "us-west-2"
restore_node_number = 3
enable_rebalance = true
nodeclasses = [
{
name = "cloudpilot"
system_disk_size_gib = 30
instance_tags = { "cloudpilot.ai/managed" = "true" }
}
]
nodepools = [
{
name = "cloudpilot-general"
nodeclass = "cloudpilot"
instance_arch = ["amd64"]
capacity_type = ["spot", "on-demand"]
instance_cpu_max = 17
instance_memory_max = 32769
node_disruption_limit = "2"
node_disruption_delay = "60m"
}
]
}
resource "cloudpilotai_workload_autoscaler" "production" {
cluster_id = cloudpilotai_eks_cluster.production.cluster_id
kubeconfig = cloudpilotai_eks_cluster.production.kubeconfig
recommendation_policies = [
{
name = "balanced"
percentile_cpu = 95
percentile_memory = 99
history_window_cpu = "24h"
history_window_memory = "48h"
evaluation_period = "1m"
buffer_cpu = "10%"
buffer_memory = "20%"
}
]
autoscaling_policies = [
{
name = "auto-optimize"
recommendation_policy_name = "balanced"
priority = 10
update_resources = ["cpu", "memory"]
drift_threshold_cpu = "5%"
drift_threshold_memory = "5%"
on_policy_removal = "recreate"
target_refs = [
{
api_version = "apps/v1"
kind = "Deployment"
namespace = "production"
}
]
update_schedules = [
{
name = "default"
mode = "recreate"
}
]
}
]
enable_proactive = [
{
namespaces = ["production"]
}
]
disable_proactive = [
{
namespaces = ["kube-system"]
}
]
}

Additional Resources
- Terraform Provider Registry — Full provider documentation
- GitHub Repository — Source code and examples
- GitHub Issues — Report issues or request features