Custom IAM Roles for EKS
This guide explains how to configure custom IAM roles for CloudPilot on EKS, and how to use those roles during installation, migration, and upgrade.
This guide applies when you install CloudPilot on EKS with CUSTOM_NODE_ROLE and/or CUSTOM_CONTROLLER_ROLE.
Behavior When Custom Roles Are Provided
When a custom IAM role is provided:
- The installer no longer updates the custom role trust policy.
- The installer no longer attaches, detaches, or rewrites policies on the custom role.
- The installer validates that the custom role already satisfies the minimum CloudPilot requirements and fails fast if it does not.
- For a custom node role, the installer does not require the role to inherit permissions from the EKS managed node group role. It only validates the minimum permissions listed below.
The installer still manages non-role AWS resources such as cluster access entries, subnet/security-group tags, and the EKS OIDC provider when UPDATE_AWS_RESOURCE=true.
The IAM identity that runs the installer must also be allowed to call iam:SimulatePrincipalPolicy on the custom roles. Without that permission, the installer cannot complete the validation step.
Prepare the Role Files
Whether you apply the roles with the AWS CLI or paste the policies into the AWS Console, start by exporting the same environment variables and generating the same JSON files.
The examples below use envsubst to render shell variables into JSON files. If envsubst is not available on your machine, install GNU gettext first.
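Note that envsubst replaces any unset variable with an empty string without warning, which would leave malformed ARNs in the rendered JSON. The guard below is a minimal sketch of a pre-check; require_vars and have_envsubst are hypothetical helper names, not part of CloudPilot.

```shell
# Hypothetical helper (not part of CloudPilot): return non-zero if any named
# variable is unset or empty, and list each offender on stderr.
require_vars() {
  local status=0 name value
  for name in "$@"; do
    # Read the variable whose name is stored in $name (names are trusted input).
    eval "value=\${${name}:-}"
    if [ -z "$value" ]; then
      echo "missing or empty: ${name}" >&2
      status=1
    fi
  done
  return "$status"
}

# Hypothetical helper: succeed only if envsubst is on the PATH.
have_envsubst() {
  command -v envsubst >/dev/null 2>&1
}

# Example call, to be run after the exports in step 1:
#   have_envsubst && require_vars AWS_PARTITION CLUSTER_NAME CLUSTER_REGION \
#     NODE_ROLE_NAME CONTROLLER_ROLE_NAME AWS_ACCOUNT_ID \
#     OIDC_PROVIDER_HOSTPATH NODE_ROLE_ARN
```

Running the example call after step 1 and before generating the JSON files surfaces a failed lookup (for example, an empty OIDC_PROVIDER_HOSTPATH) before it reaches a policy document.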
1. Export the required environment variables
Replace the placeholder cluster and role names below. AWS_PARTITION defaults to aws; override it only if your cluster runs in another partition, such as aws-cn or aws-us-gov.
export AWS_PARTITION=${AWS_PARTITION:-aws}
export CLUSTER_NAME="<your-cluster-name>"
export CLUSTER_REGION="<your-cluster-region>"
export NODE_ROLE_NAME="<your-node-role-name>"
export CONTROLLER_ROLE_NAME="<your-controller-role-name>"
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
export OIDC_PROVIDER_HOSTPATH=$(
  aws eks describe-cluster \
    --name "$CLUSTER_NAME" \
    --region "$CLUSTER_REGION" \
    --query 'cluster.identity.oidc.issuer' \
    --output text | sed 's#^https://##'
)
export NODE_ROLE_ARN="arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/${NODE_ROLE_NAME}"
2. Generate the JSON files
Use > instead of >> so that rerunning the command overwrites the old file instead of appending a second JSON document.
cat <<'EOF' | envsubst > node-role-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
cat <<'EOF' | envsubst > controller-role-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER_HOSTPATH}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER_HOSTPATH}:aud": "sts.amazonaws.com",
          "${OIDC_PROVIDER_HOSTPATH}:sub": "system:serviceaccount:cloudpilot:cloudpilot-admin"
        }
      }
    }
  ]
}
EOF
cat <<'EOF' | envsubst > controller-role-minimum-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudPilotReadAutoscalingAndNodeGroup",
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeTags",
        "ec2:DescribeImages",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:GetInstanceTypesFromInstanceRequirements",
        "eks:DescribeNodegroup"
      ],
      "Resource": "*"
    },
    {
      "Sid": "CloudPilotMutateAutoscaling",
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "autoscaling:UpdateAutoScalingGroup"
      ],
      "Resource": "*"
    },
    {
      "Sid": "CloudPilotProvisionEC2Capacity",
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter",
        "ec2:DescribeImages",
        "ec2:RunInstances",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeAvailabilityZones",
        "ec2:DeleteLaunchTemplate",
        "ec2:CreateTags",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateFleet",
        "ec2:DescribeSpotPriceHistory",
        "pricing:GetProducts",
        "savingsplans:DescribeSavingsPlans",
        "ec2:DescribeRegions"
      ],
      "Resource": "*"
    },
    {
      "Sid": "CloudPilotTerminateClusterInstancesOnly",
      "Effect": "Allow",
      "Action": "ec2:TerminateInstances",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "ec2:ResourceTag/kubernetes.io/cluster/${CLUSTER_NAME}": [
            "owned",
            "shared"
          ]
        }
      }
    },
    {
      "Sid": "CloudPilotPassNodeRole",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "${NODE_ROLE_ARN}"
    },
    {
      "Sid": "CloudPilotDescribeCluster",
      "Effect": "Allow",
      "Action": "eks:DescribeCluster",
      "Resource": "arn:${AWS_PARTITION}:eks:${CLUSTER_REGION}:${AWS_ACCOUNT_ID}:cluster/${CLUSTER_NAME}"
    },
    {
      "Sid": "CloudPilotCreateScopedInstanceProfiles",
      "Effect": "Allow",
      "Action": [
        "iam:CreateInstanceProfile"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/kubernetes.io/cluster/${CLUSTER_NAME}": "owned",
          "aws:RequestTag/topology.kubernetes.io/region": "${CLUSTER_REGION}"
        },
        "StringLike": {
          "aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
        }
      }
    },
    {
      "Sid": "CloudPilotTagScopedInstanceProfiles",
      "Effect": "Allow",
      "Action": [
        "iam:TagInstanceProfile"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/kubernetes.io/cluster/${CLUSTER_NAME}": "owned",
          "aws:ResourceTag/topology.kubernetes.io/region": "${CLUSTER_REGION}",
          "aws:RequestTag/kubernetes.io/cluster/${CLUSTER_NAME}": "owned",
          "aws:RequestTag/topology.kubernetes.io/region": "${CLUSTER_REGION}"
        },
        "StringLike": {
          "aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*",
          "aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
        }
      }
    },
    {
      "Sid": "CloudPilotManageScopedInstanceProfiles",
      "Effect": "Allow",
      "Action": [
        "iam:AddRoleToInstanceProfile",
        "iam:RemoveRoleFromInstanceProfile",
        "iam:DeleteInstanceProfile"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/kubernetes.io/cluster/${CLUSTER_NAME}": "owned",
          "aws:ResourceTag/topology.kubernetes.io/region": "${CLUSTER_REGION}"
        },
        "StringLike": {
          "aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*"
        }
      }
    },
    {
      "Sid": "CloudPilotReadInstanceProfiles",
      "Effect": "Allow",
      "Action": "iam:GetInstanceProfile",
      "Resource": "*"
    }
  ]
}
EOF
Apply the Custom Roles
After the JSON files are generated, choose either the AWS CLI workflow or the AWS Console workflow.
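Before applying anything, it can be worth confirming that the rendered files contain no leftover placeholders or empty substitutions. The following is a minimal sketch, not an exhaustive validator; check_rendered is a hypothetical helper name, and the patterns only catch an unrendered `${...}` placeholder, an empty partition, or an empty account ID in an IAM ARN.

```shell
# Hypothetical helper (not part of CloudPilot): flag a rendered policy file
# that is missing, still contains placeholders, or shows an empty substitution.
check_rendered() {
  local file=$1
  if [ ! -s "$file" ]; then
    echo "${file}: missing or empty" >&2
    return 1
  fi
  # A literal "${" means the file was never passed through envsubst.
  if grep -qF '${' "$file"; then
    echo "${file}: unrendered placeholder found" >&2
    return 1
  fi
  # "arn::" indicates an empty partition; ":::" indicates an empty account ID
  # in an IAM ARN. Other empty values (e.g. region) are NOT detected here.
  if grep -qE 'arn::|:::' "$file"; then
    echo "${file}: ARN with an empty field found" >&2
    return 1
  fi
  return 0
}

# Example call over the three files generated in this guide:
#   for f in node-role-trust-policy.json controller-role-trust-policy.json \
#            controller-role-minimum-policy.json; do
#     check_rendered "$f" || break
#   done
```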
Option 1: Apply with AWS CLI
This path is recommended because it uses AWS managed policies for the node role and a generated inline policy for the controller role.
aws iam update-assume-role-policy \
  --role-name "$NODE_ROLE_NAME" \
  --policy-document file://node-role-trust-policy.json
aws iam attach-role-policy --role-name "$NODE_ROLE_NAME" --policy-arn "arn:${AWS_PARTITION}:iam::aws:policy/AmazonEKSWorkerNodePolicy"
aws iam attach-role-policy --role-name "$NODE_ROLE_NAME" --policy-arn "arn:${AWS_PARTITION}:iam::aws:policy/AmazonEKS_CNI_Policy"
aws iam attach-role-policy --role-name "$NODE_ROLE_NAME" --policy-arn "arn:${AWS_PARTITION}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
aws iam attach-role-policy --role-name "$NODE_ROLE_NAME" --policy-arn "arn:${AWS_PARTITION}:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
aws iam update-assume-role-policy \
  --role-name "$CONTROLLER_ROLE_NAME" \
  --policy-document file://controller-role-trust-policy.json
aws iam put-role-policy \
  --role-name "$CONTROLLER_ROLE_NAME" \
  --policy-name CloudPilotControllerMinimumPolicy \
  --policy-document file://controller-role-minimum-policy.json
Option 2: Apply in AWS Console
Use the same generated files from the Prepare the Role Files section.
Node role
- Open IAM -> Roles -> your node role.
- Open Trust relationships -> Edit trust policy.
- Paste the content of node-role-trust-policy.json, and save.
- Open Permissions.
- Attach these AWS managed policies:
  - AmazonEKSWorkerNodePolicy
  - AmazonEKS_CNI_Policy
  - AmazonEC2ContainerRegistryReadOnly
  - AmazonEBSCSIDriverPolicy
Controller role
- Open IAM -> Roles -> your controller role.
- Open Trust relationships -> Edit trust policy.
- Paste the content of controller-role-trust-policy.json, and save.
- Open Permissions.
- Create an inline policy from the content of controller-role-minimum-policy.json.
Role Requirements Reference
Use this section to understand what the generated files and managed policies are meant to satisfy.
Node role requirements
The node role must trust EC2 and include the minimum permissions below.
| Requirement | Minimum permissions | Why CloudPilot needs it |
|---|---|---|
| Trust policy | sts:AssumeRole from ec2.amazonaws.com | Lets EC2 instances launched by CloudPilot assume the role |
| Cluster bootstrap | Permissions provided by AmazonEKSWorkerNodePolicy | Lets worker nodes discover cluster metadata and join the cluster |
| VPC CNI | Permissions provided by AmazonEKS_CNI_Policy | Lets the AWS VPC CNI manage ENIs and secondary IPs |
| ECR image pull | Permissions provided by AmazonEC2ContainerRegistryReadOnly | Lets nodes pull container images from ECR |
| EBS CSI | Permissions provided by AmazonEBSCSIDriverPolicy | Lets the EBS CSI driver manage EBS volumes used by workloads |
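As a quick manual check of the table above, the sketch below lists which of the four managed policies are not attached to a role. missing_node_policies is a hypothetical helper name. Keep in mind that the installer validates minimum permissions rather than policy names, so a role that grants equivalent permissions through other policies would be flagged here and could still pass installer validation.

```shell
# Hypothetical helper (not part of CloudPilot): read attached policy names
# (whitespace separated) on stdin and print each managed policy named in this
# guide that is absent from that list.
missing_node_policies() {
  local attached required name found
  attached=$(cat)
  for required in AmazonEKSWorkerNodePolicy AmazonEKS_CNI_Policy \
                  AmazonEC2ContainerRegistryReadOnly AmazonEBSCSIDriverPolicy; do
    found=0
    # Unquoted on purpose: split the input on spaces, tabs, and newlines.
    for name in $attached; do
      if [ "$name" = "$required" ]; then
        found=1
      fi
    done
    if [ "$found" -eq 0 ]; then
      echo "$required"
    fi
  done
}

# Example call against a real role:
#   aws iam list-attached-role-policies --role-name "$NODE_ROLE_NAME" \
#     --query 'AttachedPolicies[].PolicyName' --output text | missing_node_policies
```

Empty output means all four managed policies are attached.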
Controller role requirements
The controller role must trust the CloudPilot service account through the cluster OIDC provider and include the minimum permissions below.
| Requirement | Minimum permissions | Why CloudPilot needs it |
|---|---|---|
| Trust policy | sts:AssumeRoleWithWebIdentity from the cluster OIDC provider, restricted to system:serviceaccount:cloudpilot:cloudpilot-admin and aud=sts.amazonaws.com | Lets the CloudPilot controller assume the role through IRSA |
| Read-only cluster and autoscaling discovery | autoscaling:DescribeAutoScalingGroups, autoscaling:DescribeAutoScalingInstances, autoscaling:DescribeLaunchConfigurations, autoscaling:DescribeScalingActivities, autoscaling:DescribeTags, ec2:DescribeImages, ec2:DescribeInstanceTypes, ec2:DescribeLaunchTemplateVersions, ec2:GetInstanceTypesFromInstanceRequirements, eks:DescribeNodegroup, eks:DescribeCluster | Lets the controller inspect node groups, launch templates, instance types, and cluster metadata |
| Autoscaling mutations | autoscaling:SetDesiredCapacity, autoscaling:TerminateInstanceInAutoScalingGroup, autoscaling:UpdateAutoScalingGroup | Lets the controller rebalance existing node groups |
| EC2 provisioning | ssm:GetParameter, ec2:RunInstances, ec2:DescribeSubnets, ec2:DescribeSecurityGroups, ec2:DescribeLaunchTemplates, ec2:DescribeInstances, ec2:DescribeInstanceTypes, ec2:DescribeInstanceTypeOfferings, ec2:DescribeAvailabilityZones, ec2:DeleteLaunchTemplate, ec2:CreateTags, ec2:CreateLaunchTemplate, ec2:CreateFleet, ec2:DescribeSpotPriceHistory, pricing:GetProducts, savingsplans:DescribeSavingsPlans, ec2:DescribeRegions | Lets the controller calculate capacity options and create EC2 capacity |
| Terminate CloudPilot-managed nodes | ec2:TerminateInstances with ec2:ResourceTag/kubernetes.io/cluster/${CLUSTER_NAME} equal to owned or shared | Limits direct EC2 termination to cluster-owned/shared nodes |
| Pass node role | iam:PassRole on ${NODE_ROLE_ARN} | Lets the controller launch instances with the node IAM role |
| Instance profile lifecycle | iam:CreateInstanceProfile, iam:TagInstanceProfile, iam:AddRoleToInstanceProfile, iam:RemoveRoleFromInstanceProfile, iam:DeleteInstanceProfile with the tag conditions used in the generated policy; iam:GetInstanceProfile without conditions | Lets the controller manage Karpenter instance profiles safely inside the cluster scope |
Use Custom Roles During Installation, Migration, and Upgrade
After the custom roles have been prepared, export the CloudPilot installation variables and make CloudPilot use the custom role names.
export CUSTOM_NODE_ROLE="$NODE_ROLE_NAME"
export CUSTOM_CONTROLLER_ROLE="$CONTROLLER_ROLE_NAME"
Scenario 1: Fresh install with custom roles
If the cluster has not run CloudPilot phase2 yet, export the variables above before running the phase2 install script.
Important notes:
- Run the role-preparation steps in the earlier sections first.
- Run phase1 before phase2, as usual.
- When CUSTOM_NODE_ROLE and CUSTOM_CONTROLLER_ROLE are set, the installer validates those roles and uses them directly instead of creating and managing the default CloudPilot roles.
Scenario 2: Migrate an existing cluster from default roles to custom roles
If the cluster is already installed with the default CloudPilot roles, you can migrate it by re-running phase2 with the custom role variables exported.
This updates the phase2 installation to reference the custom roles. The script validates the custom roles but does not modify them.
Scenario 3: Use the upgrade script while upgrading to a newer version
If you are already planning to upgrade CloudPilot to a newer version, export the custom role variables before running the EKS upgrade script. When the upgrade script reaches the target version’s phase2 install step, that phase2 run will use the custom roles.
Important limitations:
- upgrade.sh only runs phase2 when there is an actual version transition to apply.
- upgrade.sh may set UPDATE_AWS_RESOURCE automatically from its internal version matrix when the user does not provide it.
- For custom-role installation or migration, explicitly set UPDATE_AWS_RESOURCE=true so that phase2 also updates cluster access entries or aws-auth mappings for the custom node role.
- If the cluster is already on the latest target version and you only want to switch roles, upgrade.sh is not enough.
- In that case, re-run the current version’s phase2 install script directly, as shown in Scenario 2.
Final step: update the NodeClass role in CloudPilot Console
After any of the scenarios above, update the NodeClass configuration in CloudPilot Console so newly provisioned EC2 nodes use the custom node role.
- Open CloudPilot Console and go to the cluster’s Node Autoscaler configuration.
- Open the NodeClass used by your NodePool.
- Set Role to your custom node role name, for example clusterall-noderole.
- Save the NodeClass.
What happens to the old default CloudPilot roles
After a successful migration, CloudPilot will start using the custom roles for future phase2 runs and controller/node provisioning flows.
The migration step does not automatically delete the old default CloudPilot IAM roles. If you want to remove the old default roles, do that only after you have confirmed:
- The controller is running with the custom controller role.
- New nodes launched by CloudPilot are using the custom node role.
- The cluster is healthy after the migration.
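The first confirmation in the list above can be approximated from the command line. This sketch assumes the controller role is bound through the standard IRSA annotation (eks.amazonaws.com/role-arn) on the cloudpilot-admin service account in the cloudpilot namespace, as the trust policy in this guide implies. The function names are hypothetical, and the annotation reflects configuration only: pods started before the change may still hold credentials for the old role until they restart.

```shell
# Extract the role name (everything after the last "/") from an IAM role ARN.
role_name_from_arn() {
  printf '%s\n' "${1##*/}"
}

# Print the role ARN annotated on the CloudPilot service account.
# Namespace and service account name are taken from the trust policy above.
controller_role_arn() {
  kubectl -n cloudpilot get serviceaccount cloudpilot-admin \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
}

# Succeed only if the annotated role name equals the expected role name ($1).
controller_uses_role() {
  [ "$(role_name_from_arn "$(controller_role_arn)")" = "$1" ]
}

# Example call:
#   controller_uses_role "$CONTROLLER_ROLE_NAME" && echo "controller uses custom role"
```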
Validation Summary
When a custom role is provided, the installer validates:
- The role exists.
- The trust policy matches the required principal and conditions.
- The role can actually perform the minimum required actions.
- For the controller role, iam:PassRole targets the exact node role ARN that CloudPilot will use.
If any of these checks fail, the installer exits before changing the cluster installation state.
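To preview the permission check yourself, you can call the same IAM simulation API from the AWS CLI. The sketch below is a manual approximation, not the installer's actual logic; denied_actions is a hypothetical helper name. Actions guarded by tag conditions in the generated policy, such as ec2:TerminateInstances, may be reported as denied unless matching context entries are supplied to the simulation.

```shell
# Hypothetical helper (not part of CloudPilot): print every action that the
# IAM policy simulator does not report as "allowed" for the given role.
# Usage: denied_actions <role-arn> <action> [<action> ...]
denied_actions() {
  local role_arn=$1
  shift
  aws iam simulate-principal-policy \
    --policy-source-arn "$role_arn" \
    --action-names "$@" \
    --query "EvaluationResults[?EvalDecision!='allowed'].EvalActionName" \
    --output text
}

# Example call for a few unconditioned controller actions:
#   denied_actions \
#     "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/${CONTROLLER_ROLE_NAME}" \
#     autoscaling:DescribeAutoScalingGroups ec2:RunInstances ec2:CreateFleet
```

Empty output means the simulator allowed every listed action. The caller needs iam:SimulatePrincipalPolicy on the role, the same permission the installer requires.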