Skip to Content
GuideTipsBest Practices for Ensuring Service Availability with AWS ALB

Best Practices for Ensuring Service Availability with AWS ALB

In an AWS EKS environment, using an ALB may lead to service downtime. The issue typically arises as follows:

problem-img

Time Point 1: When Kubernetes decides to terminate a pod, it sends a TERM signal. The pod may stop immediately or after completing all inflight requests, at which point its status is marked as Terminating.

Time Point 2: The ALB (Application Load Balancer) controller detects that the pod’s status is Terminating and attempts to remove the pod’s IP from the ALB.

Time Point 3: The ALB controller calls the appropriate API to remove the pod’s IP from the ALB.

During this process, if any new requests arrive at the pod during Time Period 4, those requests will fail. This is a common issue, as highlighted in a related GitHub issue.

Key Insight: The TERM signal should only be sent after the pod’s IP has been removed from the ALB.

Proposed Solution: While this solution may not be perfect, it is a recommended practice. By using a preStop lifecycle hook in the deployment, the TERM signal can be delayed by 5 seconds, allowing sufficient time for the operations at Time Points 2 and 3 to complete.

apiVersion: apps/v1 kind: Deployment metadata: namespace: game-2048 name: deployment-2048 spec: selector: matchLabels: app.kubernetes.io/name: app-2048 replicas: 1 template: metadata: labels: app.kubernetes.io/name: app-2048 spec: containers: - image: public.ecr.aws/l6m2t8p7/docker-2048:latest imagePullPolicy: Always name: app-2048 ports: - containerPort: 80 lifecycle: preStop: exec: command: ["/bin/sh","-c","sleep 5"]
Last updated on