Scaling, Scheduling, and Resource Management
Lesson 5.1
Horizontal Pod Autoscaler: scaling Kubernetes workloads by CPU and memory
HPA resource, metrics-server requirement, targetCPUUtilizationPercentage, min and max replicas, scaling algorithm, scale-up vs scale-down behavior, custom metrics with KEDA, HPA v2 API
HPA Adjusts Replica Count Based on Load
The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pod replicas based on observed metrics, most commonly CPU utilization. When load rises, it adds replicas; when load drops, it scales back down.
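The replica count the controller aims for can be written as the formula given in the Kubernetes documentation. The numbers in the worked example (4 replicas, 90% average utilization, a 70% target) are assumed purely for illustration.

```latex
% Core HPA scaling rule
\[
\text{desiredReplicas} =
\left\lceil \text{currentReplicas} \times
\frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]

% Worked example (assumed values): 4 replicas averaging 90% CPU, target 70%
\[
\left\lceil 4 \times \frac{90}{70} \right\rceil
= \lceil 5.14 \rceil = 6
\]
```

The result is then clamped to the configured minimum and maximum replica counts. By default the controller also skips scaling when the ratio of current to desired metric value is within 10% of 1.0, so small fluctuations around the target do not change the replica count.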
Prerequisites: Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify it's working
kubectl top nodes
kubectl top pods
Creating an HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # scale up when avg CPU > 70%
# Or quickly with kubectl
kubectl autoscale deployment web-app --cpu-percent=70 --min=2 --max=10
# Check HPA status
kubectl get hpa
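# Use web-app instead of web-hpa if the HPA was created with kubectl autoscale
kubectl describe hpa web-hpa
HPA requires that Pods have CPU resource requests set: utilization is measured as a percentage of the requested CPU, so without requests the calculation has no baseline. Scale-down is intentionally slow to avoid thrashing: by default the HPA waits out a 5-minute stabilization window before removing replicas.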
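As a minimal sketch of both points, the manifests below give the target Deployment a CPU request and write the scale-down behavior out explicitly. The container name, image, label, 200m request, and the one-Pod-per-minute policy are assumptions chosen for illustration, not values from this lesson; only the names web-app and web-hpa and the 70% target come from the example above.

```yaml
# Deployment fragment: the CPU request is the baseline for the HPA's percentage.
# Container name, image, and request size are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  # replicas is omitted on purpose so the HPA owns the replica count
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: nginx:1.27
          resources:
            requests:
              cpu: 200m   # 70% utilization then means 140m average usage per Pod
---
# Same HPA as above, with the scale-down behavior spelled out.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # the 5-minute default, made explicit
      policies:
        - type: Pods
          value: 1            # assumption: remove at most one Pod...
          periodSeconds: 60   # ...per 60-second period
```

Leaving out the behavior block keeps the default 300-second scale-down window. Shortening it makes the workload shrink faster after a spike, at the cost of more replica churn when load is bursty.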
