Scaling, Scheduling, and Resource Management - Lesson 5.1

Horizontal Pod Autoscaler: scaling Kubernetes workloads by CPU and memory

HPA resource, metrics-server requirement, targetCPUUtilizationPercentage, min and max replicas, scaling algorithm, scale-up vs scale-down behavior, custom metrics with KEDA, HPA v2 API

HPA Adjusts Replica Count Based on Load

The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pod replicas in a workload based on observed metrics, most commonly CPU utilization. It adds replicas when load rises and removes them again when load drops.
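
As a rough guide to how the replica count is chosen, the Kubernetes documentation describes the core calculation as a ratio of the observed metric to its target, rounded up. The numbers in the worked example (4 replicas, 90% observed, 70% target) are illustrative assumptions, not measurements:

```latex
\[
\text{desiredReplicas} =
\left\lceil \text{currentReplicas} \times
\frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]

% Worked example (assumed values): 4 replicas averaging 90% CPU, target 70%
\[
\left\lceil 4 \times \frac{90}{70} \right\rceil
= \left\lceil 5.14\ldots \right\rceil = 6
\]
```

The result is then clamped to the range between the minimum and maximum replica counts. By default the HPA also skips scaling when the ratio is within about 10% of 1.0, so small fluctuations around the target do not change the replica count.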

Prerequisites: Install Metrics Server

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify it's working
kubectl top nodes
kubectl top pods

Creating an HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # target: average CPU across Pods at 70% of requested CPU
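
The manifest above scales on CPU only. To scale on memory as well, a second entry can be added to the same metrics list. The fragment below is a sketch: the 80% target is an assumed value, and it relies on the Pods declaring memory requests. When several metrics are listed, the HPA computes a desired replica count for each and uses the largest.

```yaml
  # Additional item for spec.metrics in web-hpa (80% is an illustrative target)
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80   # average memory across Pods at 80% of requested memory
```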

# Or quickly with kubectl (without --name, the HPA is named after the Deployment: web-app)
kubectl autoscale deployment web-app --name=web-hpa --cpu-percent=70 --min=2 --max=10

# Check HPA status
kubectl get hpa
kubectl describe hpa web-hpa

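With a Utilization target, the HPA requires CPU resource requests on the containers in the target Pods. Utilization is calculated as a percentage of the request, so without one the calculation has no baseline and the HPA cannot compute the metric. Scale-down is intentionally slow: by default the HPA applies a 5-minute (300-second) stabilization window before removing replicas, to avoid thrashing.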
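
Both points can be sketched in manifests. The first document is a hypothetical Deployment for web-app showing where the requests go; the container name, image, labels, and request sizes are placeholders, not values from this lesson. The second repeats web-hpa with an explicit behavior section, where the 300-second window is the documented default and the one-Pod-per-minute policy is an assumed example.

```yaml
# Hypothetical Deployment: image, labels, and request sizes are placeholders
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  # spec.replicas is omitted so re-applying this file does not reset the HPA's count
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx:stable        # placeholder image
        resources:
          requests:
            cpu: 250m              # 70% utilization then means about 175m per Pod
            memory: 256Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # the default, written out explicitly
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60               # assumed policy: remove at most 1 Pod per minute
    scaleUp:
      stabilizationWindowSeconds: 0     # default: scale up without waiting
```

Lowering stabilizationWindowSeconds under scaleDown makes the HPA release replicas sooner, at the cost of more churn when load is spiky.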