Script Valley
Kubernetes: From Containers to Clusters
Scaling, Scheduling, and Resource Management
Lesson 5.1

Horizontal Pod Autoscaler: scaling Kubernetes workloads by CPU and memory

HPA resource, metrics-server requirement, targetCPUUtilizationPercentage, min and max replicas, scaling algorithm, scale-up vs scale-down behavior, custom metrics with KEDA, HPA v2 API

HPA Adjusts Replica Count Based on Load

Figure: Kubernetes HPA scaling decision diagram

The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pod replicas based on observed metrics, most commonly CPU utilization. When load rises, it adds replicas; when load drops, it scales back down.
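
The replica count follows from a simple ratio, which the Kubernetes documentation gives for the HPA controller. The worked line below is only an illustration: it assumes 4 running replicas averaging 90% CPU against a 70% target, numbers chosen for the example rather than taken from a real cluster.

```latex
\[
\text{desiredReplicas} =
  \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]

% Illustrative numbers (assumed): 4 replicas, 90% observed, 70% target
\[
\left\lceil 4 \times \frac{90}{70} \right\rceil
  = \lceil 5.14 \rceil = 6
\]
```

The result is then clamped to the configured minimum and maximum replica counts. By default the controller also skips scaling when the ratio is within 10% of 1.0, so small fluctuations around the target do not change the replica count.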

Prerequisites: Install Metrics Server

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify it's working
kubectl top nodes
kubectl top pods

Creating an HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # target: average CPU at 70% of each Pod's CPU request

# Or quickly with kubectl (this names the HPA web-app, after the Deployment)
kubectl autoscale deployment web-app --cpu-percent=70 --min=2 --max=10

# Check HPA status
kubectl get hpa
kubectl describe hpa web-hpa

HPA requires that Pods have CPU resource requests set; without them, the utilization percentage has no baseline and the HPA cannot compute it. Scale-down is intentionally slow: by default the HPA waits out a 5-minute stabilization window before removing replicas, to avoid thrashing.
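
Both points can be sketched in one manifest. This is a minimal example, not a production configuration: the container name, image, label, and the 250m request are assumptions made for illustration, and the `behavior` block only spells out the default scale-down window explicitly so it is easy to tune.

```yaml
# Deployment: the CPU request is the baseline for the utilization percentage.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  # replicas is left unset so the HPA owns the replica count
  selector:
    matchLabels:
      app: web-app            # assumed label
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web             # assumed container name
        image: nginx:1.27     # placeholder image (assumption)
        resources:
          requests:
            cpu: 250m         # 70% target = 175m average CPU per Pod
---
# HPA: same target as before, with scale-down behavior made explicit.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # the default (5 minutes)
```

Lowering `stabilizationWindowSeconds` makes the HPA remove replicas sooner, at the cost of more churn when load is spiky.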

