Schema for ScaleLesson 5.3

Archiving old data: keeping production tables lean

data archiving strategy, archive table pattern, cutoff date, incremental archive job, hard delete after archive, cold storage, pg_partman, event sourcing reference

Why Archive

Production tables with years of data become slow to query, slow to back up, and expensive to index. Archiving moves old, rarely accessed rows to a separate archive table (or cold storage), keeping the live table small and fast.

Archive Table Pattern

-- Archive table mirrors the live table
CREATE TABLE orders_archive LIKE orders;

-- Archive job: move rows older than 2 years
INSERT INTO orders_archive
  SELECT * FROM orders
  WHERE created_at < DATE_SUB(CURRENT_DATE, INTERVAL 2 YEAR)
    AND status IN ('completed', 'cancelled');

DELETE FROM orders
  WHERE created_at < DATE_SUB(CURRENT_DATE, INTERVAL 2 YEAR)
    AND status IN ('completed', 'cancelled');

Principles

Archive in small batches (1000–10000 rows) to avoid locking.
Only archive rows in terminal states (completed, cancelled) - never active records.
Run the archive job during off-peak hours.
Keep the archive table in the same database for fast joins; move to cold storage (S3, BigQuery) once you no longer need real-time queries against it.

Partitioning and archiving complement each other: partition by date, then drop old partitions - which is an O(1) operation regardless of row count - instead of running expensive DELETE queries.