Script Valley
Database Design: From Idea to Schema
Schema for ScaleLesson 5.3

Archiving old data: keeping production tables lean

data archiving strategy, archive table pattern, cutoff date, incremental archive job, hard delete after archive, cold storage, pg_partman, event sourcing reference

Why Archive

Data Archiving Flow

Production tables with years of data become slow to query, slow to back up, and expensive to index. Archiving moves old, rarely accessed rows to a separate archive table (or cold storage), keeping the live table small and fast.

Archive Table Pattern

-- Archive table mirrors the live table
CREATE TABLE orders_archive LIKE orders;

-- Archive job: move rows older than 2 years
INSERT INTO orders_archive
  SELECT * FROM orders
  WHERE created_at < DATE_SUB(CURRENT_DATE, INTERVAL 2 YEAR)
    AND status IN ('completed', 'cancelled');

DELETE FROM orders
  WHERE created_at < DATE_SUB(CURRENT_DATE, INTERVAL 2 YEAR)
    AND status IN ('completed', 'cancelled');

Principles

  • Archive in small batches (1000–10000 rows) to avoid locking.
  • Only archive rows in terminal states (completed, cancelled) — never active records.
  • Run the archive job during off-peak hours.
  • Keep the archive table in the same database for fast joins; move to cold storage (S3, BigQuery) once you no longer need real-time queries against it.

Partitioning and archiving complement each other: partition by date, then drop old partitions — which is an O(1) operation regardless of row count — instead of running expensive DELETE queries.

Up next

Sharding concepts: designing a schema that can shard

Sign in to track progress