Script Valley
Database Design: From Idea to Schema
Schema for ScaleLesson 5.4

Sharding concepts: designing a schema that can shard

sharding definition, shard key selection, hotspot problem, cross-shard queries, shard-friendly schema, tenant ID pattern, hash vs range sharding

What Is Sharding?

Sharding Architecture

Sharding distributes rows across multiple independent database servers (shards) using a shard key. Each shard holds a subset of the data. Unlike replicas, each shard handles both reads and writes for its subset.

Shard Key Selection

The shard key determines which shard a row lives on. A bad shard key creates hotspots (one shard overloaded) or makes most queries cross-shard.

  • Good key: user_id — distributes evenly if IDs are sequential and usage is similar per user.
  • Bad key: created_at — all new writes go to the latest shard (hotspot).
  • Bad key: country — the US shard gets 30× the traffic of others.

Shard-Friendly Schema Design

-- All tables carry tenant_id (shard key)
CREATE TABLE orders (
  order_id   BIGINT NOT NULL,
  tenant_id  INT NOT NULL,   -- shard key
  customer_id INT NOT NULL,
  total      DECIMAL(10,2),
  PRIMARY KEY (tenant_id, order_id)  -- shard key is part of PK
);

Keep related data on the same shard (co-locate user data, their orders, and their payments under the same user_id key). Cross-shard JOINs are either unsupported or extremely expensive — design to avoid them.

Up next

Optimizing schemas for read-heavy vs write-heavy workloads

Sign in to track progress