Schema for ScaleLesson 5.4
Sharding concepts: designing a schema that can shard
sharding definition, shard key selection, hotspot problem, cross-shard queries, shard-friendly schema, tenant ID pattern, hash vs range sharding
What Is Sharding?
Sharding distributes rows across multiple independent database servers (shards) using a shard key. Each shard holds a subset of the data. Unlike replicas, each shard handles both reads and writes for its subset.
Shard Key Selection
The shard key determines which shard a row lives on. A bad shard key creates hotspots (one shard overloaded) or makes most queries cross-shard.
- Good key:
user_id— distributes evenly if IDs are sequential and usage is similar per user. - Bad key:
created_at— all new writes go to the latest shard (hotspot). - Bad key:
country— the US shard gets 30× the traffic of others.
Shard-Friendly Schema Design
-- All tables carry tenant_id (shard key)
CREATE TABLE orders (
order_id BIGINT NOT NULL,
tenant_id INT NOT NULL, -- shard key
customer_id INT NOT NULL,
total DECIMAL(10,2),
PRIMARY KEY (tenant_id, order_id) -- shard key is part of PK
);Keep related data on the same shard (co-locate user data, their orders, and their payments under the same user_id key). Cross-shard JOINs are either unsupported or extremely expensive — design to avoid them.
