Normalization: 1NF, 2NF, and 3NF Explained with Examples
normalization, 1NF, 2NF, 3NF, functional dependency, partial dependency, transitive dependency, unnormalized form
Introduction
Normalization is the process of organizing a relational database to reduce data redundancy and improve data integrity. It is one of the most important skills in Database Management Systems design. A poorly normalized database has duplicate data across rows, update anomalies (changing one value requires updating many rows), insertion anomalies (cannot add data without unrelated data), and deletion anomalies (deleting one piece of data inadvertently removes another). Normalization eliminates all of these problems.
Explanation
Normalization proceeds through normal forms — each form building on the previous. A table satisfies a normal form if it meets specific structural criteria. The three most important normal forms for practical database design are First (1NF), Second (2NF), and Third Normal Form (3NF). Higher forms (BCNF, 4NF, 5NF) exist for specific edge cases.
The key concept underlying normalization is the functional dependency: attribute B is functionally dependent on attribute A (written A → B) if knowing A's value always determines B's value. For example, knowing a student_id always determines the student's name.
Real World Example
An unnormalized order table might store: order_id, customer_name, customer_email, customer_city, product_name, product_category, quantity, price. This violates all normal forms — customer data is repeated in every order row, changing a customer's email requires updating every one of their order rows, and deleting the last order for a customer destroys the customer's data.
Technical Breakdown
First Normal Form (1NF):
All column values are atomic (no lists, sets, or arrays in a cell).
All rows are unique (there is a primary key).
Each column contains only one type of data.
Violation example: storing "phone: 9999999999, 8888888888" in one column.
Second Normal Form (2NF):
Must be in 1NF first.
No partial dependency — every non-key column must depend on the entire primary key, not just part of it.
Only relevant when the primary key is composite.
Violation example: In order_items(order_id, product_id, quantity, product_name), product_name depends only on product_id, not on the full composite key (order_id, product_id).
Third Normal Form (3NF):
Must be in 2NF first.
No transitive dependency — non-key columns must not depend on other non-key columns.
Violation example: employees(emp_id, dept_id, dept_name) — dept_name depends on dept_id, which is not the primary key. Solution: move dept_name to a departments table.
[ Normalization Steps 1NF to 3NF]
Prompt: Clean educational diagram showing three tables progressing from left to right. Left table: unnormalized order data with customer and product data all mixed together, labeled "Unnormalized". Middle table after splitting: two tables for 1NF/2NF with partial dependencies removed. Right: three normalized tables (customers, products, orders) each with clean columns and primary keys, labeled "3NF". Arrows show the transformation. White background, educational style.
Example
-- BEFORE normalization (violates 2NF and 3NF):
-- order_items(order_id, product_id, quantity,
-- product_name, category, -- partial dep on product_id
-- customer_name, customer_city) -- should not be here at all
-- AFTER normalization (3NF):
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
customer_email VARCHAR(150) UNIQUE,
city VARCHAR(100)
);
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(200),
category_id INT REFERENCES categories(category_id),
price DECIMAL(10,2)
);
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT REFERENCES customers(customer_id),
order_date DATE
);
CREATE TABLE order_items (
order_id INT REFERENCES orders(order_id),
product_id INT REFERENCES products(product_id),
quantity INT,
unit_price DECIMAL(10,2), -- snapshot price at time of order
PRIMARY KEY (order_id, product_id)
);Common Mistakes
Over-normalizing — splitting tables beyond 3NF when there are no real anomaly problems introduces too many JOINs and harms query performance.
Not storing the price in order_items — if you only reference products.price, you lose the historical price at the time of the order when the price changes.
Confusing 2NF and 3NF — 2NF is about partial dependencies (requires composite key to be violated), 3NF is about transitive dependencies (non-key depending on non-key).
Interview Tips
The classic normalization interview question gives you an unnormalized table and asks you to decompose it to 3NF step by step. Always: (1) Identify functional dependencies, (2) Check for 1NF violations (repeating groups), (3) Check for partial dependencies (2NF), (4) Check for transitive dependencies (3NF). Show your reasoning at each step — interviewers evaluate your thought process as much as the final answer.
