Script Valley
Database Management Systems with SQL: Complete Course
Normalization and Database DesignLesson 5.1

Normalization: 1NF, 2NF, and 3NF Explained with Examples

normalization, 1NF, 2NF, 3NF, functional dependency, partial dependency, transitive dependency, unnormalized form

Introduction

Normalization is the process of organizing a relational database to reduce data redundancy and improve data integrity. It is one of the most important skills in Database Management Systems design. A poorly normalized database has duplicate data across rows, update anomalies (changing one value requires updating many rows), insertion anomalies (cannot add data without unrelated data), and deletion anomalies (deleting one piece of data inadvertently removes another). Normalization eliminates all of these problems.

Explanation

Normalization proceeds through normal forms — each form building on the previous. A table satisfies a normal form if it meets specific structural criteria. The three most important normal forms for practical database design are First (1NF), Second (2NF), and Third Normal Form (3NF). Higher forms (BCNF, 4NF, 5NF) exist for specific edge cases.

The key concept underlying normalization is the functional dependency: attribute B is functionally dependent on attribute A (written A → B) if knowing A's value always determines B's value. For example, knowing a student_id always determines the student's name.

Real World Example

An unnormalized order table might store: order_id, customer_name, customer_email, customer_city, product_name, product_category, quantity, price. This violates all normal forms — customer data is repeated in every order row, changing a customer's email requires updating every one of their order rows, and deleting the last order for a customer destroys the customer's data.

Technical Breakdown

First Normal Form (1NF):

  • All column values are atomic (no lists, sets, or arrays in a cell).

  • All rows are unique (there is a primary key).

  • Each column contains only one type of data.

  • Violation example: storing "phone: 9999999999, 8888888888" in one column.

Second Normal Form (2NF):

  • Must be in 1NF first.

  • No partial dependency — every non-key column must depend on the entire primary key, not just part of it.

  • Only relevant when the primary key is composite.

  • Violation example: In order_items(order_id, product_id, quantity, product_name), product_name depends only on product_id, not on the full composite key (order_id, product_id).

Third Normal Form (3NF):

  • Must be in 2NF first.

  • No transitive dependency — non-key columns must not depend on other non-key columns.

  • Violation example: employees(emp_id, dept_id, dept_name) — dept_name depends on dept_id, which is not the primary key. Solution: move dept_name to a departments table.

[ Normalization Steps 1NF to 3NF]

Prompt: Clean educational diagram showing three tables progressing from left to right. Left table: unnormalized order data with customer and product data all mixed together, labeled "Unnormalized". Middle table after splitting: two tables for 1NF/2NF with partial dependencies removed. Right: three normalized tables (customers, products, orders) each with clean columns and primary keys, labeled "3NF". Arrows show the transformation. White background, educational style.

Example

-- BEFORE normalization (violates 2NF and 3NF):
-- order_items(order_id, product_id, quantity,
--             product_name, category,      -- partial dep on product_id
--             customer_name, customer_city) -- should not be here at all

-- AFTER normalization (3NF):
CREATE TABLE customers (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100),
    customer_email VARCHAR(150) UNIQUE,
    city          VARCHAR(100)
);

CREATE TABLE products (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(200),
    category_id  INT REFERENCES categories(category_id),
    price        DECIMAL(10,2)
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    order_date  DATE
);

CREATE TABLE order_items (
    order_id   INT REFERENCES orders(order_id),
    product_id INT REFERENCES products(product_id),
    quantity   INT,
    unit_price DECIMAL(10,2),  -- snapshot price at time of order
    PRIMARY KEY (order_id, product_id)
);

Common Mistakes

  • Over-normalizing — splitting tables beyond 3NF when there are no real anomaly problems introduces too many JOINs and harms query performance.

  • Not storing the price in order_items — if you only reference products.price, you lose the historical price at the time of the order when the price changes.

  • Confusing 2NF and 3NF — 2NF is about partial dependencies (requires composite key to be violated), 3NF is about transitive dependencies (non-key depending on non-key).

Interview Tips

The classic normalization interview question gives you an unnormalized table and asks you to decompose it to 3NF step by step. Always: (1) Identify functional dependencies, (2) Check for 1NF violations (repeating groups), (3) Check for partial dependencies (2NF), (4) Check for transitive dependencies (3NF). Show your reasoning at each step — interviewers evaluate your thought process as much as the final answer.