Script Valley
Database Management Systems with SQL: Complete Course
Indexing and Query OptimizationLesson 7.2

Query Optimization: EXPLAIN Plans and Performance Tuning

query optimization, EXPLAIN, execution plan, full table scan, index scan, query rewriting, query performance, slow query log

Introduction

Query optimization is the process of finding the most efficient way to execute a SQL query. The query optimizer in every Database Management System automatically generates and evaluates multiple execution plans, choosing the one with the lowest estimated cost. Understanding how the optimizer works and how to read EXPLAIN output lets you diagnose and fix slow queries — a critical skill for production database performance.

Explanation

Without optimization, the same query can take seconds instead of milliseconds.The optimizer tries different ways to run your query and picks the fastest one.When you submit a SQL query, the DBMS parses it, generates multiple candidate execution plans (different join orders, different index choices), estimates the cost of each based on table statistics (row counts, value distributions, index selectivity), and executes the lowest-cost plan. The query optimizer uses statistics stored in the catalog — if statistics are stale, the optimizer makes poor choices.

Real World Example

A reporting query on an e-commerce database was timing out after 45 seconds. EXPLAIN showed a full table scan on the orders table (25 million rows) joined to a full scan on order_items (80 million rows). Adding a composite index on orders(status, created_at) and ensuring order_items.order_id was indexed reduced execution time to 120 milliseconds — a 375x improvement without changing any application code.

Technical Breakdown

Key EXPLAIN output fields to understand:

  • type (join type): From worst to best: ALL (full table scan), index (full index scan), range (index range scan), ref (index lookup by non-unique key), eq_ref (index lookup by unique key), const/system (single row lookup). ANY query with type=ALL on a large table is a red flag.

  • rows: Estimated number of rows examined. A high number indicates potential for optimization.

  • key: Which index was chosen (NULL means no index used).

  • Extra: Additional information — "Using filesort" means an in-memory sort (avoid for large datasets), "Using temporary" means a temporary table was created, "Using index" means a covering index was used (best case).

Example

-- Step 1: Identify the slow query
-- Enable slow query log (MySQL):
-- SET GLOBAL slow_query_log = ON;
-- SET GLOBAL long_query_time = 1;  -- queries over 1 second

-- Step 2: Analyze with EXPLAIN
EXPLAIN SELECT c.name, SUM(o.total_amount) AS total_spent
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01'
  AND o.status = 'completed'
GROUP BY c.customer_id, c.name
ORDER BY total_spent DESC
LIMIT 20;

-- Step 3: Interpret output — look for type='ALL' on large tables
-- Step 4: Add missing indexes
CREATE INDEX idx_orders_date_status ON orders(order_date, status);

-- Step 5: Re-run EXPLAIN and verify improvement

-- Common optimization: avoid functions on indexed columns
-- BAD (cannot use index on created_at):
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- GOOD (uses index range scan):
SELECT * FROM orders
WHERE created_at >= '2024-01-01'
  AND created_at < '2025-01-01';

-- EXPLAIN ANALYZE (PostgreSQL) — shows actual vs estimated rows
EXPLAIN ANALYZE
SELECT product_id, COUNT(*) AS order_count
FROM order_items
GROUP BY product_id
HAVING COUNT(*) > 100;

Common Mistakes

  • Running EXPLAIN without ANALYZE — EXPLAIN shows estimated costs; EXPLAIN ANALYZE (PostgreSQL) shows actual execution times. A large discrepancy means statistics need updating (ANALYZE TABLE in MySQL, ANALYZE in PostgreSQL).

  • Optimizing queries that run once — focus optimization effort on queries that run frequently or on large tables. A 10x improvement on a query that runs 1000 times per second matters far more than a 100x improvement on a monthly report.

  • Adding indexes without testing write performance — each additional index adds overhead to every INSERT/UPDATE/DELETE. Always benchmark write performance after adding indexes in a staging environment.

Interview Tips

Senior engineering interviews often include a slow query diagnosis question. The expected answer follows this process: (1) Run EXPLAIN to see the execution plan, (2) Identify type=ALL on large tables as full table scans, (3) Check if appropriate indexes exist using SHOW INDEX, (4) Consider query rewriting to be index-friendly, (5) Consider covering indexes to eliminate table row lookups. Mentioning query hints, partitioning for very large tables, and connection pooling for application-level optimization demonstrates senior-level database knowledge.