Advanced SQLLesson 4.1

GROUP BY, HAVING, and Aggregate Functions

GROUP BY, HAVING, COUNT, SUM, AVG, MAX, MIN, aggregate functions, grouping queries

Introduction

Aggregate functions and GROUP BY transform raw row-level data into meaningful summaries — total sales per region, average order value per customer, count of products per category. These are among the most frequently used features in analytical Database Management Systems queries. Every report, dashboard, and business intelligence tool relies on these capabilities.

Explanation

Aggregate functions operate on a set of rows and return a single value. The five standard aggregate functions are COUNT (count rows or non-NULL values), SUM (total of numeric values), AVG (arithmetic mean), MAX (highest value), and MIN (lowest value). When used with GROUP BY, these functions are applied independently to each group rather than the entire table.

The HAVING clause filters groups after aggregation — it is the WHERE clause for groups. Because HAVING executes after GROUP BY and SELECT, it can reference aggregate function results and column aliases defined in SELECT.

Real World Example

A sales analytics team needs to answer: "Which product categories generated more than 1 million rupees in revenue this quarter, and what is the average order size per category?"

This requires grouping orders by category, summing revenue, calculating averages, and filtering by a revenue threshold — a perfect use case for GROUP BY with aggregate functions and HAVING.

Technical Breakdown

Key rules for GROUP BY queries:

Every column in the SELECT clause must either be in the GROUP BY clause or wrapped in an aggregate function.
NULL values in the grouping column form their own group.
HAVING can reference aggregate functions; WHERE cannot.
GROUP BY processes before SELECT, so SELECT aliases cannot be used in HAVING in most databases (use the aggregate expression directly).
COUNT(*) counts all rows including NULLs. COUNT(column) counts only non-NULL values in that column.

Data → GROUP BY → Aggregate → HAVING → Final Result

Example

-- Sales by category with revenue filter
SELECT
    p.category,
    COUNT(DISTINCT o.order_id)    AS total_orders,
    SUM(oi.quantity * oi.unit_price) AS total_revenue,
    AVG(oi.unit_price)            AS avg_product_price,
    MAX(oi.unit_price)            AS highest_price
FROM order_items oi
INNER JOIN products p ON oi.product_id = p.product_id
INNER JOIN orders o   ON oi.order_id   = o.order_id
WHERE o.order_date >= '2024-01-01'
GROUP BY p.category
HAVING SUM(oi.quantity * oi.unit_price) > 1000000
ORDER BY total_revenue DESC;

-- Count customers per city with minimum 5 customers
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city
HAVING COUNT(*) >= 5
ORDER BY customer_count DESC;

Common Mistakes

Including non-aggregated columns in SELECT that are not in GROUP BY — this causes an error in strict SQL modes and returns unpredictable results in lenient modes.
Using WHERE to filter on aggregate results instead of HAVING — WHERE fires before aggregation so aggregate values do not exist yet.
Confusing COUNT(*) and COUNT(column) — COUNT(*) counts all rows; COUNT(column) skips NULL values in that column.

Interview Tips

Practice explaining the difference between WHERE and HAVING with examples. A classic interview question is "Write a query to find all customers who have placed more than 5 orders" — this requires GROUP BY customer_id, COUNT(order_id) > 5 in HAVING. Also know that you can GROUP BY multiple columns: GROUP BY year, month gives monthly aggregates.