SQL Query Basics

Examples are written with PostgreSQL dialect.

Table of contents
  1. Overall Order of Clauses
  2. SELECT Clause
    1. SELECT DISTINCT
  3. Range Variables / Tuple Variables
  4. Column Aliases
    1. Using Arithmetic Expressions as Column
  5. LIMIT Clause
  6. ORDER BY
  7. WHERE Clause
    1. Comparison Operators
    2. Logical Operators
      1. LIKE
      2. ILIKE
      3. REGEXP_LIKE
      4. AND, OR, NOT
      5. BETWEEN
      6. IS NULL
  8. Set Operators
    1. Set Comparison Operators
      1. IN / NOT IN
      2. EXISTS / NOT EXISTS
      3. ANY
      4. ALL
  9. Aggregate Functions
  10. GROUP BY Clause
    1. HAVING Clause
    2. Caveats with LIMIT
  11. CASE Expressions
    1. With GROUP BY
    2. In Aggregate Functions

Overall Order of Clauses

The overall clause order is:

  1. SELECT
  2. FROM
  3. WHERE
  4. GROUP BY
  5. HAVING
  6. ORDER BY
  7. LIMIT

SELECT Clause

SELECT target_list
FROM relations
WHERE qualification

SELECT DISTINCT

SELECT DISTINCT release_year, rating
FROM film;

When multiple columns are selected, SELECT DISTINCT applies across all those columns.

SELECT DISTINCT does not rule out NULL value. It includes NULL value in the result set.


Range Variables / Tuple Variables

Basically, table aliases.

It is optional in unambiguous situations, but considered good practice to use them.

SELECT f.title
FROM film AS f;

Also, AS is optional, but I prefer to use it for clarity:

SELECT f.title
FROM film f;

Column Aliases

SELECT DISTINCT release_year AS year, rating r
FROM film;

Again, AS is optional.

Aliasing with =

Not many RDBMS support using assignment operator = for aliasing, but it is a thing…

SELECT year=release_year
...

To use keywords, special characters, or whitespace in aliases, you need to enclose the alias in double quotes:

SELECT something AS "this is a column"

Using Arithmetic Expressions as Column

SELECT length / 60 AS hours, length % 60 AS minutes
FROM film;

LIMIT Clause

SELECT title
FROM film
LIMIT 5;

ORDER BY

SELECT title, length
FROM film
ORDER BY length DESC, title;

Ordered by length in descending order, then by title in ascending order.

In PostgreSQL, default is ASC.

You can actually use column numbers in ORDER BY:

SELECT title, length
FROM film
ORDER BY 2 DESC, 1 ASC;

These numbers refer to the order of columns in SELECT, starting from 1.

When to use column numbers?

It can make the query less readable, but it can be useful in situations where the column names are too verbose or subject to change.

Technically, you should alias columns of complex expressions, but if you left them anonymous, you could use numbers.

Example with CASE:

SELECT title,
       CASE
           WHEN length >= 120 THEN 'LONG'
           ELSE 'SHORT' END
FROM film
ORDER BY 2;

Without the numbers and alias, you’d have to repeat the CASE expression.


WHERE Clause

Logic in SQL is actually three-valued:

  • TRUE
  • FALSE
  • UNKNOWN

The UNKNOWN value is returned when comparing or performing arithmetic operations with NULL.

WHERE filters rows that do not evaluate to TRUE. So NULL values are also filtered out.

Comparison Operators

  • = Equal

    SELECT title
    FROM film
    WHERE rating = 'PG-13';
    

    Use single quotes for text.

  • <> or != Not equal

    SELECT title
    FROM film
    WHERE rating <> 'R';
    
  • >, >=, <, <=

    SELECT title
    FROM film
    WHERE length > 120;
    

    With non-numeric data, comparison is lexicographical.

Logical Operators

LIKE

String matching with wildcards.

  • _ matches any single character.
  • % matches 0 or more arbitrary characters.
SELECT title
FROM film
where title LIKE 'KI_________%';

You can enforce certain number of characters with _.

LIKE is case-sensitive in PostgreSQL.

ILIKE

Case-insensitive version of LIKE.

SELECT title
FROM film
WHERE title ILIKE 'ki_________%';

REGEXP_LIKE

SELECT title
FROM film
WHERE REGEXP_LIKE(title, '[A-Z]')
LIMIT 10;

AND, OR, NOT

SELECT title
FROM film
WHERE rating = 'PG-13' AND length > 120;

NOT

  • IS NOT NULL
  • NOT IN
  • NOT BETWEEN
  • NOT LIKE

BETWEEN

SELECT title
FROM film
WHERE length BETWEEN 120 AND 150;

IS NULL

SELECT title
FROM film
WHERE original_language_id IS NULL;

Set Operators

By default, set operators UNION, INTERSECT, and EXCEPT remove duplicates.

UNION ALL, INTERSECT ALL, and EXCEPT ALL retain duplicates.

SELECT cols
FROM table1
UNION [UNION ALL / INTERSECT / INTERSECT ALL / EXCEPT / EXCEPT ALL]
SELECT cols
FROM table1

The SELECT statements must be union-compatible.

Set Comparison Operators

IN / NOT IN

Unlike the other ones, you can use IN with an expression list:

SELECT title
FROM film
WHERE rating IN ('PG', 'PG-13');

Or with a subquery like the others.

EXISTS / NOT EXISTS

SELECT f.title
FROM film AS f
WHERE EXISTS (
    SELECT *
    FROM language AS l
    WHERE l.language_id = f.language_id AND l.name = 'English'
);

This is an awkward example, but you get the idea. EXISTS returns TRUE if the subquery returns any rows.

It is often used with correlated subqueries. By correlated, I mean the subquery refers to the relation in the outer query, which is f in this case.

ANY

ANY is used with leading comparison operators: op ANY (= ANY, <> ANY, <= ANY, etc.).

The following uses = ANY:

SELECT length
FROM film
WHERE length = ANY (
    SELECT length
    FROM film
    WHERE length = 60 OR length = 120
);

Semantically, = ANY is equivalent to IN. However, you cannot use expression lists with = ANY.

ALL

ALL is used with leading comparison operators: op ALL (= ALL, <> ALL, <= ALL, etc.).

The following uses <> ALL:

SELECT length
FROM film
WHERE length <> ALL (
    SELECT length
    FROM film
    WHERE length = 60 OR length = 120
);

Semantically, <> ALL is equivalent to NOT IN. However, you cannot use expression lists with <> ALL.


Aggregate Functions

You can only use aggregate functions in SELECT and HAVING clauses.

Except for COUNT(*), aggregate functions ignore NULL values.

Except for COUNT(*), aggregate functions take a single column.

COUNT(*) / COUNT(col) / COUNT(DISTINCT col)
SUM(col) / SUM(DISTINCT col)
AVG(col) / AVG(DISTINCT col)
MAX(col)
MIN(col)

DISTINCT does not ignore NULL values, but the aggregate function does.


GROUP BY Clause

To use aggregate functions across only certain groups of rows, we use GROUP BY clause.

SELECT rating, COUNT(title), AVG(length)
from film
GROUP BY rating;
ratingcountavg
R195118.6615384615384615
PG194112.0051546391752577
G178111.050561797752809
PG-13223120.4439461883408072
NC-17210113.2285714285714286

You can also group by multiple columns:

SELECT rating, category_id, COUNT(*) AS count
FROM film
     JOIN film_category USING (film_id)
GROUP BY rating, category_id;

GROUP BY col1, col2 means group rows by unique joints of col1 and col2.

Which also means ordering of columns in GROUP BY does not matter, in terms of results (may or may not affect performance).

If you specified two columns in GROUP BY, and each column has n and m unique values, the result will have n * m rows.

Non-aggregated columns in SELECT must appear in GROUP BY. In this example, rating and category_id.

Just like ORDER BY, you can use column numbers in GROUP BY. See Using Column Numbers.

HAVING Clause

WHERE filters rows before grouping, HAVING filters groups after grouping.

SELECT rating, category_id, COUNT(*) AS count
FROM film
     JOIN film_category USING (film_id)
GROUP BY category_id, rating
HAVING COUNT(*) >= 15;

Other than aggregate functions, you can also have subqueries in HAVING clause.

Caveats with LIMIT

LIMIT is applied after GROUP BY and HAVING.

So LIMIT is applied to the result set, after grouping, filtering, and aggregation.

You should really think about whether you’d actually want to use LIMIT with GROUP BY, because it often doesn’t make sense.


CASE Expressions

Basic syntax:

CASE WHEN condition1 THEN true_case_1
     WHEN condition2 THEN true_case_2
     ...
     ELSE false_case
END

In SELECT statement:

SELECT title,
       CASE
           WHEN length >= 120 THEN 'LONG'
           ELSE 'SHORT' END AS is_long
FROM film;

With GROUP BY

You can use CASE expressions in GROUP BY to create custom groups; basically preprocess group labels to deal with NULL values, etc. before aggregation.

SELECT CASE
           WHEN last_name LIKE 'A%' THEN 'A'
           WHEN last_name LIKE 'B%' THEN 'B'
           WHEN last_name LIKE 'C%' THEN 'C'
           ELSE 'Other' END AS last_name_starts_with,
       COUNT(*) AS actor_count
FROM actor
GROUP BY 1
ORDER BY 2;

In Aggregate Functions

You can use CASE expressions in aggregate functions to filter or transform values before aggregation:

SELECT COUNT(CASE WHEN length >= 120 THEN 1 ELSE NULL END) AS long_film_cnt
FROM film;

Here we used the fact that aggregate functions except COUNT(*) ignore NULL values.