SQL Joins

PostgreSQL Joins
Table of contents
  1. CROSS JOIN
  2. Inner vs Outer Joins
  3. Join Conditions
    1. ON
      1. Equi-Join
    2. USING
  4. JOIN / INNER JOIN
  5. Outer Joins
    1. Exclusive Outer Joins
  6. NATURAL Joins
  7. SELF JOIN

CROSS JOIN

A CROSS JOIN returns the Cartesian product of two tables, creating n * m rows where n and m are the number of rows in each table.

SELECT *
FROM a CROSS JOIN b;

Or you could just use a comma ,:

SELECT *
FROM a, b;

Inner vs Outer Joins

Theoretically, you could think of all joins starting from the Cartesian product.

Not exactly what the query optimizer does, but it helps to understand the concept.

After obtaining the Cartesian product, we filter out rows based on some conditions:

  • Inner Join: Returns only the rows that fulfill the condition.
  • Outer Join: Returns the rows that fulfill the condition and
    • Left Outer Join: Also the rows with values from the left table
    • Right Outer Join: Also the rows with values from the right table
    • Full Outer Join: Also the rows with values from both tables

For example, left outer join has the following logic:

  1. We have a Cartesian product of two tables.
  2. The rows that fulfill the condition are kept.
  3. In addition, find all rows from the left table (not the Cartesian product) that did not make it to the final result.
  4. For each of those rows, fill in NULL values for the right table columns, and include them in the final result.

Join Conditions

ON

With ON, you can use any comparison operators to join tables =, >, <, >=, <=, <>, !=, etc.

Equi-Join

Equi-join is the most common type of join. It is when a join condition uses the = operator.

SELECT s.store_id, a.address, district
FROM store s
     JOIN address a ON s.address_id = a.address_id;

Range variable s and a are technically optional, but they are highly recommended and can become necessary to resolve ambiguity when referencing column names that exist in both tables.

You can equi-join on multiple columns as well with AND:

SELECT *
FROM a
     JOIN b ON a.key1 = b.key1 AND a.key2 = b.key2;

USING

When you want to equi-join on columns with the same name, you can use the shorthand USING keyword:

SELECT s.store_id, a.address, district
FROM store s
     JOIN address a USING (address_id);

Do not forget the parentheses () around the key name.


JOIN / INNER JOIN

In PostgreSQL, when you use JOIN, it is equivalent to INNER JOIN.

Only returns rows that fulfill the condition.

SELECT *
FROM a
     [INNER] JOIN b ON condition;

Outer Joins

In keywords, the word OUTER is omitted:

  • LEFT JOIN
  • RIGHT JOIN
  • FULL JOIN

Remember, for dangling left or right rows, NULL values are filled in for the columns from the other table.

Exclusive Outer Joins

To filter out rows that have matching values, you can add a NULL check in the WHERE clause.

For exclusive left join:

SELECT *
FROM a
     LEFT JOIN b USING (key)
WHERE b.key IS NULL;

For exclusive right join:

SELECT *
FROM a
     RIGHT JOIN b USING (key)
WHERE a.key IS NULL;

For exclusive full join:

SELECT *
FROM a
     FULL JOIN b USING (key)
WHERE a.key IS NULL OR b.key IS NULL;

NATURAL Joins

Natural join is an equi-join on columns with the same name.

NATURAL is a keyword that can come before INNER, LEFT, or RIGHT, or simply JOIN.

SELECT *
FROM a
NATURAL [INNER | LEFT | RIGHT] JOIN b;

For NATURAL JOIN, the columns with the same name are automatically matched. Joins with USING can be simplified with NATURAL JOIN.

The problem is, if there are multiple columns with the same name, natural join will try to find a match across all those columns.

In other words, SELECT * FROM a NATURAL JOIN b with two matching column names key1 and key2 is equivalent to:

SELECT *
FROM a
     JOIN b
     ON a.key1 = b.key1 AND a.key2 = b.key2;

Commonly used column names such as created_time or last_updated can easily mess up natural join and lead to empty results. It is not recommended to use NATURAL JOIN because it can lead to ambiguity.


SELF JOIN

Self join is simply joining a table with itself.

The only caveat is that aliasing becomes mandatory.

SELECT *
FROM tbl t1
     JOIN tbl t2 ON t1.key = t2.key;