Hierarchical Query

What is a hierarchical query?

In a company, employees are organized in a hierarchical structure. Each employee has a manager, except maybe the CEO.

Suppose we have an employee table with employee_id and manager_id columns which reflects this hierarchy of the company.

Querying the names of all employees and their managers is a common hierarchical query.

Our first intuition would be to use self-join:

SELECT e1.first_name emp,
       e1.job_id emp_job,
       'Reports to' what,
       e2.first_name mgr,
       e2.job_id mgr_job
FROM hr.employees e1
     LEFT JOIN hr.employees e2
               ON e1.manager_id = e2.employee_id
ORDER BY e1.employee_id;
EMPEMP_JOBWHATMGRMGR_JOB
StevenAD_PRESReports to--
NeenaAD_VPReports toStevenAD_PRES
LexAD_VPReports toStevenAD_PRES
AlexanderIT_PROGReports toLexAD_VP
BruceIT_PROGReports toAlexanderIT_PROG
DavidIT_PROGReports toAlexanderIT_PROG
ValliIT_PROGReports toAlexanderIT_PROG
DianaIT_PROGReports toAlexanderIT_PROG

But self-join is not very query efficient, and we’d like to use a more optimized method.

Table of contents
  1. Oracle CONNECT BY
    1. Starting Point
    2. Top-Down vs Bottom-Up
    3. Pruning Branches in Top-Down
  2. PostgreSQL WITH RECURSIVE

Oracle CONNECT BY

Oracle provides the CONNECT BY clause to query hierarchical data.

SELECT first_name emp_name,
       job_id emp_job,
       PRIOR first_name mgr_name,
       PRIOR job_id mgr_job
FROM hr.employees
START WITH manager_id IS NULL  -- Start with the President
CONNECT BY PRIOR employee_id = manager_id;

Starting Point

START WITH is used to specify the root of the hierarchy.

In our example, that would be the President, who has no manager, hence manager_id IS NULL.

Top-Down vs Bottom-Up

Depending on whether the connection was top-down or bottom-up in the hierarchy tree, the PRIOR keyword refers to either the parent or child.

In our example above, our prior is the manager (parent) of the current employee, which was imposed by:

START WITH manager_id IS NULL  -- Root was the President
CONNECT BY PRIOR employee_id = manager_id  -- From the root, connect to child

It is equivalent to saying use the parent’s employee_id to connect to the child’s manager_id.

This is a top-down approach.

A bottom-up approach would be to start with some employee and find all the managers up to the President.

START WITH employee_id = 206  -- Start with some employee
CONNECT BY PRIOR manager_id = employee_id;  -- His manager_id will be used to connect to parent
Full Query
SELECT prior first_name emp_name,
       prior job_id emp_job,
       first_name mgr_name,
       job_id mgr_job,
       LEVEL
FROM hr.employees
START WITH employee_id = 206
CONNECT BY PRIOR manager_id = employee_id;

LEVEL is a special keyword that returns the level of the hierarchy, with starting node as level 1.

Pruning Branches in Top-Down

To prune branches in a top-down hierarchy, place additional conditions in the CONNECT BY clause:

SELECT first_name emp_name,
       job_id emp_job,
       PRIOR first_name mgr_name,
       PRIOR job_id mgr_job
FROM hr.employees
START WITH manager_id IS NULL -- Start with the President
CONNECT BY PRIOR employee_id = manager_id
       AND job_id != 'IT_PROG';  -- Exclude all child branches with IT_PROG job

Putting the condition in WHERE does not have the effect of pruning branches, it merely filters the nodes after the hierarchy is built.


PostgreSQL WITH RECURSIVE

To be added

CONNECT BY is Oracle-specific, but a similar hierarchical query can be achieved in PostgreSQL using WITH RECURSIVE.