Hierarchical Query
What is a hierarchical query?
In a company, employees are organized in a hierarchical structure. Each employee has a manager, except maybe the CEO.
Suppose we have an employee table with employee_id and manager_id columns which reflects this hierarchy of the company.
Querying the names of all employees and their managers is a common hierarchical query.
Our first intuition would be to use self-join:
SELECT e1.first_name emp,
e1.job_id emp_job,
'Reports to' what,
e2.first_name mgr,
e2.job_id mgr_job
FROM hr.employees e1
LEFT JOIN hr.employees e2
ON e1.manager_id = e2.employee_id
ORDER BY e1.employee_id;
EMP | EMP_JOB | WHAT | MGR | MGR_JOB |
---|---|---|---|---|
Steven | AD_PRES | Reports to | - | - |
Neena | AD_VP | Reports to | Steven | AD_PRES |
Lex | AD_VP | Reports to | Steven | AD_PRES |
Alexander | IT_PROG | Reports to | Lex | AD_VP |
Bruce | IT_PROG | Reports to | Alexander | IT_PROG |
David | IT_PROG | Reports to | Alexander | IT_PROG |
Valli | IT_PROG | Reports to | Alexander | IT_PROG |
Diana | IT_PROG | Reports to | Alexander | IT_PROG |
But self-join is not very query efficient, and we’d like to use a more optimized method.
Table of contents
Oracle CONNECT BY
Oracle provides the CONNECT BY
clause to query hierarchical data.
SELECT first_name emp_name,
job_id emp_job,
PRIOR first_name mgr_name,
PRIOR job_id mgr_job
FROM hr.employees
START WITH manager_id IS NULL -- Start with the President
CONNECT BY PRIOR employee_id = manager_id;
Starting Point
START WITH
is used to specify the root of the hierarchy.
In our example, that would be the President, who has no manager, hence manager_id IS NULL
.
Top-Down vs Bottom-Up
Depending on whether the connection was top-down or bottom-up in the hierarchy tree, the PRIOR
keyword refers to either the parent or child.
In our example above, our prior is the manager (parent) of the current employee, which was imposed by:
START WITH manager_id IS NULL -- Root was the President
CONNECT BY PRIOR employee_id = manager_id -- From the root, connect to child
It is equivalent to saying use the parent’s employee_id
to connect to the child’s manager_id
.
This is a top-down approach.
A bottom-up approach would be to start with some employee and find all the managers up to the President.
START WITH employee_id = 206 -- Start with some employee
CONNECT BY PRIOR manager_id = employee_id; -- His manager_id will be used to connect to parent
Full Query
SELECT prior first_name emp_name,
prior job_id emp_job,
first_name mgr_name,
job_id mgr_job,
LEVEL
FROM hr.employees
START WITH employee_id = 206
CONNECT BY PRIOR manager_id = employee_id;
LEVEL
is a special keyword that returns the level of the hierarchy, with starting node as level 1.
Pruning Branches in Top-Down
To prune branches in a top-down hierarchy, place additional conditions in the CONNECT BY
clause:
SELECT first_name emp_name,
job_id emp_job,
PRIOR first_name mgr_name,
PRIOR job_id mgr_job
FROM hr.employees
START WITH manager_id IS NULL -- Start with the President
CONNECT BY PRIOR employee_id = manager_id
AND job_id != 'IT_PROG'; -- Exclude all child branches with IT_PROG job
Putting the condition in WHERE
does not have the effect of pruning branches, it merely filters the nodes after the hierarchy is built.
PostgreSQL WITH RECURSIVE
To be added
CONNECT BY
is Oracle-specific, but a similar hierarchical query can be achieved in PostgreSQL using WITH RECURSIVE
.