SQL Window Function
- Reference: PostgreSQL Tutorial
Table of contents
Sample Data
SQL
CREATE SCHEMA IF NOT EXISTS win;
SET search_path TO win;
CREATE TABLE product_groups (
group_id serial PRIMARY KEY,
group_name VARCHAR (255) NOT NULL
);
CREATE TABLE products (
product_id serial PRIMARY KEY,
product_name VARCHAR (255) NOT NULL,
price DECIMAL (11, 2),
group_id INT NOT NULL,
FOREIGN KEY (group_id) REFERENCES product_groups (group_id)
);
INSERT INTO product_groups (group_name)
VALUES
('Smartphone'),
('Laptop'),
('Tablet');
INSERT INTO products (product_name, group_id,price)
VALUES
('Microsoft Lumia', 1, 200),
('HTC One', 1, 400),
('Nexus', 1, 500),
('iPhone', 1, 900),
('HP Elite', 2, 1200),
('Lenovo Thinkpad', 2, 700),
('Sony VAIO', 2, 700),
('Dell Vostro', 2, 800),
('iPad', 3, 700),
('Kindle Fire', 3, 150),
('Samsung Galaxy Tab', 3, 200);
Additional View
CREATE VIEW product_and_groups AS
SELECT *
FROM products
JOIN product_groups USING (group_id);
Window Function vs Aggregate Function
A window refers to the set of rows that the function will operate on.
- Aggregate functions operate on a group of rows and return a single value.
- Window functions operate on a set of rows (window) and return value for each row in window.
With aggregate functions and group by:
SELECT group_name, AVG(price) AS avg_price
FROM product_and_groups
GROUP BY group_name
ORDER BY 2 DESC;
group_name | avg_price |
---|---|
Laptop | 850 |
Smartphone | 500 |
Tablet | 350 |
With window functions:
SELECT product_name,
group_name,
AVG(price) OVER (PARTITION BY group_name) AS avg_price
FROM product_and_groups
ORDER BY 3 DESC;
product_name | group_name | avg_price |
---|---|---|
HP Elite | Laptop | 850 |
Lenovo Thinkpad | Laptop | 850 |
Sony VAIO | Laptop | 850 |
Dell Vostro | Laptop | 850 |
Microsoft Lumia | Smartphone | 500 |
HTC One | Smartphone | 500 |
Nexus | Smartphone | 500 |
iPhone | Smartphone | 500 |
iPad | Tablet | 350 |
Kindle Fire | Tablet | 350 |
Samsung Galaxy Tab | Tablet | 350 |
Window Function Syntax
WINDOW_FUNC(args) OVER (
[PARTITION BY partition_expression]
[ORDER BY sort_expression] [NULLS {FIRST | LAST}]
[frame_clause]
)
Window Functions
Called window functions because they perform a calculation while sliding a window row by row over the partition.
Any aggregate function or window function can be used in place of WINDOW_FUNC
.
- Aggregate functions
AVG
,SUM
,COUNT
,MIN
,MAX
ROW_NUMBER()
: Assigns a unique sequential integer to each row in each partitionRANK()
: Assigns a integer rank to each row in each partition- Ties are assigned the same rank, with the next rank skipped (1, 1, 3…)
DENSE_RANK()
: Similar toRANK()
, but no ranks are skipped (1, 1, 2…)FIRST_VALUE(column_expression)
: Returns the first evaluated value in the partitionLAST_VALUE(column_expression)
: Returns the last evaluated value in the partitionNTILE(bucket_count)
: Distributes rows into a specified number of buckets evenly (as much as possible)PERCENT_RANK()
: In terms of percent, rank the standing of each row in a partitionLAG(column_expression, offset, default)
: Accesses data from an earlier rowdefault
is used when the offset goes beyond the partition
LEAD(column_expression, offset, default)
: Accesses data from a later rowCUME_DIST
,LAG
,LEAD
,NTH_VALUE
Window and Partition
The window and partition is defined by the parentheses after the OVER
clause.
Partitioning
PARTITION BY
describes how the result set should be divided into groups.
You can partition by multiple columns as well:
SELECT product_name,
group_name,
price,
ROW_NUMBER() OVER (PARTITION BY group_name, price) AS row_num
FROM product_and_groups;
PARTITION BY
is optional. Omitting it will treat all result rows as a single group:
SELECT product_name,
group_name,
AVG(price) OVER () AS avg_price
FROM product_and_groups
ORDER BY 3 DESC;
-- The third column will be
SELECT AVG(price)
FROM product_and_groups;
WINDOW Clause
There is an alternate syntax to define the window:
SELECT product_name,
group_name,
AVG(price) OVER w AS avg_price
FROM product_and_groups
WINDOW w AS (PARTITION BY group_name)
ORDER BY 3 DESC;
Useful when reusing the same window definition multiple times.
Ordering within Partition
ORDER BY
specifies how the rows in each partition should be ordered. Additional NULLS FIRST
or NULLS LAST
(default) controls whether NULL
values should be ordered first or last.
The effect of ORDER BY
makes more sense with window functions like ROW_NUMBER()
:
SELECT product_name,
group_name,
price,
ROW_NUMBER() OVER w AS row_num
FROM product_and_groups
WINDOW w AS (PARTITION BY group_name ORDER BY price);
product_name | group_name | price | row_num |
---|---|---|---|
Sony VAIO | Laptop | 700.00 | 1 |
Lenovo Thinkpad | Laptop | 700.00 | 2 |
Dell Vostro | Laptop | 800.00 | 3 |
HP Elite | Laptop | 1200.00 | 4 |
Microsoft Lumia | Smartphone | 200.00 | 1 |
HTC One | Smartphone | 400.00 | 2 |
Nexus | Smartphone | 500.00 | 3 |
iPhone | Smartphone | 900.00 | 4 |
Kindle Fire | Tablet | 150.00 | 1 |
Samsung Galaxy Tab | Tablet | 200.00 | 2 |
iPad | Tablet | 700.00 | 3 |
Remember that you can use ORDER BY
without PARTITION BY
:
SELECT product_name,
group_name,
price,
RANK() OVER w AS price_rank
FROM product_and_groups
WINDOW w AS (ORDER BY price DESC);
product_name | group_name | price | price_rank |
---|---|---|---|
HP Elite | Laptop | 1200.00 | 1 |
iPhone | Smartphone | 900.00 | 2 |
Dell Vostro | Laptop | 800.00 | 3 |
Lenovo Thinkpad | Laptop | 700.00 | 4 |
Sony VAIO | Laptop | 700.00 | 4 |
iPad | Tablet | 700.00 | 4 |
Nexus | Smartphone | 500.00 | 7 |
HTC One | Smartphone | 400.00 | 8 |
Samsung Galaxy Tab | Tablet | 200.00 | 9 |
Microsoft Lumia | Smartphone | 200.00 | 9 |
Kindle Fire | Tablet | 150.00 | 11 |
Frame Clause
You can refine a sliding window only if the partition has an order imposed by ORDER BY
.
You can further refine the window with a frame clause:
ROWS BETWEEN start AND end -- Uses row numbers for calculating window size
-- or
RANGE BETWEEN start AND end -- Uses column values for calculating window size
Special keywords for start
and end
:
CURRENT ROW
:- Current row number
- Current row value
UNBOUNDED PRECEDING
: Start of the partitionUNBOUNDED FOLLOWING
: End of the partitionn PRECEDING
n
row numbers before the current rown
less than the current row value
n FOLLOWING
n
row numbers after the current rown
greater than the current row value
- Time-range frame clauses for date and time columns:
'n' MONTH
,'n' DAY
,'n' HOUR
,'n' MINUTE
,'n' SECOND
+PRECEDING
orFOLLOWING
For example, if you wanted to calculate the running total of prices (have the window grow as it goes down the rows in the partition):
SELECT product_name,
group_name,
price,
SUM(price) OVER (
ORDER BY price
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM product_and_groups;
product_name | group_name | price | running_total |
---|---|---|---|
Kindle Fire | Tablet | 150.00 | 150 |
Microsoft Lumia | Smartphone | 200.00 | 350 |
Samsung Galaxy Tab | Tablet | 200.00 | 550 |
HTC One | Smartphone | 400.00 | 950 |
Nexus | Smartphone | 500.00 | 1450 |
iPad | Tablet | 700.00 | 2150 |
Lenovo Thinkpad | Laptop | 700.00 | 2850 |
Sony VAIO | Laptop | 700.00 | 3550 |
Dell Vostro | Laptop | 800.00 | 4350 |
iPhone | Smartphone | 900.00 | 5250 |
HP Elite | Laptop | 1200.00 | 6450 |
Or a window containing the previous, current, and next row:
SELECT product_name,
group_name,
price,
SUM(price) OVER (
ORDER BY price
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS windowed_sum
FROM product_and_groups;
product_name | group_name | price | windowed_sum |
---|---|---|---|
Kindle Fire | Tablet | 150.00 | 350 |
Microsoft Lumia | Smartphone | 200.00 | 550 |
Samsung Galaxy Tab | Tablet | 200.00 | 800 |
HTC One | Smartphone | 400.00 | 1100 |
Nexus | Smartphone | 500.00 | 1600 |
iPad | Tablet | 700.00 | 1900 |
Lenovo Thinkpad | Laptop | 700.00 | 2100 |
Sony VAIO | Laptop | 700.00 | 2200 |
Dell Vostro | Laptop | 800.00 | 2400 |
iPhone | Smartphone | 900.00 | 2900 |
HP Elite | Laptop | 1200.00 | 2100 |