SQL Window Function

Table of contents
  1. Window Function vs Aggregate Function
  2. Window Function Syntax
    1. Window Functions
    2. Window and Partition
      1. Partitioning
      2. WINDOW Clause
      3. Ordering within Partition
      4. Frame Clause

Sample Data

SQL
CREATE SCHEMA IF NOT EXISTS win;
SET search_path TO win;

CREATE TABLE product_groups (
  group_id serial PRIMARY KEY,
  group_name VARCHAR (255) NOT NULL
);

CREATE TABLE products (
  product_id serial PRIMARY KEY,
  product_name VARCHAR (255) NOT NULL,
  price DECIMAL (11, 2),
  group_id INT NOT NULL,
  FOREIGN KEY (group_id) REFERENCES product_groups (group_id)
);

INSERT INTO product_groups (group_name)
VALUES
  ('Smartphone'),
  ('Laptop'),
  ('Tablet');

INSERT INTO products (product_name, group_id,price)
VALUES
  ('Microsoft Lumia', 1, 200),
  ('HTC One', 1, 400),
  ('Nexus', 1, 500),
  ('iPhone', 1, 900),
  ('HP Elite', 2, 1200),
  ('Lenovo Thinkpad', 2, 700),
  ('Sony VAIO', 2, 700),
  ('Dell Vostro', 2, 800),
  ('iPad', 3, 700),
  ('Kindle Fire', 3, 150),
  ('Samsung Galaxy Tab', 3, 200);
Additional View
CREATE VIEW product_and_groups AS
SELECT *
FROM products
     JOIN product_groups USING (group_id);

Window Function vs Aggregate Function

A window refers to the set of rows that the function will operate on.

  • Aggregate functions operate on a group of rows and return a single value.
  • Window functions operate on a set of rows (window) and return value for each row in window.

With aggregate functions and group by:

SELECT group_name, AVG(price) AS avg_price
FROM product_and_groups
GROUP BY group_name
ORDER BY 2 DESC;
group_nameavg_price
Laptop850
Smartphone500
Tablet350

With window functions:

SELECT product_name,
       group_name,
       AVG(price) OVER (PARTITION BY group_name) AS avg_price
FROM product_and_groups
ORDER BY 3 DESC;
product_namegroup_nameavg_price
HP EliteLaptop850
Lenovo ThinkpadLaptop850
Sony VAIOLaptop850
Dell VostroLaptop850
Microsoft LumiaSmartphone500
HTC OneSmartphone500
NexusSmartphone500
iPhoneSmartphone500
iPadTablet350
Kindle FireTablet350
Samsung Galaxy TabTablet350

Window Function Syntax

WINDOW_FUNC(args) OVER (
  [PARTITION BY partition_expression]
  [ORDER BY sort_expression] [NULLS {FIRST | LAST}]
  [frame_clause]
)

Window Functions

Called window functions because they perform a calculation while sliding a window row by row over the partition.

Any aggregate function or window function can be used in place of WINDOW_FUNC.

  • Aggregate functions
    • AVG, SUM, COUNT, MIN, MAX
  • ROW_NUMBER(): Assigns a unique sequential integer to each row in each partition
  • RANK(): Assigns a integer rank to each row in each partition
    • Ties are assigned the same rank, with the next rank skipped (1, 1, 3…)
  • DENSE_RANK(): Similar to RANK(), but no ranks are skipped (1, 1, 2…)
  • FIRST_VALUE(column_expression): Returns the first evaluated value in the partition
  • LAST_VALUE(column_expression): Returns the last evaluated value in the partition
  • NTILE(bucket_count): Distributes rows into a specified number of buckets evenly (as much as possible)
  • PERCENT_RANK(): In terms of percent, rank the standing of each row in a partition
  • LAG(column_expression, offset, default): Accesses data from an earlier row
    • default is used when the offset goes beyond the partition
  • LEAD(column_expression, offset, default): Accesses data from a later row
  • CUME_DIST, LAG, LEAD, NTH_VALUE

Window and Partition

The window and partition is defined by the parentheses after the OVER clause.

Partitioning

PARTITION BY describes how the result set should be divided into groups.

You can partition by multiple columns as well:

SELECT product_name,
       group_name,
       price,
       ROW_NUMBER() OVER (PARTITION BY group_name, price) AS row_num
FROM product_and_groups;

PARTITION BY is optional. Omitting it will treat all result rows as a single group:

SELECT product_name,
       group_name,
       AVG(price) OVER () AS avg_price
FROM product_and_groups
ORDER BY 3 DESC;

-- The third column will be
SELECT AVG(price)
FROM product_and_groups;

WINDOW Clause

There is an alternate syntax to define the window:

SELECT product_name,
       group_name,
       AVG(price) OVER w AS avg_price
FROM product_and_groups
WINDOW w AS (PARTITION BY group_name)
ORDER BY 3 DESC;

Useful when reusing the same window definition multiple times.

Ordering within Partition

ORDER BY specifies how the rows in each partition should be ordered. Additional NULLS FIRST or NULLS LAST (default) controls whether NULL values should be ordered first or last.

The effect of ORDER BY makes more sense with window functions like ROW_NUMBER():

SELECT product_name,
       group_name,
       price,
       ROW_NUMBER() OVER w AS row_num
FROM product_and_groups
WINDOW w AS (PARTITION BY group_name ORDER BY price);
product_namegroup_namepricerow_num
Sony VAIOLaptop700.001
Lenovo ThinkpadLaptop700.002
Dell VostroLaptop800.003
HP EliteLaptop1200.004
Microsoft LumiaSmartphone200.001
HTC OneSmartphone400.002
NexusSmartphone500.003
iPhoneSmartphone900.004
Kindle FireTablet150.001
Samsung Galaxy TabTablet200.002
iPadTablet700.003

Remember that you can use ORDER BY without PARTITION BY:

SELECT product_name,
       group_name,
       price,
       RANK() OVER w AS price_rank
FROM product_and_groups
WINDOW w AS (ORDER BY price DESC);
product_namegroup_namepriceprice_rank
HP EliteLaptop1200.001
iPhoneSmartphone900.002
Dell VostroLaptop800.003
Lenovo ThinkpadLaptop700.004
Sony VAIOLaptop700.004
iPadTablet700.004
NexusSmartphone500.007
HTC OneSmartphone400.008
Samsung Galaxy TabTablet200.009
Microsoft LumiaSmartphone200.009
Kindle FireTablet150.0011

Frame Clause

You can refine a sliding window only if the partition has an order imposed by ORDER BY.

You can further refine the window with a frame clause:

ROWS BETWEEN start AND end  -- Uses row numbers for calculating window size
-- or
RANGE BETWEEN start AND end  -- Uses column values for calculating window size

Special keywords for start and end:

  • CURRENT ROW:
    • Current row number
    • Current row value
  • UNBOUNDED PRECEDING: Start of the partition
  • UNBOUNDED FOLLOWING: End of the partition
  • n PRECEDING
    • n row numbers before the current row
    • n less than the current row value
  • n FOLLOWING
    • n row numbers after the current row
    • n greater than the current row value
  • Time-range frame clauses for date and time columns:
    • 'n' MONTH, 'n' DAY, 'n' HOUR, 'n' MINUTE, 'n' SECOND + PRECEDING or FOLLOWING

For example, if you wanted to calculate the running total of prices (have the window grow as it goes down the rows in the partition):

SELECT product_name,
       group_name,
       price,
       SUM(price) OVER (
           ORDER BY price
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM product_and_groups;
product_namegroup_namepricerunning_total
Kindle FireTablet150.00150
Microsoft LumiaSmartphone200.00350
Samsung Galaxy TabTablet200.00550
HTC OneSmartphone400.00950
NexusSmartphone500.001450
iPadTablet700.002150
Lenovo ThinkpadLaptop700.002850
Sony VAIOLaptop700.003550
Dell VostroLaptop800.004350
iPhoneSmartphone900.005250
HP EliteLaptop1200.006450

Or a window containing the previous, current, and next row:

SELECT product_name,
       group_name,
       price,
       SUM(price) OVER (
           ORDER BY price
           ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS windowed_sum
FROM product_and_groups;
product_namegroup_namepricewindowed_sum
Kindle FireTablet150.00350
Microsoft LumiaSmartphone200.00550
Samsung Galaxy TabTablet200.00800
HTC OneSmartphone400.001100
NexusSmartphone500.001600
iPadTablet700.001900
Lenovo ThinkpadLaptop700.002100
Sony VAIOLaptop700.002200
Dell VostroLaptop800.002400
iPhoneSmartphone900.002900
HP EliteLaptop1200.002100