Window Functions
Window functions perform calculations across a set of table rows that are related to the current row. Unlike regular aggregate functions, window functions do not cause rows to become grouped into a single output row - the rows retain their separate identities.
Architecture Overview
graph TB
subgraph "Entry Points"
Query["execute_select<br/>[query.rs:155]"]
HasWindow["has_window_functions<br/>[window.rs:2522]"]
end
subgraph "Execution Coordinators"
ExecWindow["execute_select_with_window_functions<br/>[window.rs:115]"]
ExecPresorted["execute_select_with_window_functions_presorted<br/>[window.rs:135]"]
ExecPregrouped["execute_select_with_window_functions_pregrouped<br/>[window.rs:156]"]
ExecStreaming["execute_select_with_window_functions_streaming<br/>[window.rs:422]"]
ExecLazy["execute_select_with_window_functions_lazy_partition<br/>[window.rs:591]"]
end
subgraph "Core Computation"
ParseWF["parse_window_functions<br/>[window.rs:974]"]
ComputeWF["compute_window_function<br/>[window.rs:1216]"]
ComputePartition["compute_window_for_partition<br/>[window.rs:1423]"]
ComputeAggWF["compute_aggregate_window_function<br/>[window.rs:2111]"]
end
subgraph "Specialized Computations"
ComputeLeadLag["compute_lead_lag<br/>[window.rs:1541]"]
ComputeNtile["compute_ntile<br/>[window.rs:1585]"]
ComputeRank["compute_rank<br/>[window.rs:1639]"]
ComputePercentRank["compute_percent_rank<br/>[window.rs:1821]"]
ComputeCumeDist["compute_cume_dist<br/>[window.rs:1859]"]
end
subgraph "Support Functions"
PrecomputeOrderBy["precompute_order_by_values<br/>[window.rs:1967]"]
SortByOrderValues["sort_by_order_values<br/>[window.rs:2092]"]
ParseSelectList["parse_select_list_for_window<br/>[window.rs:748]"]
end
Query -->|"has window functions?"| HasWindow
HasWindow -->|"yes"| ExecWindow
ExecWindow --> ParseWF
ExecWindow --> ComputeWF
ExecPresorted --> ComputeWF
ExecPregrouped --> ComputeWF
ExecStreaming --> ComputePartition
ExecLazy --> ComputePartition
ComputeWF -->|"partition rows"| ComputePartition
ComputeWF -->|"aggregate functions"| ComputeAggWF
ComputePartition --> PrecomputeOrderBy
ComputePartition --> SortByOrderValues
ComputePartition --> ComputeLeadLag
ComputePartition --> ComputeNtile
ComputePartition --> ComputeRank
ComputeAggWF --> PrecomputeOrderBy
ComputeAggWF --> SortByOrderValues
ExecWindow --> ParseSelectList
ExecStreaming --> ParseSelectList
ExecLazy --> ParseSelectList
Execution Pipeline
graph TD
Start["Query with Window Function"] --> Detect["Detect Window Functions<br/>[window.rs:2522-2536]"]
Detect --> Parse["Parse Window Functions<br/>[window.rs:974-1031]<br/>Extract WindowFunctionInfo"]
Parse --> OptCheck{"Optimization<br/>Eligible?"}
OptCheck -->|"PARTITION BY + LIMIT<br/>+ indexed partition col"| LazyPartition["Lazy Partition Fetch<br/>[window.rs:591-745]<br/>Fetch partitions one-by-one<br/>Stop at LIMIT"]
OptCheck -->|"No PARTITION BY<br/>+ indexed ORDER BY col"| PreSorted["Pre-sorted Optimization<br/>[window.rs:135-151]<br/>Skip sorting"]
OptCheck -->|"PARTITION BY<br/>+ indexed partition col"| PreGrouped["Pre-grouped Optimization<br/>[window.rs:156-172]<br/>Skip hash grouping"]
OptCheck -->|"PARTITION BY + LIMIT"| Streaming["Streaming with LIMIT<br/>[window.rs:422-586]<br/>Process partitions until LIMIT"]
OptCheck -->|"Standard case"| Standard["Standard Execution<br/>[window.rs:115-131]"]
LazyPartition --> BuildResult["Build Result Rows<br/>[window.rs:352-417]"]
PreSorted --> Compute["Compute Window Values<br/>[window.rs:1216-1416]"]
PreGrouped --> Compute
Streaming --> BuildResult
Standard --> Compute
Compute --> Partition{"Has<br/>PARTITION BY?"}
Partition -->|"No"| SinglePartition["Single Partition<br/>[window.rs:1250-1293]<br/>All rows together"]
Partition -->|"Yes"| MultiPartition["Hash-based Partitioning<br/>[window.rs:1295-1354]<br/>Group by partition key"]
SinglePartition --> ComputePartition["compute_window_for_partition<br/>[window.rs:1423-1538]"]
MultiPartition -->|"≥10 partitions<br/>+ ≥1000 rows"| Parallel["Parallel Partition Processing<br/>[window.rs:1359-1390]<br/>Rayon parallel_iter"]
MultiPartition -->|"Small dataset"| Sequential["Sequential Processing<br/>[window.rs:1391-1415]"]
Parallel --> ComputePartition
Sequential --> ComputePartition
ComputePartition --> FuncType{"Function<br/>Type?"}
FuncType -->|"Aggregate<br/>(SUM, COUNT, etc.)"| AggCompute["compute_aggregate_window_function<br/>[window.rs:2111-2520]<br/>With frame bounds"]
FuncType -->|"LEAD/LAG"| LeadLag["compute_lead_lag<br/>[window.rs:1541-1582]"]
FuncType -->|"NTILE"| Ntile["compute_ntile<br/>[window.rs:1585-1636]"]
FuncType -->|"RANK/DENSE_RANK"| Rank["compute_rank<br/>[window.rs:1639-1693]"]
FuncType -->|"Other ranking"| Other["ROW_NUMBER, etc.<br/>[window.rs:1530-1532]"]
AggCompute --> BuildResult
LeadLag --> BuildResult
Ntile --> BuildResult
Rank --> BuildResult
Other --> BuildResult
BuildResult --> End["Return ExecutorMemoryResult<br/>or streaming result"]
Syntax
function_name([expression]) OVER (
[PARTITION BY partition_expression [, ...]]
[ORDER BY sort_expression [ASC | DESC] [, ...]]
)
- PARTITION BY: Divides rows into groups (partitions) that share the same values
- ORDER BY: Defines the order of rows within each partition
Available Window Functions
Ranking Functions
ROW_NUMBER()
Assigns a unique sequential number to each row within a partition.
SELECT name, dept, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num
FROM employees;
Result:
name | dept | salary | row_num
-------+-------------+--------+--------
Diana | Engineering | 80000 | 1
Frank | Engineering | 75000 | 2
Charlie| Engineering | 70000 | 3
Bob | Sales | 60000 | 4
Eve | Sales | 55000 | 5
Alice | Sales | 50000 | 6
RANK()
Assigns a rank to each row within a partition. Rows with equal values receive the same rank, with gaps in the sequence.
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) as rank
FROM employees;
If two employees have the same salary and are ranked 2, the next rank would be 4 (not 3).
DENSE_RANK()
Similar to RANK(), but without gaps in the ranking sequence.
SELECT name, salary,
DENSE_RANK() OVER (ORDER BY salary DESC) as dense_rank
FROM employees;
If two employees have the same salary and are ranked 2, the next rank would be 3.
NTILE(n)
Distributes rows into a specified number of buckets.
SELECT name, salary,
NTILE(3) OVER (ORDER BY salary DESC) as tertile
FROM employees;
Divides employees into 3 groups based on salary.
PERCENT_RANK()
Returns the relative rank of the current row as a percentage (0 to 1).
SELECT name, salary,
PERCENT_RANK() OVER (ORDER BY salary) as pct_rank
FROM employees;
CUME_DIST()
Returns the cumulative distribution of a value (fraction of rows with values <= current row).
SELECT name, salary,
CUME_DIST() OVER (ORDER BY salary) as cume_dist
FROM employees;
Value Functions
LAG(expression [, offset [, default]])
Accesses a value from a previous row within the partition.
-- Get previous row's salary
SELECT name, salary,
LAG(salary) OVER (ORDER BY salary) as prev_salary
FROM employees;
-- Get salary from 2 rows back, with default of 0
SELECT name, salary,
LAG(salary, 2, 0) OVER (ORDER BY salary) as prev2_salary
FROM employees;
LEAD(expression [, offset [, default]])
Accesses a value from a following row within the partition.
-- Get next row's salary
SELECT name, salary,
LEAD(salary) OVER (ORDER BY salary) as next_salary
FROM employees;
-- Get salary from 2 rows ahead, with default of 0
SELECT name, salary,
LEAD(salary, 2, 0) OVER (ORDER BY salary) as next2_salary
FROM employees;
FIRST_VALUE(expression)
Returns the first value within the partition.
SELECT name, dept, salary,
FIRST_VALUE(salary) OVER (PARTITION BY dept ORDER BY salary DESC) as max_in_dept
FROM employees;
LAST_VALUE(expression)
Returns the last value within the current window frame.
SELECT name, dept, salary,
LAST_VALUE(salary) OVER (PARTITION BY dept ORDER BY salary DESC) as current_last
FROM employees;
NTH_VALUE(expression, n)
Returns the nth value within the partition.
SELECT name, salary,
NTH_VALUE(salary, 2) OVER (ORDER BY salary DESC) as second_highest
FROM employees;
Using PARTITION BY
PARTITION BY divides the result set into partitions, and window functions are applied to each partition separately.
-- Row numbers within each department
SELECT name, dept, salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) as dept_rank
FROM employees
ORDER BY dept, dept_rank;
Result:
name | dept | salary | dept_rank
-------+-------------+--------+----------
Diana | Engineering | 80000 | 1
Frank | Engineering | 75000 | 2
Charlie| Engineering | 70000 | 3
Bob | Sales | 60000 | 1
Eve | Sales | 55000 | 2
Alice | Sales | 50000 | 3
Aggregate Functions as Window Functions
Standard aggregate functions can also be used as window functions:
-- Running total
SELECT name, salary,
SUM(salary) OVER (ORDER BY salary) as running_total
FROM employees
ORDER BY salary;
Result:
name | salary | running_total
-------+--------+--------------
Alice | 50000 | 50000
Eve | 55000 | 105000
Bob | 60000 | 165000
Charlie| 70000 | 235000
Frank | 75000 | 310000
Diana | 80000 | 390000
Common Use Cases
Top N per Group
Find the highest paid employee in each department:
SELECT * FROM (
SELECT name, dept, salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) as rn
FROM employees
) ranked
WHERE rn = 1;
Running Totals
Calculate cumulative sales:
SELECT order_date, amount,
SUM(amount) OVER (ORDER BY order_date) as cumulative_sales
FROM orders;
Compare to Previous/Next
Calculate month-over-month change:
SELECT month, revenue,
revenue - LAG(revenue) OVER (ORDER BY month) as change
FROM monthly_sales;
Percentile Calculation
Find employees in the top 10% of salaries:
SELECT * FROM (
SELECT name, salary,
PERCENT_RANK() OVER (ORDER BY salary DESC) as pct
FROM employees
) ranked
WHERE pct <= 0.10;
Notes
- When ORDER BY is omitted, the entire partition is treated as a single group
- NULL values are handled according to SQL standards
- Window functions can only appear in SELECT and ORDER BY clauses
- Multiple window functions can be used in the same query