窗口函式:PostgreSQL 強大的分析型查詢能力 | PostgreSQL

2026/06/23
窗口函式:PostgreSQL 強大的分析型查詢能力 | PostgreSQL

窗口函式(Window Function) 是 SQL 中一類特殊的函式,能夠在不折疊結果集的情況下,對一組相關行執行計算。與 GROUP BY 不同,窗口函式保留每一行的獨立性,同時可以存取「窗口」內的其他行資料——這使得排名、累計、移動平均、同比環比等分析型查詢變得優雅而高效。

窗口函式在查詢流程中的位置

窗口函式在 SQL 執行順序中,位於 WHEREGROUP BYHAVING 之後,ORDER BYLIMIT 之前:

FROM / JOIN    ← 確定基礎資料集
     │
   WHERE       ← 過濾行(窗口函式不可用)
     │
  GROUP BY     ← 分組聚合
     │
   HAVING      ← 過濾分組(窗口函式不可用)
     │
SELECT (含 Window)  ← 窗口函式在此計算 ★
     │
  ORDER BY     ← 排序最終結果
     │
   LIMIT       ← 截取結果

關鍵推論:

  • 窗口函式無法在 WHERE 條件中直接使用(需要 CTE 或子查詢包裝)
  • 若查詢有 GROUP BY,窗口函式操作的是聚合後的行
  • 同一 SELECT 中可以同時使用多個不同定義的窗口函式

OVER 子句基本結構

所有窗口函式都必須搭配 OVER 子句使用:

函式名稱() OVER (
    [PARTITION BY 分區欄位]
    [ORDER BY 排序欄位]
    [frame_clause]
)
  • PARTITION BY:將資料分成獨立的分區,窗口函式在每個分區內獨立計算
  • ORDER BY:決定分區內的排序,影響累計計算與排名
  • frame_clause:定義窗口框架範圍(詳見 Frame 子句章節)

PARTITION BY vs GROUP BY

這是理解窗口函式的核心概念。

GROUP BY:折疊資料,每組只剩一行

SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
-- 結果:
-- department  | avg_salary
-- ------------+-----------
-- Engineering |   85000
-- Sales       |   60000

PARTITION BY:不折疊資料,保留所有行

SELECT
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
-- 結果:
-- name  | department  | salary | dept_avg
-- ------+-------------+--------+---------
-- Alice | Engineering |  90000 |   85000
-- Bob   | Engineering |  80000 |   85000
-- Carol | Sales       |  65000 |   60000
-- Dave  | Sales       |  55000 |   60000
面向GROUP BYPARTITION BY
結果行數每組壓縮為一行保留原始行數
其他欄位只能選聚合欄位可選任意欄位
搭配聚合函式窗口函式(OVER)
用途分組統計保留明細的分組計算

四種排名函式

SELECT
    name,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) AS row_number,
    RANK()       OVER (ORDER BY score DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank,
    NTILE(3)     OVER (ORDER BY score DESC) AS ntile_3
FROM sales_scores;

-- 結果(含並列值 85):
-- name  | score | row_number | rank | dense_rank | ntile_3
-- ------+-------+------------+------+------------+--------
-- Alice |    90 |          1 |    1 |          1 |      1
-- Bob   |    85 |          2 |    2 |          2 |      1
-- Carol |    85 |          3 |    2 |          2 |      2
-- Dave  |    80 |          4 |    4 |          3 |      2
-- Eve   |    75 |          5 |    5 |          4 |      3

並列值(Tie)的處理差異:

函式並列時行為後續排名典型用途
ROW_NUMBER()任意給不同編號連續分頁、去重、Top-N
RANK()並列相同排名跳號(2,2,4)真實比賽排名
DENSE_RANK()並列相同排名不跳號(2,2,3)緊密排名
NTILE(n)均分至 n 組N/A分位分析

搭配 PARTITION BY 的分組排名

-- 每個部門內分別排名
SELECT
    name, department, salary,
    RANK() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS dept_rank
FROM employees;

-- Top-N per Group:取每個部門薪資最高的員工
WITH ranked AS (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS rn
    FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn = 1;

值存取函式:LAG 與 LEAD

LAGLEAD 允許在窗口內存取相對位置的行值,是計算同比、環比的核心工具。

-- LAG(value, offset, default):存取前 N 行的值
-- LEAD(value, offset, default):存取後 N 行的值

SELECT
    month,
    revenue,
    LAG(revenue, 1)  OVER (ORDER BY month) AS prev_month,
    LEAD(revenue, 1) OVER (ORDER BY month) AS next_month,
    revenue - LAG(revenue, 1) OVER (ORDER BY month) AS mom_change,
    ROUND(
        (revenue - LAG(revenue, 1) OVER (ORDER BY month))::NUMERIC
        / NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0) * 100,
        2
    ) AS mom_growth_pct
FROM monthly_revenue
ORDER BY month;

-- 結果:
-- month   | revenue | prev_month | mom_change | mom_growth_pct
-- --------+---------+------------+------------+---------------
-- 2024-01 |  100000 |      NULL  |      NULL  |          NULL
-- 2024-02 |  120000 |    100000  |     20000  |         20.00
-- 2024-03 |  115000 |    120000  |     -5000  |         -4.17
-- 搭配 PARTITION BY:各產品線內分別計算環比
SELECT
    product_line, month, revenue,
    LAG(revenue) OVER (
        PARTITION BY product_line ORDER BY month
    ) AS prev_month,
    revenue - LAG(revenue) OVER (
        PARTITION BY product_line ORDER BY month
    ) AS change
FROM product_revenue;

FIRST_VALUE、LAST_VALUE 與 NTH_VALUE

SELECT
    name, department, salary,
    -- 部門內薪資最高者
    FIRST_VALUE(name) OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS highest_paid,

    -- 部門內薪資最低者(注意:需明確 Frame!)
    LAST_VALUE(name) OVER (
        PARTITION BY department ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS lowest_paid,

    -- 部門內薪資第二高者
    NTH_VALUE(name, 2) OVER (
        PARTITION BY department ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS second_highest
FROM employees;

LAST_VALUE 的常見陷阱:

-- 錯誤:預設 Frame 只到 CURRENT ROW,LAST_VALUE 永遠返回當前行的值
SELECT name, salary,
    LAST_VALUE(name) OVER (ORDER BY salary DESC) AS wrong_result
FROM employees;
-- wrong_result 永遠等於 name 本身

-- 正確:明確指定 Frame 範圍到分區末尾
SELECT name, salary,
    LAST_VALUE(name) OVER (
        ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS correct_result
FROM employees;

聚合窗口函式

標準聚合函式搭配 OVER 子句後,成為強大的分析工具。

累計計算(Running Total)

SELECT
    order_date,
    daily_revenue,
    -- 累計銷售額
    SUM(daily_revenue) OVER (ORDER BY order_date) AS running_total,
    -- 累計平均
    AVG(daily_revenue) OVER (ORDER BY order_date) AS running_avg,
    -- 累計記錄數
    COUNT(*) OVER (ORDER BY order_date) AS running_count
FROM daily_sales;

滾動計算(Rolling Window)

-- 7 日移動平均
SELECT
    sale_date,
    daily_revenue,
    AVG(daily_revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7d,

    -- 3 日移動總計
    SUM(daily_revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_sum_3d
FROM daily_sales
ORDER BY sale_date;

分區內統計

SELECT
    name, department, salary,
    SUM(salary) OVER (PARTITION BY department) AS dept_total,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg,
    COUNT(*)    OVER (PARTITION BY department) AS dept_count,
    -- 個人薪資佔部門總薪資比例
    ROUND(
        salary::NUMERIC / SUM(salary) OVER (PARTITION BY department) * 100, 2
    ) AS pct_of_dept
FROM employees;

Window Frame 三種模式

Frame 子句精確控制窗口計算範圍:

{ ROWS | RANGE | GROUPS } BETWEEN frame_start AND frame_end

-- frame_start / frame_end 可選值:
--   UNBOUNDED PRECEDING  → 分區第一行
--   N PRECEDING          → 當前行前 N 行
--   CURRENT ROW          → 當前行
--   N FOLLOWING          → 當前行後 N 行
--   UNBOUNDED FOLLOWING  → 分區最後一行

ROWS vs RANGE 的差異

-- 測試資料
CREATE TABLE test_frame (day INT, val INT);
INSERT INTO test_frame VALUES (1,10),(2,20),(2,30),(3,40);

-- ROWS:按實體行計算
SELECT day, val,
    SUM(val) OVER (ORDER BY day
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS rows_sum
FROM test_frame;
-- day=2, val=20: rows_sum = 10+20 = 30
-- day=2, val=30: rows_sum = 10+20+30 = 60(兩行分別計算)

-- RANGE:相同 ORDER BY 值視為同組
SELECT day, val,
    SUM(val) OVER (ORDER BY day
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS range_sum
FROM test_frame;
-- day=2, val=20: range_sum = 10+20+30 = 60(day=2 的所有行一起)
-- day=2, val=30: range_sum = 10+20+30 = 60(結果相同)

GROUPS 模式(PG11+)

以 ORDER BY 值的不同群組為單位計算範圍:

SELECT event_date, value,
    SUM(value) OVER (
        ORDER BY event_date
        GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW
    ) AS groups_sum
FROM grouped_data
ORDER BY event_date, value;

-- event_date | value | groups_sum
-- 2024-01-01 |    10 |        30  ← 僅當前群組(01-01):10+20
-- 2024-01-01 |    20 |        30
-- 2024-01-02 |    30 |        60  ← 前 1 群組(01-01) + 當前(01-02)
-- 2024-01-03 |    40 |       120  ← 前 1 群組(01-02) + 當前(01-03)
-- 2024-01-03 |    50 |       120
模式計算單位適用場景
ROWS實體行移動平均、固定視窗
RANGEORDER BY 值的邏輯範圍相同值保持一致性
GROUPSORDER BY 值的分組(PG11+)按週期滾動計算

常用 Frame 模式速查

-- 累計(從頭到當前)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

-- 整個分區
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

-- 移動視窗(前 N 到當前)
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW  -- 7 日視窗

-- 中心移動平均
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING  -- 7 日視窗(以當前為中心)

-- 反向累計
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

WINDOW 命名子句

多個窗口函式使用相同定義時,可命名避免重複:

SELECT
    name, department, salary,
    ROW_NUMBER() OVER dept_window,
    RANK()       OVER dept_window,
    DENSE_RANK() OVER dept_window,
    SUM(salary)  OVER dept_window,
    AVG(salary)  OVER dept_window
FROM employees
WINDOW dept_window AS (
    PARTITION BY department
    ORDER BY salary DESC
);

多個命名窗口與繼承

SELECT
    sale_date, product_id, revenue,
    SUM(revenue) OVER product_window AS product_total,
    SUM(revenue) OVER global_window  AS global_total,
    -- 繼承並擴充(添加 Frame)
    AVG(revenue) OVER (product_window ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
        AS product_7d_avg
FROM daily_product_sales
WINDOW
    product_window AS (PARTITION BY product_id ORDER BY sale_date),
    global_window  AS (ORDER BY sale_date);

FILTER 搭配窗口函式

FILTER 子句允許在聚合計算前過濾行:

SELECT
    order_date, status, amount,
    -- 所有訂單的累計金額
    SUM(amount) OVER (ORDER BY order_date) AS total_running,
    -- 僅已完成訂單的累計金額
    SUM(amount) FILTER (WHERE status = 'completed')
        OVER (ORDER BY order_date) AS completed_running,
    -- 各狀態的訂單數量
    COUNT(*) FILTER (WHERE status = 'completed')
        OVER (ORDER BY order_date) AS completed_count,
    COUNT(*) FILTER (WHERE status = 'cancelled')
        OVER (ORDER BY order_date) AS cancelled_count
FROM orders
ORDER BY order_date;

EXCLUDE 子句(PG11+)

從 Frame 中排除特定行:

SELECT name, score,
    -- 計算其他人的平均分(排除自身)
    AVG(score) OVER (
        ORDER BY score
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        EXCLUDE CURRENT ROW
    ) AS others_avg
FROM exam_results;

-- EXCLUDE 可選值:
--   EXCLUDE NO OTHERS    ← 預設,不排除
--   EXCLUDE CURRENT ROW  ← 排除當前行
--   EXCLUDE GROUP        ← 排除相同 ORDER BY 值的所有行
--   EXCLUDE TIES         ← 排除相同值的其他行(保留當前行)

實戰場景

百分位與分布分析

SELECT
    name, score,
    -- 百分位排名(0 到 1)
    PERCENT_RANK() OVER (ORDER BY score) AS percent_rank,
    -- 累積分布
    CUME_DIST()    OVER (ORDER BY score) AS cume_dist,
    -- 十分位
    NTILE(10)      OVER (ORDER BY score) AS decile
FROM exam_results;

同比計算(Year-over-Year)

SELECT
    year, month, revenue,
    LAG(revenue, 12) OVER (ORDER BY year, month) AS same_month_last_year,
    ROUND(
        (revenue - LAG(revenue, 12) OVER (ORDER BY year, month))::NUMERIC
        / NULLIF(LAG(revenue, 12) OVER (ORDER BY year, month), 0) * 100,
        2
    ) AS yoy_growth_pct
FROM monthly_revenue
ORDER BY year, month;

連續登入天數(Gap and Island)

WITH login_gaps AS (
    SELECT
        user_id, login_date,
        login_date - (ROW_NUMBER() OVER (
            PARTITION BY user_id ORDER BY login_date
        ))::INT AS group_key
    FROM user_logins
),
streaks AS (
    SELECT user_id, group_key,
        COUNT(*) AS streak_length,
        MIN(login_date) AS streak_start,
        MAX(login_date) AS streak_end
    FROM login_gaps
    GROUP BY user_id, group_key
)
SELECT user_id, MAX(streak_length) AS max_consecutive_days
FROM streaks
GROUP BY user_id
ORDER BY max_consecutive_days DESC;

庫存滾動計算

SELECT
    product_id, transaction_date, transaction_type, quantity,
    SUM(
        CASE
            WHEN transaction_type = 'IN'  THEN  quantity
            WHEN transaction_type = 'OUT' THEN -quantity
        END
    ) OVER (
        PARTITION BY product_id
        ORDER BY transaction_date, transaction_id
    ) AS running_stock
FROM inventory_transactions
ORDER BY product_id, transaction_date;

效能考量

排序成本

窗口函式的主要效能瓶頸是排序。每個不同的 PARTITION BY + ORDER BY 組合都需要獨立排序。

-- 低效:三個不同的排序操作
SELECT
    SUM(amount) OVER (PARTITION BY dept ORDER BY date)   AS s1,
    AVG(amount) OVER (PARTITION BY dept ORDER BY date)   AS s2,
    SUM(amount) OVER (PARTITION BY region ORDER BY date) AS s3
FROM sales;

-- 較佳:使用 WINDOW 命名共享排序
SELECT
    SUM(amount) OVER w1 AS s1,
    AVG(amount) OVER w1 AS s2,  -- 與 s1 共享排序
    SUM(amount) OVER w2 AS s3
FROM sales
WINDOW
    w1 AS (PARTITION BY dept ORDER BY date),
    w2 AS (PARTITION BY region ORDER BY date);

work_mem 與排序溢出

-- 窗口函式排序依賴 work_mem
SHOW work_mem;  -- 預設通常 4MB

-- 大資料量時臨時增大
SET work_mem = '256MB';

-- 在 EXPLAIN 中留意排序節點
EXPLAIN (ANALYZE) SELECT SUM(amount) OVER (PARTITION BY dept ORDER BY date)
FROM large_table;
-- 留意是否出現 Sort Method: external merge(溢出到磁碟)

版本演進

版本新增功能
PG8.4窗口函式基礎支援(ROW_NUMBER、RANK、DENSE_RANK、NTILE、LAG、LEAD)
PG9.0FIRST_VALUE、LAST_VALUE、NTH_VALUE
PG11GROUPS Frame 模式、EXCLUDE 子句
PG14改進的窗口函式排序效能

總結

窗口函式 是 PostgreSQL 分析型查詢的利器:

  • PARTITION BY 將資料分區而不折疊,保留每行明細
  • 四種排名函式(ROW_NUMBER / RANK / DENSE_RANK / NTILE)處理不同的排名需求
  • LAG / LEAD 是同比環比計算的核心工具
  • 聚合 + OVER 實現累計、滾動、分區統計
  • Frame 子句 精確控制計算範圍,ROWS / RANGE / GROUPS 各有適用場景
  • LAST_VALUE 需明確 Frame,預設只到 CURRENT ROW 是最常見的陷阱
  • WINDOW 命名子句 避免重複定義,共享排序提升效能
  • 效能關注排序成本與 work_mem 設定

下一篇,我們將深入探討 聚合函式與分組——PostgreSQL 的資料匯總與統計能力。

BenZ Software Developer

熱愛技術的軟體開發者,在這裡分享程式開發經驗與學習筆記。