CTE 與子查詢:PostgreSQL 的查詢組合與遞迴能力 | PostgreSQL

2026/06/25
CTE 與子查詢:PostgreSQL 的查詢組合與遞迴能力 | PostgreSQL

CTE(Common Table Expression) 透過 WITH 子句定義命名暫時結果集,讓複雜查詢更具可讀性。PostgreSQL 在 CTE 上提供了 Materialized / Inline 智慧型決策(PG12+)、強大的 遞迴 CTE 支援,以及獨特的 可寫 CTE(WITH … INSERT/UPDATE/DELETE)能力,搭配子查詢的 EXISTS / IN / ANY 語義,是組合複雜查詢邏輯的利器。

CTE 執行模型:Materialized vs Inline

PostgreSQL 執行 CTE 時有兩種策略:

策略行為優勢劣勢
MaterializedCTE 結果實際計算並快取多次引用只算一次無法利用外層 WHERE 下推
InlineCTE 展開為子查詢規劃器可整合最佳化、利用索引多次引用可能重複計算

PG12 之前:所有 CTE 強制 Materialized,形成「優化障壁(Optimization Fence)」。

PG12 起:規劃器依據以下規則自動判斷:

  • 只被引用一次且無副作用 → 預設 Inline
  • 含副作用(INSERT/UPDATE/DELETE)或被引用多次 → 預設 Materialized
  • 可用 MATERIALIZED / NOT MATERIALIZED 手動覆蓋
-- 強制 Materialized:CTE 結果只計算一次
WITH expensive_cte AS MATERIALIZED (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM   orders
    GROUP  BY customer_id
    HAVING SUM(amount) > 1000
)
SELECT c.name, e.total_spent
FROM   expensive_cte e
JOIN   customers c USING (customer_id)
WHERE  c.region = 'Asia';

-- 強制 NOT MATERIALIZED:讓外層條件下推至 CTE 內
WITH customer_orders AS NOT MATERIALIZED (
    SELECT customer_id, order_date, amount
    FROM   orders
)
SELECT *
FROM   customer_orders
WHERE  customer_id = 42     -- 此條件可下推,利用索引
  AND  order_date >= '2025-01-01';
-- PG11 及之前的效能問題
WITH all_users AS (
    SELECT * FROM users  -- 強制全表掃描!
)
SELECT * FROM all_users WHERE id = 42;
-- PG12+ 自動 Inline,可利用主鍵索引

多 CTE 串連

後面的 CTE 可引用前面的 CTE,形成步驟化的資料處理管線:

WITH
-- CTE 1:計算各部門平均薪資
dept_avg AS (
    SELECT department_id, AVG(salary)::NUMERIC(10,2) AS avg_salary
    FROM   employees
    GROUP  BY department_id
),
-- CTE 2:找出高於部門平均的員工
above_avg AS (
    SELECT e.employee_id, e.name, e.salary, d.avg_salary
    FROM   employees e
    JOIN   dept_avg d USING (department_id)
    WHERE  e.salary > d.avg_salary
),
-- CTE 3:附加部門名稱
final_result AS (
    SELECT a.name, a.salary, a.avg_salary, dep.department_name
    FROM   above_avg a
    JOIN   departments dep USING (department_id)
)
SELECT * FROM final_result ORDER BY salary DESC;

遞迴 CTE 的執行機制

遞迴 CTE 使用 WITH RECURSIVE 語法,透過反覆迭代建構結果集:

WITH RECURSIVE cte AS (
    <非遞迴基底部分>  ← 只執行一次,產生初始結果集
    UNION ALL
    <遞迴部分>        ← 反覆執行,直到無新資料產生
)

執行流程:

Step 1:執行基底部分 → 結果放入 Working Table
         │
Step 2:Working Table 非空?
  YES → 以 Working Table 為輸入執行遞迴部分
         │
         ├─ 新結果暫存至 Intermediate Table
         ├─ 將 Intermediate Table 附加到最終結果集
         └─ 用 Intermediate Table 取代 Working Table → 回到 Step 2

  NO  → 結束迭代,返回完整結果集

重要特性

  • 遞迴部分只能讀取 Working Table,不能引用最終累積結果集
  • UNION ALL 允許重複列;UNION 每步去除重複,效能較差
  • 必須有終止條件,否則無窮迴圈

遞迴 CTE 實戰:組織樹

-- 資料表
CREATE TABLE employees (
    id         INT PRIMARY KEY,
    name       VARCHAR(100),
    manager_id INT REFERENCES employees(id)
);

INSERT INTO employees VALUES
(1, 'Alice',   NULL),  -- CEO
(2, 'Bob',     1),
(3, 'Charlie', 1),
(4, 'David',   2),
(5, 'Eve',     2),
(6, 'Frank',   3);

-- 從 CEO 向下展開整棵組織樹
WITH RECURSIVE org_tree AS (
    -- 基底:根節點
    SELECT
        id, name, manager_id,
        0              AS depth,
        ARRAY[id]      AS path,
        name::TEXT      AS full_path
    FROM   employees
    WHERE  manager_id IS NULL

    UNION ALL

    -- 遞迴:下一層員工
    SELECT
        e.id, e.name, e.manager_id,
        ot.depth + 1,
        ot.path || e.id,
        ot.full_path || ' > ' || e.name
    FROM   employees e
    JOIN   org_tree ot ON e.manager_id = ot.id
    WHERE  e.id <> ALL(ot.path)  -- 防止循環引用
)
SELECT
    repeat('  ', depth) || name AS indented_name,
    depth, full_path
FROM   org_tree
ORDER  BY path;

-- 結果:
-- indented_name      depth  full_path
-- Alice              0      Alice
--   Bob              1      Alice > Bob
--     David          2      Alice > Bob > David
--     Eve            2      Alice > Bob > Eve
--   Charlie          1      Alice > Charlie
--     Frank          2      Alice > Charlie > Frank

遞迴 CTE 實戰:BOM 展開

-- 物料清單:產品由零件組成,零件本身也可有子零件
CREATE TABLE bom (
    parent_id  INT,
    child_id   INT,
    quantity   INT
);

WITH RECURSIVE bom_expand AS (
    -- 基底:頂層產品的直接零件
    SELECT child_id, quantity, 1 AS level,
           quantity::NUMERIC AS total_qty
    FROM   bom
    WHERE  parent_id = 100

    UNION ALL

    -- 遞迴:展開子零件,累積數量
    SELECT b.child_id, b.quantity, be.level + 1,
           be.total_qty * b.quantity  -- 累積乘積計算實際用量
    FROM   bom b
    JOIN   bom_expand be ON b.parent_id = be.child_id
)
SELECT child_id AS part_id, level,
       SUM(total_qty) AS total_required
FROM   bom_expand
GROUP  BY child_id, level
ORDER  BY level, child_id;

防止無窮迴圈

方法一:路徑陣列偵測

WITH RECURSIVE safe_tree AS (
    SELECT id, manager_id, ARRAY[id] AS path
    FROM   employees WHERE id = 1
    UNION ALL
    SELECT e.id, e.manager_id, st.path || e.id
    FROM   employees e
    JOIN   safe_tree st ON e.manager_id = st.id
    WHERE  e.id <> ALL(st.path)  -- 已在路徑中則停止
)
SELECT * FROM safe_tree;

方法二:PG14+ CYCLE 子句

WITH RECURSIVE org_tree AS (
    SELECT id, name, manager_id
    FROM   employees WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, e.manager_id
    FROM   employees e
    JOIN   org_tree ot ON e.manager_id = ot.id
)
CYCLE id SET is_cycle USING cycle_path  -- 原生循環偵測
SELECT id, name, cycle_path
FROM   org_tree
WHERE  NOT is_cycle;

方法三:深度限制

WITH RECURSIVE limited_tree AS (
    SELECT id, manager_id, 0 AS depth
    FROM   employees WHERE id = 1
    UNION ALL
    SELECT e.id, e.manager_id, lt.depth + 1
    FROM   employees e
    JOIN   limited_tree lt ON e.manager_id = lt.id
    WHERE  lt.depth < 10  -- 最多 10 層
)
SELECT * FROM limited_tree;

PG14+ SEARCH 子句:控制遍歷順序

WITH RECURSIVE org_tree AS (
    SELECT id, name, manager_id
    FROM   employees WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, e.manager_id
    FROM   employees e
    JOIN   org_tree ot ON e.manager_id = ot.id
)
-- SEARCH DEPTH FIRST:深度優先遍歷
-- SEARCH BREADTH FIRST:廣度優先遍歷
SEARCH DEPTH FIRST BY id SET ordercol
SELECT id, name, ordercol
FROM   org_tree
ORDER  BY ordercol;

EXISTS vs IN vs ANY 選擇指引

EXISTS:半連接語義

-- EXISTS:只關心「是否存在」,找到第一筆匹配即停止
SELECT c.customer_id, c.name
FROM   customers c
WHERE  EXISTS (
    SELECT 1
    FROM   orders o
    WHERE  o.customer_id = c.customer_id
      AND  o.amount > 1000
);
-- EXISTS 不受 NULL 影響,是最安全的存在性檢查

IN:適合小型列表

-- IN 搭配靜態列表
SELECT * FROM employees
WHERE  department_id IN (1, 2, 5, 10);

-- IN 搭配子查詢:規劃器可能轉為 Hash Semi Join
SELECT * FROM employees
WHERE  department_id IN (
    SELECT id FROM departments WHERE location = 'Taipei'
);

NOT IN 的 NULL 陷阱

-- ⚠️ 若子查詢包含 NULL,NOT IN 可能返回空結果集!
-- 原因:x NOT IN (1, 2, NULL) → x<>1 AND x<>2 AND x<>NULL
--       x<>NULL 永遠為 UNKNOWN,整個條件為 UNKNOWN
SELECT * FROM employees
WHERE  department_id NOT IN (
    SELECT id FROM departments  -- 若有 NULL 的 id,結果為空!
);

-- ✅ 安全替代:使用 NOT EXISTS
SELECT * FROM employees e
WHERE  NOT EXISTS (
    SELECT 1 FROM departments d
    WHERE  d.id = e.department_id
);

ANY / ALL:搭配比較運算子

-- = ANY 等同於 IN
SELECT * FROM employees
WHERE  department_id = ANY (ARRAY[1, 2, 5]);

-- > ANY:大於子查詢中的最小值
SELECT name, salary FROM employees
WHERE  salary > ANY (
    SELECT AVG(salary) FROM employees GROUP BY department_id
);

-- > ALL:大於子查詢中的所有值(即大於最大值)
SELECT name, salary FROM employees
WHERE  salary > ALL (
    SELECT AVG(salary) FROM employees GROUP BY department_id
);

選擇指引:

運算子適用場景
EXISTS確認存在性、子查詢大且能早期退出、含 NULL 安全
IN靜態小列表、子查詢不含 NULL
NOT IN危險! 子查詢有 NULL 時返回空集合,改用 NOT EXISTS
= ANY功能同 IN,可搭配陣列語法
> ANY / < ANY大於最小值 / 小於最大值
> ALL / < ALL大於最大值 / 小於最小值

Correlated Subquery 改寫

Correlated Subquery 中引用外層查詢的欄位,導致外層每處理一列就執行一次子查詢:

-- ❌ Correlated Subquery(效能差)
SELECT
    e.employee_id, e.name,
    (SELECT COUNT(*)
     FROM   orders o
     WHERE  o.employee_id = e.employee_id) AS order_count
FROM   employees e;
-- 若 employees 有 10000 列,子查詢執行 10000 次!

改寫方式一:LEFT JOIN + GROUP BY

-- ✅ 效能佳
SELECT e.employee_id, e.name, COUNT(o.order_id) AS order_count
FROM   employees e
LEFT   JOIN orders o USING (employee_id)
GROUP  BY e.employee_id, e.name;

改寫方式二:CTE 預先聚合

-- ✅ 語義更清晰
WITH order_counts AS (
    SELECT employee_id, COUNT(*) AS order_count
    FROM   orders
    GROUP  BY employee_id
)
SELECT e.employee_id, e.name,
       COALESCE(oc.order_count, 0) AS order_count
FROM   employees e
LEFT   JOIN order_counts oc USING (employee_id);

改寫方式三:Window Function 取代排名子查詢

-- ❌ 子查詢計算部門內薪資排名
SELECT e.name, e.salary,
    (SELECT COUNT(*) + 1
     FROM   employees e2
     WHERE  e2.department_id = e.department_id
       AND  e2.salary > e.salary) AS rank_in_dept
FROM   employees e;

-- ✅ Window Function(最佳方案)
SELECT name, salary,
       RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank_in_dept
FROM   employees;

可寫 CTE(WITH … INSERT/UPDATE/DELETE)

可寫 CTE 允許在 WITH 子句中執行 DML,並透過 RETURNING 傳遞結果:

-- INSERT 並立即使用返回的 ID
WITH inserted AS (
    INSERT INTO customers (name, email)
    VALUES ('新客戶', 'new@example.com')
    RETURNING customer_id, name
)
INSERT INTO customer_audit (customer_id, action, action_time)
SELECT customer_id, 'CREATED', NOW()
FROM   inserted;
-- 資料搬移:從一張表刪除,插入另一張表
WITH moved AS (
    DELETE FROM orders
    WHERE  order_date < '2020-01-01'
    RETURNING *
)
INSERT INTO orders_archive
SELECT * FROM moved;
-- UPDATE 並記錄變更
WITH salary_update AS (
    UPDATE employees
    SET    salary = salary * 1.1
    WHERE  performance_rating = 'A'
    RETURNING employee_id, name,
              salary AS new_salary,
              salary / 1.1 AS old_salary
)
INSERT INTO salary_history (employee_id, old_salary, new_salary, changed_at)
SELECT employee_id, old_salary, new_salary, NOW()
FROM   salary_update;

重要特性

  • 同一 WITH 語句中的所有 CTE 以相同快照執行,彼此看不到對方的修改
  • 必須使用 RETURNING 才能將 DML 結果傳遞給後續 CTE

子查詢執行策略

PostgreSQL 規劃器對子查詢有三種執行策略,可透過 EXPLAIN 觀察:

策略行為效能
InitPlan非關聯子查詢,只執行一次,結果快取最佳
Hashed SubPlanIN / = ANY 型,建 Hash Table + O(1) 查找接近 Hash Join
SubPlan關聯子查詢,外層每列執行一次最差(N 次)
-- InitPlan 範例
SELECT * FROM employees
WHERE  salary > (SELECT AVG(salary) FROM employees);
-- EXPLAIN 顯示:InitPlan 1 (returns $0)

-- SubPlan 範例(應改寫為 JOIN)
SELECT e.name,
       (SELECT MAX(o.amount) FROM orders o
        WHERE  o.employee_id = e.employee_id) AS max_order
FROM   employees e;
-- EXPLAIN 顯示:SubPlan 1 → 每列執行一次

版本演進

版本重要特性
PG 8.4引入 CTE(WITH 子句)與遞迴 CTE
PG 9.1可寫 CTE(WITH … INSERT/UPDATE/DELETE)
PG 12CTE 智慧型 Inline/Materialized 決策;MATERIALIZED / NOT MATERIALIZED 關鍵字
PG 14遞迴 CTE 的 CYCLESEARCH 子句(ISO SQL 標準)

常見陷阱

CTE 被多次引用時的重複計算

-- PG12+ 若 CTE 被多次引用但規劃器選擇 Inline,可能重複計算
-- 計算昂貴的 CTE 應強制 MATERIALIZED
WITH expensive_stats AS MATERIALIZED (
    SELECT department_id,
           AVG(salary) AS avg_sal,
           STDDEV(salary) AS std_sal
    FROM   employees
    GROUP  BY department_id
)
SELECT e.name, e.salary, es.avg_sal
FROM   employees e
JOIN   expensive_stats es USING (department_id)  -- 第一次引用
WHERE  e.salary > (
    SELECT avg_sal + std_sal
    FROM   expensive_stats                        -- 第二次引用
    WHERE  department_id = e.department_id
);
-- MATERIALIZED 確保只計算一次

遞迴 CTE 中 UNION vs UNION ALL

-- UNION ALL:允許重複列,效能佳(推薦)
-- UNION:每步去除重複,效能差但可防止某些類型的無窮迴圈
-- 通常搭配路徑陣列偵測 + UNION ALL 是最佳組合

總結

CTE 與子查詢 是 PostgreSQL 查詢組合的核心工具:

  • 非遞迴 CTE 提升查詢可讀性,PG12+ 智慧型決策避免優化障壁
  • 遞迴 CTE 處理樹狀結構、圖遍歷、BOM 展開等遞迴問題
  • CYCLE / SEARCH(PG14+)提供原生循環偵測與遍歷順序控制
  • 可寫 CTE 在單一語句中完成多步驟 DML 操作
  • EXISTS 是最安全的存在性檢查;NOT IN 遇 NULL 會失效,改用 NOT EXISTS
  • Correlated Subquery 效能極差,應改寫為 JOIN 或 Window Function

下一篇,我們將深入探討 JSON 與 JSONB 操作——PostgreSQL 的半結構化資料處理能力。

BenZ Software Developer

熱愛技術的軟體開發者,在這裡分享程式開發經驗與學習筆記。