多表Full Outer Join的问题

目录

熟悉SQL的开发者都知道Full Join(同Full Outer Join)是左右两个表的记录都取,但若多个表之间Full Join,较容易出现意料之外的结果,数据可能会重复,这里做一下简单介绍。

数据准备

CREATE TABLE TA(ID INT,Attr NVARCHAR(100));
CREATE TABLE TB(ID INT,Attr NVARCHAR(100));
CREATE TABLE TC(ID INT,Attr NVARCHAR(100));

INSERT INTO TA VALUES(1,'a'),(2,'b'),(3,'c'),(4,NULL),(5,'e');
INSERT INTO TB VALUES(2,'aa'),(3,'bb'),(4,'cc'),(8,NULL);
INSERT INTO TC VALUES(4,'aaa'),(5,'bbb'),(6,'ccc'),(8,'ddd');

需求

将TA、TB、TC三个表数据集中到一个结果集,若ID值重复,Attr字段依次从TA、TB、TC中取第一个非空值。

注:以上需求无法简单地用Union实现

SQL写法

错误写法

一般会像用 Inner Join 或 Left Join 一样,直接多次 Full Join

SELECT
    COALESCE(a.ID, b.ID, c.ID) AS ID
   ,COALESCE(a.Attr, b.Attr, c.Attr) AS Attr
FROM TA a
FULL OUTER JOIN TB b ON a.ID = b.ID
FULL OUTER JOIN TC c ON a.ID = c.ID

查询结果如下:

可以发现ID会有重复,所以这样写是不稳妥的,但为什么会错误呢,请看下图:

由于两个Full Join的条件都是与TA进行ID相等性连接,那么对TA中不存在且TB和TC都存在的ID,就会出现两条不同的记录。

正确写法

  • 写法一:改进Join条件
SELECT
    COALESCE(a.ID, b.ID, c.ID) AS ID
   ,COALESCE(a.Attr, b.Attr, c.Attr) AS Attr
FROM TA a
FULL OUTER JOIN TB b ON (a.ID = b.ID)
FULL OUTER JOIN TC c ON (a.ID = c.ID OR b.ID = c.ID)

此方法在表非常多的时候条件会比较复杂,易出错

  • 写法二:每次Join两个表,结果再与后面的表Join
WITH cte_a AS
(
    SELECT
        COALESCE(a.ID, b.ID) AS ID
       ,COALESCE(a.Attr, b.Attr) AS Attr
    FROM TA a
    FULL OUTER JOIN TB b ON a.ID = b.ID
)
SELECT 
     COALESCE(a.ID, b.ID) AS ID
    ,COALESCE(a.Attr,b.Attr) AS Attr
FROM cte_a AS a
FULL OUTER JOIN TC b ON a.ID = b.ID

此方法在表非常多时,代码会比较冗长

  • 写法三:构造全量ID,转换为Left Join
;WITH cte_a AS
(
    SELECT ID FROM TA UNION
    SELECT ID FROM TB UNION
    SELECT ID FROM TC
)
SELECT 
     COALESCE(b.ID, c.ID, d.ID) AS ID
    ,COALESCE(b.Attr, c.Attr, d.Attr) AS Attr
FROM cte_a AS a
LEFT JOIN TA b ON a.ID = b.ID
LEFT JOIN TB c ON a.ID = c.ID
LEFT JOIN TC d ON a.ID = d.ID

此方法使用Union去重,会存在Sort操作,必要时可以加上索引