交叉表转换为横表
目录
做BI或数据分析时,经常会遇到各类格式转换,这里将最近遇到的一个例子记录一下,分享几种SQL写法。
需求
千言万语抵不上一张图:

数据准备
CREATE TABLE dbo.CategoryConfig(Category VARCHAR(20),[Level] INT,A INT,B INT,C INT,D INT)
GO
INSERT INTO dbo.CategoryConfig
VALUES ( 'Online', 1, 234, 345, 644, 987 ),
( 'Online', 2, 213, 456, 543, 657 ),
( 'Online', 3, 321, 323, 456, 657 ),
( 'Offline', 1, 455, 246, 756, 432 ),
( 'Offline', 2, 435, 678, 984, 345 ),
( 'Offline', 3, 567, 325, 343, 456 )
GO
思路与写法
自连接(多次Join)
由于结果集中每个category只有一行,所以可以将Level为1的数据作为第一层数据,分别与Level为2和3的数据进行Join后得到结果:
SELECT a.Category
,a.A AS [A-1],b.A AS [A-2],c.A AS [A-3]
,a.B AS [B-1],b.B AS [B-2],c.B AS [B-3]
,a.c AS [C-1],b.c AS [C-2],c.c AS [C-3]
,a.d AS [D-1],b.d AS [D-2],c.d AS [D-3]
FROM dbo.CategoryConfig AS a
LEFT JOIN dbo.CategoryConfig AS b ON a.Category = b.Category AND b.[Level] = 2
LEFT JOIN dbo.CategoryConfig AS c ON a.Category = c.Category AND c.[Level] = 3
WHERE a.[Level] = 1;
MAX与CASE组合
由于以上需求转换类似于行转列(又称长表转宽表、、纵转横、rows to colums等),于是想到可以用类似写法实现,经典的Case When和聚合函数组合写法非常灵活,可以一试:
SELECT
Category
,MAX(CASE WHEN [Level] = 1 THEN A ELSE 0 END) AS [A-1]
,MAX(CASE WHEN [Level] = 2 THEN A ELSE 0 END) AS [A-2]
,MAX(CASE WHEN [Level] = 3 THEN A ELSE 0 END) AS [A-3]
,MAX(CASE WHEN [Level] = 1 THEN B ELSE 0 END) AS [B-1]
,MAX(CASE WHEN [Level] = 2 THEN B ELSE 0 END) AS [B-2]
,MAX(CASE WHEN [Level] = 3 THEN B ELSE 0 END) AS [B-3]
,MAX(CASE WHEN [Level] = 1 THEN C ELSE 0 END) AS [C-1]
,MAX(CASE WHEN [Level] = 2 THEN C ELSE 0 END) AS [C-2]
,MAX(CASE WHEN [Level] = 3 THEN C ELSE 0 END) AS [C-3]
,MAX(CASE WHEN [Level] = 1 THEN D ELSE 0 END) AS [D-1]
,MAX(CASE WHEN [Level] = 2 THEN D ELSE 0 END) AS [D-2]
,MAX(CASE WHEN [Level] = 3 THEN D ELSE 0 END) AS [D-3]
FROM dbo.CategoryConfig
GROUP BY Category;
先UNPIVOT再PIVOT
这个写法就比较Tricky,思路是先将A、B、C、D列转行,与Level进行字符串拼接后再行转列:
;WITH cte_UnPivot AS
(
SELECT Category,
[Type] + '-' + LTRIM([Level]) AS xType,
xValue
FROM dbo.CategoryConfig AS a
UNPIVOT(xValue FOR [Type] IN (A,B,C,D)) AS b
)
SELECT Category,[A-1],[A-2],[A-3],[B-1],[B-2],[B-3],[C-1],[C-2],[C-3],[D-1],[D-2],[D-3]
FROM cte_UnPivot AS a
PIVOT(MAX(xValue) FOR xType IN ([A-1],[A-2],[A-3],[B-1],[B-2],[B-3],[C-1],[C-2],[C-3],[D-1],[D-2],[D-3])) AS b;
效率与总结
- 若结果集不要求有序,在目前数据量的情况下,自连接Join效率高一些,因为其它写法含聚合操作,需要进行排序,而排序的Cost是很高的。但写法一需要多次扫描数据,而且是基于每个category每个Level的数据都有的情况。
- 如果结果集要按Category排序,则后面两种的写法更高效
- 数据量更大的情况哪个更高效,读者有兴趣可自行尝试
- 关于单纯的表格横纵转换,可参考我的历史文章