Why pivot with "extra" columns doesn't combine results

Question

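I know many of you have observed this behavior, but I'd like to know whether anyone can explain why it happens. When I create a small table for an example that uses the PIVOT operator, I get the result I expect: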

CREATE TABLE dbo.AverageFishLength
    (
      Fishtype VARCHAR(50) ,
      AvgLength DECIMAL(8, 2) ,
      FishAge_Years INT
    )
INSERT  INTO dbo.AverageFishLength
        ( Fishtype, AvgLength, FishAge_Years )
VALUES  ( 'Muskie', 32.75, 3 ),
        ( 'Muskie', 37.5, 4 ),
        ( 'Muskie', 39.75, 5 ),
        ( 'Walleye', 16.5, 3 ),
        ( 'Walleye', 18.25, 4 ),
        ( 'Walleye', 20.0, 5 ),
        ( 'Northern Pike', 20.75, 3 ),
        ( 'Northern Pike', 23.25, 4 ),
        ( 'Northern Pike', 26.0, 5 );

Here is the pivot query:

SELECT  Fishtype ,
        [3] AS [3 Years Old] ,
        [4] AS [4 Years Old] ,
        [5] AS [5 Years Old]
FROM    dbo.AverageFishLength   PIVOT( SUM(AvgLength) 
                                FOR FishAge_Years IN ( [3], [4], [5] ) ) AS PivotTbl

This returns one row per fish type, which is what I expected.

However, if I create the table with an identity column, the results are split into separate rows:

DROP TABLE dbo.AverageFishLength
CREATE TABLE dbo.AverageFishLength
    (
      ID INT IDENTITY(1,1) ,
      Fishtype VARCHAR(50) ,
      AvgLength DECIMAL(8, 2) ,
      FishAge_Years INT
    )
INSERT  INTO dbo.AverageFishLength
        ( Fishtype, AvgLength, FishAge_Years )
VALUES  ( 'Muskie', 32.75, 3 ),
        ( 'Muskie', 37.5, 4 ),
        ( 'Muskie', 39.75, 5 ),
        ( 'Walleye', 16.5, 3 ),
        ( 'Walleye', 18.25, 4 ),
        ( 'Walleye', 20.0, 5 ),
        ( 'Northern Pike', 20.75, 3 ),
        ( 'Northern Pike', 23.25, 4 ),
        ( 'Northern Pike', 26.0, 5 );

The same query:

SELECT  Fishtype ,
        [3] AS [3 Years Old] ,
        [4] AS [4 Years Old] ,
        [5] AS [5 Years Old]
FROM    dbo.AverageFishLength   PIVOT( SUM(AvgLength) 
                                FOR FishAge_Years IN ( [3], [4], [5] ) ) AS PivotTbl

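This time the result is different: the values for each fish type are spread across several rows instead of being combined into one.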

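It looks to me as though the ID column is being used by the query even though it doesn't appear in the query at all. It's almost as if it were implicitly included in the query but not shown in the result set.

Can anyone explain why this happens?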

Answer 1

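This happens because the ID column is unique for every row, and because you are querying the table directly (without a subquery), that column becomes part of the implicit GROUP BY that the aggregate function needs.

The MSDN docs about FROM state the following: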

table_source PIVOT <pivot_clause>

Specifies that the table_source is pivoted based on the pivot_column. table_source is a table or table expression. The output is a table that contains all columns of the table_source except the pivot_column and value_column. The columns of the table_source, except the pivot_column and value_column, are called the grouping columns of the pivot operator.

PIVOT performs a grouping operation on the input table with regard to the grouping columns and returns one row for each group. Additionally, the output contains one column for each value specified in the column_list that appears in the pivot_column of the input_table.
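To make the grouping rule in that quote concrete, here is a minimal Python model of it. It is only an illustration of the documented semantics, not how SQL Server actually executes PIVOT; the helper `pivot_sum` and the row dictionaries are hypothetical, each dict's keys play the role of the table_source columns, and `None` stands in for NULL.

```python
# Sample data from the question: (Fishtype, AvgLength, FishAge_Years).
FISH = [
    ("Muskie", 32.75, 3), ("Muskie", 37.5, 4), ("Muskie", 39.75, 5),
    ("Walleye", 16.5, 3), ("Walleye", 18.25, 4), ("Walleye", 20.0, 5),
    ("Northern Pike", 20.75, 3), ("Northern Pike", 23.25, 4), ("Northern Pike", 26.0, 5),
]


def pivot_sum(rows, pivot_column, value_column, column_list):
    """Hypothetical model of PIVOT(SUM(value_column) FOR pivot_column IN (column_list))."""
    if not rows:
        return []
    # Grouping columns: every input column except pivot_column and value_column.
    grouping = [c for c in rows[0] if c not in (pivot_column, value_column)]
    groups = {}
    for row in rows:
        key = tuple(row[c] for c in grouping)
        if key not in groups:
            # New output row: the grouping values plus one NULL (None) cell per listed value.
            groups[key] = dict(zip(grouping, key))
            groups[key].update({v: None for v in column_list})
        out = groups[key]
        p = row[pivot_column]
        if p in column_list:  # values outside column_list go to no output column
            v = row[value_column]
            out[p] = v if out[p] is None else out[p] + v
    return list(groups.values())


without_id = [
    {"Fishtype": f, "AvgLength": length, "FishAge_Years": age}
    for f, length, age in FISH
]
# enumerate(..., start=1) stands in for IDENTITY(1,1).
with_id = [
    {"ID": i, "Fishtype": f, "AvgLength": length, "FishAge_Years": age}
    for i, (f, length, age) in enumerate(FISH, start=1)
]

by_type = pivot_sum(without_id, "FishAge_Years", "AvgLength", [3, 4, 5])
by_row = pivot_sum(with_id, "FishAge_Years", "AvgLength", [3, 4, 5])
print(len(by_type), len(by_row))  # 3 9
print(by_type[0])  # {'Fishtype': 'Muskie', 3: 32.75, 4: 37.5, 5: 39.75}
```

Adding the `ID` key is enough to turn three output rows into nine: `ID` silently joins `Fishtype` as a grouping column, so every row becomes its own group with a single non-NULL cell.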

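Your version is essentially saying SELECT * FROM yourtable and then pivoting the data. Even though the ID column is not in the final SELECT list, it is still a grouping element in the query. If you compare the PIVOT with a "pre-PIVOT" version of the same query, you will see what your version is doing. This example uses a CASE expression with an aggregate function: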

SELECT Fishtype,
  sum(case when FishAge_Years = 3 then AvgLength else 0 end) as [3],
  sum(case when FishAge_Years = 4 then AvgLength else 0 end) as [4],
  sum(case when FishAge_Years = 5 then AvgLength else 0 end) as [5]
FROM dbo.AverageFishLength
GROUP BY Fishtype, ID;

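The results are skewed because, even though ID is not in your final column list, it is still used for grouping, and because the IDs are unique you get multiple rows.

The easiest way to fix this with PIVOT is to use a subquery: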
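The same effect can be reproduced with a small Python model of that CASE query. This is a sketch only: `case_sum`, `FISH` and `ROWS` are hypothetical names, `group_by` stands for the GROUP BY list, and the IDs assume IDENTITY(1,1) numbered the rows in insertion order. Grouping by `Fishtype` and `ID` yields one row per source row, while grouping by `Fishtype` alone collapses them back to one row per fish type.

```python
from collections import defaultdict

# Sample data from the question: (Fishtype, AvgLength, FishAge_Years).
FISH = [
    ("Muskie", 32.75, 3), ("Muskie", 37.5, 4), ("Muskie", 39.75, 5),
    ("Walleye", 16.5, 3), ("Walleye", 18.25, 4), ("Walleye", 20.0, 5),
    ("Northern Pike", 20.75, 3), ("Northern Pike", 23.25, 4), ("Northern Pike", 26.0, 5),
]
# Assumed IDs 1..9, as IDENTITY(1,1) would assign in insertion order.
ROWS = [
    {"ID": i, "Fishtype": f, "AvgLength": length, "FishAge_Years": age}
    for i, (f, length, age) in enumerate(FISH, start=1)
]


def case_sum(rows, group_by, ages=(3, 4, 5)):
    """Hypothetical model of
    SELECT <group_by>, SUM(CASE WHEN FishAge_Years = k THEN AvgLength ELSE 0 END) AS [k], ...
    FROM dbo.AverageFishLength GROUP BY <group_by>
    """
    totals = defaultdict(lambda: {age: 0 for age in ages})
    for row in rows:
        key = tuple(row[c] for c in group_by)
        for age in ages:
            # CASE WHEN FishAge_Years = age THEN AvgLength ELSE 0 END
            totals[key][age] += row["AvgLength"] if row["FishAge_Years"] == age else 0
    return dict(totals)


grouped_with_id = case_sum(ROWS, ["Fishtype", "ID"])
grouped_by_type = case_sum(ROWS, ["Fishtype"])
print(len(grouped_with_id), len(grouped_by_type))  # 9 3
print(grouped_with_id[("Muskie", 1)])  # {3: 32.75, 4: 0, 5: 0}
print(grouped_by_type[("Muskie",)])  # {3: 32.75, 4: 37.5, 5: 39.75}
```

One visible difference from PIVOT: because of ELSE 0, the CASE version shows 0 in the empty cells, whereas PIVOT leaves them NULL.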

SELECT Fishtype ,
        [3] AS [3 Years Old] ,
        [4] AS [4 Years Old] ,
        [5] AS [5 Years Old]
FROM
(
  SELECT Fishtype, 
    AvgLength, 
    FishAge_Years
  FROM    dbo.AverageFishLength
) d
PIVOT
( 
  SUM(AvgLength) 
  FOR FishAge_Years IN ( [3], [4], [5] ) 
) AS PivotTbl;

In this version you return only the columns from the table that you actually need and want. That excludes ID, so it is no longer used to group your data.
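Put differently, the subquery changes which columns PIVOT sees, and therefore which columns it groups by. The Python sketch below uses hypothetical helpers `grouping_columns` and `count_groups` and the same nine sample rows, assuming IDENTITY(1,1) numbered them 1 to 9 in insertion order. It derives the grouping columns from a column list the way the quoted documentation describes and counts the distinct groups each choice produces.

```python
# Sample rows as (ID, Fishtype, AvgLength, FishAge_Years).
TABLE_COLUMNS = ["ID", "Fishtype", "AvgLength", "FishAge_Years"]
ROWS = [
    (1, "Muskie", 32.75, 3), (2, "Muskie", 37.5, 4), (3, "Muskie", 39.75, 5),
    (4, "Walleye", 16.5, 3), (5, "Walleye", 18.25, 4), (6, "Walleye", 20.0, 5),
    (7, "Northern Pike", 20.75, 3), (8, "Northern Pike", 23.25, 4), (9, "Northern Pike", 26.0, 5),
]


def grouping_columns(source_columns, pivot_column, value_column):
    """Every source column except the pivot and value columns groups the output."""
    return [c for c in source_columns if c not in (pivot_column, value_column)]


def count_groups(rows, columns, grouping):
    """Number of distinct grouping keys, i.e. output rows PIVOT returns for this data."""
    positions = [columns.index(c) for c in grouping]
    return len({tuple(row[p] for p in positions) for row in rows})


# Querying the table directly: PIVOT sees all four columns.
direct = grouping_columns(TABLE_COLUMNS, "FishAge_Years", "AvgLength")
# Subquery d: PIVOT only sees Fishtype, AvgLength and FishAge_Years.
derived = grouping_columns(["Fishtype", "AvgLength", "FishAge_Years"], "FishAge_Years", "AvgLength")

print(direct, count_groups(ROWS, TABLE_COLUMNS, direct))    # ['ID', 'Fishtype'] 9
print(derived, count_groups(ROWS, TABLE_COLUMNS, derived))  # ['Fishtype'] 3
```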

Tags: sql, tsql, sql-server, pivot