如何找到特定列中零的平均值

How to find the average of zero's in a particular column

我有如下记录的情况。

+-----------+---------+---+--------+
|     a     |    b    | c |   d    |
+-----------+---------+---+--------+
| 537196605 | HZA-LOC | 0 | 201701 |
| 537196605 | HZA-LOC | 0 | 201702 |
| 537196605 | HZA-LOC | 0 | 201703 |
| 537196605 | HZA-LOC | 0 | 201704 |
| 537196605 | HZA-LOC | 0 | 201705 |
| 537196605 | HZA-LOC | 2 | 201706 |
| 537196605 | HZA-LOC | 0 | 201707 |
| 537196605 | HZA-LOC | 4 | 201708 |
| 537196605 | HZA-LOC | 0 | 201709 |
| 537196605 | HZA-LOC | 0 | 201710 |
| 537196605 | HZA-LOC | 0 | 201711 |
| 537196605 | HZA-LOC | 0 | 201712 |
+-----------+---------+---+--------+

现在我想在这里计算零的平均值。

我的意思是我需要计算计数(零)/零的总数,即 10/3 = 3.33

所有在一起的零被认为是一个值。前 5 个值将被计为 1,第 7 个值将被计为 1,第 9-12 个值将被计为 1,总计为 3。 请注意,C 列的值是根据 A 和 B 的分组生成的。 期望的结果应该如下所示:-

a b h 537196605 HZA-LOC 3.33

请帮我提供一个通用的解决方案,因为我有数百万条这样的记录。

您可以使用 ROW_NUMBER() 技巧来创建 group_id 值。

确定组后,统计每个组的成员数和组数,就大功告成了。

http://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=e10b07b6fd9f8bc7549a3b98aaff3af6

WITH
  enumerated_example AS
(
  SELECT
    ROW_NUMBER() OVER (PARTITION BY a, b    ORDER BY d)   AS row_id,
    ROW_NUMBER() OVER (PARTITION BY a, b, c ORDER BY d)   AS zero_row_id,
    *
  FROM
    example
),
  zero_groups AS
(
  SELECT
    row_id - zero_row_id   AS group_id,
    COUNT(*)               AS group_size
  FROM
    enumerated_example
  WHERE
    c = 0
  GROUP BY
    row_id - zero_row_id
)
SELECT
  CAST(SUM(group_size) AS DECIMAL(13, 8)) / COUNT(*)
FROM
  zero_groups

这是使用 window 函数的一种方法

SELECT [Avg] = (Count(*) * 1.0) / Sum(CASE WHEN prev_c = c THEN 0 ELSE 1 END) 
FROM   (SELECT *,
               prev_c = Lag(c)OVER(ORDER BY d)
        FROM   yourtable) a
WHERE  c = 0 

不同(但相似)的解决方案:

CREATE TABLE #Sample (a int,
                      b varchar(7),
                      c tinyint,
                      d int)

INSERT INTO #Sample
VALUES
    (537196605,'HZA-LOC',0,201701),
    (537196605,'HZA-LOC',0,201702),
    (537196605,'HZA-LOC',0,201703),
    (537196605,'HZA-LOC',0,201704),
    (537196605,'HZA-LOC',0,201705),
    (537196605,'HZA-LOC',2,201706),
    (537196605,'HZA-LOC',0,201707),
    (537196605,'HZA-LOC',4,201708),
    (537196605,'HZA-LOC',0,201709),
    (537196605,'HZA-LOC',0,201710),
    (537196605,'HZA-LOC',0,201711),
    (537196605,'HZA-LOC',0,201712);
GO

WITH Groups AS(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY d) -
           ROW_NUMBER() OVER (PARTITION BY a, b, CASE c WHEN 0 THEN 0 ELSE 1 END ORDER BY d) AS Grp
    FROM #Sample)
SELECT a, b,
       CONVERT(decimal(6,2),COUNT(*)) / COUNT(DISTINCT Grp) AS h
FROM Groups
WHERE c = 0
GROUP BY a,b;

GO


DROP TABLE #Sample;
GO

一种方法是使用 运行 总和设置组。然后计算每个 a,b 组合的组数。此后,您只需 group by.

select a,b,1.0*grp_count/count(distinct grp) as result
from (select a,b,c,d,
      sum(case when c<>0 then 1 else 0 end) over(partition by a,b order by d) as grp,
      count(case when c=0 then 1 end) over(partition by a,b) as grp_count
      from tbl
     ) t
group by a,b,grp_count