Counting based on string and null

Here is my sample data:

id     FirstName      LastName     HouseNo     MyCount
1      A                  C          1-1         2
2      B                  C          1-1         2
4      D                  A                      3
5      F                  A                      3
6      J                  A                      3
7      Q                  X          1-2         3
8      D                  X          1-2         3
9      D                  X          1-2         3
10     A                  C          1-3         3
11     B                  C          1-3         3
12     C                  C          1-3         3
14     F                  K                      2
15     J                  K                      2
16     Q                  X          1-5         1

From the data above, I want to count the records that have the same HouseNo and LastName.

For this I use:

SELECT COUNT(ID) AS _COUNT FROM MYTABLE GROUP BY LASTNAME, HOUSENO

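But there is a problem with the statement above: some records in the data have no HouseNo. In the example above, IDs 4, 5, 6 and 14, 15 have no HouseNo. So the statement above returns 5, but it should return 3 and 2 separately.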

Main goals

  1. Count records based on LastName and HouseNo.
  2. Count the records that have no HouseNo (they come in series).
  3. Update the resulting count in MyCount.

How can I get this count?

Bounty edit:

Sample data

id  FirstName   LastName    HouseNo     MyCount     CountId
1   Imran       Khan        1-1         
2   Waseem      Khan        1-1         
3   Rihan       Khan        1-1         
4   Moiz        Shaikh      1-2         
5   Zbair       Shaikh      1-2         
6   Sultan      Shaikh      1-2         
7   Zaid        Khan                    
10  Parvez      Patel       1-3         
11  Ahmed       Patel       1-3         
12  Rahat       Syed        1-4         
13  Talha       Khan                    
14  Zia         Khan                    
15  Arshad      Patel       1-3         
16  Samad       Patel       1-3         
17  Raees       Syed        1-4         
18  Azmat       Khan                    
19  Imran       Khan                    

Expected result:

id  FirstName   LastName    HouseNo     MyCount     CountId
1   Imran       Khan        1-1         3           1
2   Waseem      Khan        1-1         3           1
3   Rihan       Khan        1-1         3           1
4   Moiz        Shaikh      1-2         3           2
5   Zbair       Shaikh      1-2         3           2
6   Sultan      Shaikh      1-2         3           2
7   Zaid        Khan                    1           3
10  Parvez      Patel       1-3         2           4   
11  Ahmed       Patel       1-3         2           4
12  Rahat       Syed        1-4         1           5   
13  Talha       Khan                    2           6
14  Zia         Khan                    2           6   
15  Arshad      Patel       1-3         2           7   
16  Samad       Patel       1-3         2           7
17  Raees       Syed        1-4         1           8   
18  Azmat       Khan                    2           9
19  Imran       Khan                    2           9   
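  1. MyCount and CountId are blank in the sample data and should be filled in.
  2. MyCount is based on HouseNo and LastName. See IDs 1 to 3: the last name is Khan and the house no. is 1-1, so MyCount for IDs 1 to 3 is 3 and CountId is 1.
  3. Many records in the sample data have no HouseNo. In that case, rows with the same last name that come in series are counted together. See ID 7: its count will be 1. Also see IDs 18 and 19: their count will be 2.
  4. CountId is a sequence number for each counted group. See IDs 1 to 3: they have the same house no. and the same last name, so their CountId is 1.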
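The rules above (count runs of consecutive rows, ordered by id, that share LastName and HouseNo, then number the runs) can be sketched outside the database to make the expected result concrete. This is a minimal Python sketch, not SQL; it assumes the rows are ordered by id and that a blank HouseNo is represented as None. The names ROWS and count_runs are illustrative only.

```python
from itertools import groupby

# Bounty sample data: (id, last_name, house_no); None stands for a blank HouseNo.
ROWS = [
    (1, "Khan", "1-1"), (2, "Khan", "1-1"), (3, "Khan", "1-1"),
    (4, "Shaikh", "1-2"), (5, "Shaikh", "1-2"), (6, "Shaikh", "1-2"),
    (7, "Khan", None), (10, "Patel", "1-3"), (11, "Patel", "1-3"),
    (12, "Syed", "1-4"), (13, "Khan", None), (14, "Khan", None),
    (15, "Patel", "1-3"), (16, "Patel", "1-3"), (17, "Syed", "1-4"),
    (18, "Khan", None), (19, "Khan", None),
]


def count_runs(rows):
    """Return {id: (my_count, count_id)} for runs of consecutive rows
    (ordered by id) that share the same last_name and house_no."""
    ordered = sorted(rows, key=lambda r: r[0])
    result = {}
    # groupby only merges *adjacent* rows with equal keys, which matches the
    # "same last name in series" rule; None == None, so blank HouseNo rows group too.
    runs = groupby(ordered, key=lambda r: (r[1], r[2]))
    for count_id, (_, run) in enumerate(runs, start=1):
        run = list(run)
        for row in run:
            result[row[0]] = (len(run), count_id)
    return result


result = count_runs(ROWS)
for row_id in sorted(result):
    print(row_id, *result[row_id])
```

Because groupby never looks past a change of key, the same (Khan, None) pair produces three separate runs here, which is exactly what a plain GROUP BY cannot do.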
SELECT COUNT(ID) AS _COUNT 
FROM MYTABLE 
GROUP BY ISNULL(LASTNAME, ''), ISNULL(HOUSENO, '');

This should do it.

declare @temp table (id int, firstname varchar(5), lastname varchar(5), houseno varchar(5), mycount int)

insert into @temp values(1,   'A',  'C',  '1-1',  2)
insert into @temp values(2,   'B',  'C',  '1-1',  2)
insert into @temp values(4,   'D',  'A',   null,  3)
insert into @temp values(5,   'F',  'A',   null,  3)
insert into @temp values(6,   'J',  'A',   null,  3)
insert into @temp values(7,   'Q',  'X',  '1-2',  3)
insert into @temp values(8,   'D',  'X',  '1-2',  3)
insert into @temp values(9,   'D',  'X',  '1-2',  3)
insert into @temp values(10,  'A',  'C',  '1-3',  3)
insert into @temp values(11,  'B',  'C',  '1-3',  3)
insert into @temp values(12,  'C',  'C',  '1-3',  3)
insert into @temp values(14,  'F',  'K',   null,  2)
insert into @temp values(15,  'J',  'K',   null,  2)
insert into @temp values(16,  'Q',  'X',  '1-5',  1)  

select count(ID) as _count 
from @temp
group by isnull(lastname, ''), isnull(houseno, '') 

This returns:

_count
   3    
   2    
   2    
   3    
   3    
   1    
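In Python terms, the grouping above behaves like counting (LastName, HouseNo) pairs with NULL replaced by an empty string. A minimal sketch under that assumption, using collections.Counter on the question's first sample (None stands for NULL); like GROUP BY, it ignores the order of the rows.

```python
from collections import Counter

# First sample data from the question: (id, last_name, house_no); None = no HouseNo.
ROWS = [
    (1, "C", "1-1"), (2, "C", "1-1"),
    (4, "A", None), (5, "A", None), (6, "A", None),
    (7, "X", "1-2"), (8, "X", "1-2"), (9, "X", "1-2"),
    (10, "C", "1-3"), (11, "C", "1-3"), (12, "C", "1-3"),
    (14, "K", None), (15, "K", None),
    (16, "X", "1-5"),
]

# Mirrors GROUP BY ISNULL(lastname, ''), ISNULL(houseno, ''):
# NULLs are mapped to '' and row order plays no part in the grouping.
counts = Counter((last or "", house or "") for _, last, house in ROWS)
for key in sorted(counts):
    print(key, counts[key])
```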

You can output more detail:

select distinct
       t.lastname, 
       isnull(t.houseno, '') as houseno,
       (select count(ID) from @temp t2 where t2.lastname = t.lastname and t2.houseno = t.houseno) as _count_filled,
       (select count(ID) from @temp t2 where t2.lastname = t.lastname and isnull(t2.houseno, '') = isnull(t.houseno, '') and t2.houseno is null) as _count_empty
from   @temp t

It returns this:

lastname    houseno _count_filled   _count_empty    
A                   0               3   
C           1-1     2               0   
C           1-3     3               0   
K                   0               2   
X           1-2     3               0   
X           1-5     1               0   
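The two subqueries differ only in how they treat NULL. A minimal Python sketch that models SQL `=` (never true when either side is NULL) with a hypothetical helper sql_eq reproduces the breakdown above; None stands for NULL and the data is the question's first sample.

```python
# First sample data: (id, last_name, house_no); None models SQL NULL.
ROWS = [
    (1, "C", "1-1"), (2, "C", "1-1"),
    (4, "A", None), (5, "A", None), (6, "A", None),
    (7, "X", "1-2"), (8, "X", "1-2"), (9, "X", "1-2"),
    (10, "C", "1-3"), (11, "C", "1-3"), (12, "C", "1-3"),
    (14, "K", None), (15, "K", None),
    (16, "X", "1-5"),
]


def sql_eq(a, b):
    """SQL '=' as used in a WHERE clause: a comparison involving NULL is never true."""
    return a is not None and b is not None and a == b


details = {}
for _, last, house in ROWS:
    # _count_filled: t2.houseno = t.houseno, so NULL house numbers never match
    filled = sum(1 for _, l2, h2 in ROWS if l2 == last and sql_eq(h2, house))
    # _count_empty: ISNULL on both sides, restricted to rows whose houseno is NULL
    empty = sum(1 for _, l2, h2 in ROWS
                if l2 == last and (h2 or "") == (house or "") and h2 is None)
    details[(last, house or "")] = (filled, empty)

for key in sorted(details):
    print(key, details[key])
```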

I think your third main goal is to update the MYCOUNT column with the result on each respective row. In general, what you are looking for is a correlated subquery.

UPDATE MYTABLE T1
   SET T1.MYCOUNT =
    ( SELECT COUNT (*)
        FROM MYTABLE T2
        WHERE T2.LASTNAME = T1.LASTNAME
        -- NVL maps NULL house numbers to a common value so they compare equal
        AND NVL (T2.HOUSENO, 0) = NVL (T1.HOUSENO, 0)
        GROUP BY T2.LASTNAME, T2.HOUSENO);

Note: this is written for Oracle SQL.
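The same correlated UPDATE can be tried outside Oracle. The sketch below is a port to SQLite through Python's sqlite3 module, under stated assumptions: the table name mytable and the data subset are illustrative, and SQLite's IS operator (a NULL-safe equality) stands in for the NVL comparison.

```python
import sqlite3

# In-memory table with a subset of the question's first sample (illustrative table name).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, lastname TEXT, houseno TEXT, mycount INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable (id, lastname, houseno) VALUES (?, ?, ?)",
    [(1, "C", "1-1"), (2, "C", "1-1"), (4, "A", None), (5, "A", None),
     (6, "A", None), (14, "K", None), (15, "K", None), (16, "X", "1-5")],
)

# Correlated subquery: for each outer row, count rows with the same lastname and houseno.
# "t2.houseno IS mytable.houseno" is true when both sides are NULL, like NVL(...) = NVL(...).
conn.execute("""
    UPDATE mytable
    SET mycount = (
        SELECT COUNT(*)
        FROM mytable AS t2
        WHERE t2.lastname = mytable.lastname
          AND t2.houseno IS mytable.houseno
    )
""")

counts = dict(conn.execute("SELECT id, mycount FROM mytable ORDER BY id"))
print(counts)
```

Like the Oracle version, this counts every row with the same pair regardless of position, so it does not separate runs that are interrupted by other rows.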

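It looks like the main confusion comes from the SQL statement at the start of your question, where you simply GROUP BY LASTNAME, HOUSENO.

If you wanted a simple grouping, your query would be correct. But then you showed us more detailed sample data with the expected result, and it became clear that you don't want just a grouping (which ignores the order of the rows in the data); you also want to group rows based on their order.

This is a classic gaps-and-islands problem. In SQL Server 2008 it can be solved with a few calls to ROW_NUMBER.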

Sample data

DECLARE @T TABLE 
    (id int PRIMARY KEY
    ,FirstName nvarchar(50)
    ,LastName nvarchar(50)
    ,HouseNo nvarchar(50)
    ,MyCount int
    ,CountId int);

INSERT INTO @T (id, FirstName, LastName, HouseNo) VALUES
(1 , 'Imran ', 'Khan  ', '1-1'),
(2 , 'Waseem', 'Khan  ', '1-1'),
(3 , 'Rihan ', 'Khan  ', '1-1'),
(4 , 'Moiz  ', 'Shaikh', '1-2'),
(5 , 'Zbair ', 'Shaikh', '1-2'),
(6 , 'Sultan', 'Shaikh', '1-2'),
(7 , 'Zaid  ', 'Khan  ',  NULL),
(10, 'Parvez', 'Patel ', '1-3'),
(11, 'Ahmed ', 'Patel ', '1-3'),
(12, 'Rahat ', 'Syed  ', '1-4'),
(13, 'Talha ', 'Khan  ',  NULL),
(14, 'Zia   ', 'Khan  ',  NULL),
(15, 'Arshad', 'Patel ', '1-3'),
(16, 'Samad ', 'Patel ', '1-3'),
(17, 'Raees ', 'Syed  ', '1-4'),
(18, 'Azmat ', 'Khan  ',  NULL),
(19, 'Imran ', 'Khan  ',  NULL);

SELECT query

WITH
CTE_RN
AS
(
    SELECT
        id
        ,FirstName
        ,LastName
        ,HouseNo
        ,MyCount
        ,CountId
        ,ROW_NUMBER() OVER (PARTITION BY LastName, HouseNo ORDER BY ID) AS rn1
        ,ROW_NUMBER() OVER (ORDER BY ID) AS rn2
    FROM @T AS T
)
,CTE_GRoups
AS
(
    SELECT
        id
        ,FirstName
        ,LastName
        ,HouseNo
        ,MyCount
        ,CountId
        ,rn1
        ,rn2
        ,rn2-rn1 AS GroupNumber
        ,COUNT(ID) OVER (PARTITION BY LastName, HouseNo, rn2-rn1) AS NewMyCount
        ,MIN(ID) OVER (PARTITION BY LastName, HouseNo, rn2-rn1) AS GroupMinID
    FROM CTE_RN
)
SELECT
    id
    ,FirstName
    ,LastName
    ,HouseNo
    ,rn1
    ,rn2
    ,GroupNumber
    ,NewMyCount
    ,GroupMinID
    ,DENSE_RANK() OVER (ORDER BY GroupMinID) AS NewCountId
FROM CTE_GRoups
ORDER BY ID;

Result

+----+-----------+----------+---------+-----+-----+-------------+------------+------------+------------+
| id | FirstName | LastName | HouseNo | rn1 | rn2 | GroupNumber | NewMyCount | GroupMinID | NewCountId |
+----+-----------+----------+---------+-----+-----+-------------+------------+------------+------------+
|  1 | Imran     | Khan     | 1-1     |   1 |   1 |           0 |          3 |          1 |          1 |
|  2 | Waseem    | Khan     | 1-1     |   2 |   2 |           0 |          3 |          1 |          1 |
|  3 | Rihan     | Khan     | 1-1     |   3 |   3 |           0 |          3 |          1 |          1 |
|  4 | Moiz      | Shaikh   | 1-2     |   1 |   4 |           3 |          3 |          4 |          2 |
|  5 | Zbair     | Shaikh   | 1-2     |   2 |   5 |           3 |          3 |          4 |          2 |
|  6 | Sultan    | Shaikh   | 1-2     |   3 |   6 |           3 |          3 |          4 |          2 |
|  7 | Zaid      | Khan     | NULL    |   1 |   7 |           6 |          1 |          7 |          3 |
| 10 | Parvez    | Patel    | 1-3     |   1 |   8 |           7 |          2 |         10 |          4 |
| 11 | Ahmed     | Patel    | 1-3     |   2 |   9 |           7 |          2 |         10 |          4 |
| 12 | Rahat     | Syed     | 1-4     |   1 |  10 |           9 |          1 |         12 |          5 |
| 13 | Talha     | Khan     | NULL    |   2 |  11 |           9 |          2 |         13 |          6 |
| 14 | Zia       | Khan     | NULL    |   3 |  12 |           9 |          2 |         13 |          6 |
| 15 | Arshad    | Patel    | 1-3     |   3 |  13 |          10 |          2 |         15 |          7 |
| 16 | Samad     | Patel    | 1-3     |   4 |  14 |          10 |          2 |         15 |          7 |
| 17 | Raees     | Syed     | 1-4     |   2 |  15 |          13 |          1 |         17 |          8 |
| 18 | Azmat     | Khan     | NULL    |   4 |  16 |          12 |          2 |         18 |          9 |
| 19 | Imran     | Khan     | NULL    |   5 |  17 |          12 |          2 |         18 |          9 |
+----+-----------+----------+---------+-----+-----+-------------+------------+------------+------------+

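Here I included all intermediate steps in the result so you can see how it works. The main part is the two ROW_NUMBER sequences. The rn1 sequence is partitioned by LastName, HouseNo, so it restarts for each LastName, HouseNo combination. rn2 is a simple increasing sequence without gaps. We need it because the original ID defines the order but can have gaps.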
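The point about gaps can be checked with a few hypothetical rows. This Python sketch (illustrative data, not from the question) numbers rows the way the two ROW_NUMBER calls do, assuming the rows are already sorted by id, and compares subtracting rn1 from the raw id against subtracting it from the gap-free rn2.

```python
# Hypothetical rows (id, last_name, house_no), already sorted by id.
# Ids 8 and 9 are missing, so ids 7 and 10 are adjacent rows of the same island.
ROWS = [(6, "Khan", None), (7, "Patel", "1-3"), (10, "Patel", "1-3"), (11, "Syed", "1-4")]


def row_numbers(rows):
    """Return (rn1, rn2) for each row, like the two ROW_NUMBER calls:
    rn1 restarts for every (last_name, house_no); rn2 counts all rows without gaps."""
    per_key = {}
    out = []
    for rn2, (_, last, house) in enumerate(rows, start=1):
        per_key[(last, house)] = per_key.get((last, house), 0) + 1
        out.append((per_key[(last, house)], rn2))
    return out


nums = row_numbers(ROWS)
id_minus_rn1 = [row[0] - rn1 for row, (rn1, _) in zip(ROWS, nums)]
rn2_minus_rn1 = [rn2 - rn1 for rn1, rn2 in nums]
print(id_minus_rn1)   # the two Patel rows get different values
print(rn2_minus_rn1)  # the two Patel rows share one value
```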

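Then we subtract one sequence from the other; the difference rn2 - rn1 is the GroupNumber. It stays constant across a run of consecutive rows with the same LastName, HouseNo, and it grows whenever rows with another combination interrupt the run.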

Counting the elements of a group is then a simple COUNT, which gives NewMyCount.

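Numbering the groups without gaps takes two steps: first MIN(ID) gives each group an identifier, then DENSE_RANK over that identifier produces NewCountId as a gap-free sequence.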
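The steps above (the group key from rn2 - rn1, COUNT per group, MIN as a group identifier, then DENSE_RANK over that identifier) can be traced in plain Python. This is a sketch of the same gaps-and-islands logic, not the T-SQL itself; it assumes the rows are already sorted by id and represents a blank HouseNo as None.

```python
from collections import Counter

# Bounty sample data (id, last_name, house_no), already ordered by id; None = blank HouseNo.
ROWS = [
    (1, "Khan", "1-1"), (2, "Khan", "1-1"), (3, "Khan", "1-1"),
    (4, "Shaikh", "1-2"), (5, "Shaikh", "1-2"), (6, "Shaikh", "1-2"),
    (7, "Khan", None), (10, "Patel", "1-3"), (11, "Patel", "1-3"),
    (12, "Syed", "1-4"), (13, "Khan", None), (14, "Khan", None),
    (15, "Patel", "1-3"), (16, "Patel", "1-3"), (17, "Syed", "1-4"),
    (18, "Khan", None), (19, "Khan", None),
]

# Step 1: group key = (LastName, HouseNo, rn2 - rn1).
rn1 = Counter()
keys = []
for rn2, (_, last, house) in enumerate(ROWS, start=1):
    rn1[(last, house)] += 1
    keys.append((last, house, rn2 - rn1[(last, house)]))

# Step 2: COUNT(ID) OVER (PARTITION BY key) -> NewMyCount.
group_size = Counter(keys)

# Step 3: MIN(ID) OVER (PARTITION BY key) -> GroupMinID.
group_min = {}
for (row_id, _, _), key in zip(ROWS, keys):
    group_min[key] = min(group_min.get(key, row_id), row_id)

# Step 4: DENSE_RANK() OVER (ORDER BY GroupMinID) -> NewCountId.
rank = {m: i for i, m in enumerate(sorted(set(group_min.values())), start=1)}

result = [(row_id, group_size[key], rank[group_min[key]])
          for (row_id, _, _), key in zip(ROWS, keys)]
for row in result:
    print(row)  # (id, NewMyCount, NewCountId)
```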


If you want to actually update the original table with the calculated NewMyCount and NewCountId, it is easy to turn the SELECT query above into an UPDATE query:

UPDATE query

WITH
CTE_RN
AS
(
    SELECT
        id
        ,FirstName
        ,LastName
        ,HouseNo
        ,MyCount
        ,CountId
        ,ROW_NUMBER() OVER (PARTITION BY LastName, HouseNo ORDER BY ID) AS rn1
        ,ROW_NUMBER() OVER (ORDER BY ID) AS rn2
    FROM @T AS T
)
,CTE_GRoups
AS
(
    SELECT
        id
        ,FirstName
        ,LastName
        ,HouseNo
        ,MyCount
        ,CountId
        ,rn1
        ,rn2
        ,rn2-rn1 AS GroupNumber
        ,COUNT(ID) OVER (PARTITION BY LastName, HouseNo, rn2-rn1) AS NewMyCount
        ,MIN(ID) OVER (PARTITION BY LastName, HouseNo, rn2-rn1) AS GroupMinID
    FROM CTE_RN
)
,CTE_Update
AS
(
    SELECT
        id
        ,FirstName
        ,LastName
        ,HouseNo
        ,MyCount
        ,CountId
        ,rn1
        ,rn2
        ,GroupNumber
        ,NewMyCount
        ,GroupMinID
        ,DENSE_RANK() OVER (ORDER BY GroupMinID) AS NewCountId
    FROM CTE_GRoups
)
UPDATE CTE_Update
SET
    MyCount = NewMyCount
    ,CountId = NewCountId
;

Result

SELECT *
FROM @T
ORDER BY ID;

+----+-----------+----------+---------+---------+---------+
| id | FirstName | LastName | HouseNo | MyCount | CountId |
+----+-----------+----------+---------+---------+---------+
|  1 | Imran     | Khan     | 1-1     |       3 |       1 |
|  2 | Waseem    | Khan     | 1-1     |       3 |       1 |
|  3 | Rihan     | Khan     | 1-1     |       3 |       1 |
|  4 | Moiz      | Shaikh   | 1-2     |       3 |       2 |
|  5 | Zbair     | Shaikh   | 1-2     |       3 |       2 |
|  6 | Sultan    | Shaikh   | 1-2     |       3 |       2 |
|  7 | Zaid      | Khan     | NULL    |       1 |       3 |
| 10 | Parvez    | Patel    | 1-3     |       2 |       4 |
| 11 | Ahmed     | Patel    | 1-3     |       2 |       4 |
| 12 | Rahat     | Syed     | 1-4     |       1 |       5 |
| 13 | Talha     | Khan     | NULL    |       2 |       6 |
| 14 | Zia       | Khan     | NULL    |       2 |       6 |
| 15 | Arshad    | Patel    | 1-3     |       2 |       7 |
| 16 | Samad     | Patel    | 1-3     |       2 |       7 |
| 17 | Raees     | Syed     | 1-4     |       1 |       8 |
| 18 | Azmat     | Khan     | NULL    |       2 |       9 |
| 19 | Imran     | Khan     | NULL    |       2 |       9 |
+----+-----------+----------+---------+---------+---------+
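To try the SELECT query without SQL Server, here is a minimal sketch assuming Python's sqlite3 module linked against SQLite 3.25 or later (the first version with window functions). The table name t is illustrative, and the intermediate columns are trimmed; SQLite, like SQL Server, puts NULLs into the same partition.

```python
import sqlite3

# Bounty sample data: (id, last_name, house_no); None becomes NULL in SQLite.
ROWS = [
    (1, "Khan", "1-1"), (2, "Khan", "1-1"), (3, "Khan", "1-1"),
    (4, "Shaikh", "1-2"), (5, "Shaikh", "1-2"), (6, "Shaikh", "1-2"),
    (7, "Khan", None), (10, "Patel", "1-3"), (11, "Patel", "1-3"),
    (12, "Syed", "1-4"), (13, "Khan", None), (14, "Khan", None),
    (15, "Patel", "1-3"), (16, "Patel", "1-3"), (17, "Syed", "1-4"),
    (18, "Khan", None), (19, "Khan", None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, LastName TEXT, HouseNo TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# Same steps as the T-SQL SELECT query: two ROW_NUMBERs, COUNT and MIN per island, DENSE_RANK.
QUERY = """
WITH cte_rn AS (
    SELECT id, LastName, HouseNo,
           ROW_NUMBER() OVER (PARTITION BY LastName, HouseNo ORDER BY id) AS rn1,
           ROW_NUMBER() OVER (ORDER BY id) AS rn2
    FROM t
), cte_groups AS (
    SELECT id,
           COUNT(id) OVER (PARTITION BY LastName, HouseNo, rn2 - rn1) AS NewMyCount,
           MIN(id)   OVER (PARTITION BY LastName, HouseNo, rn2 - rn1) AS GroupMinID
    FROM cte_rn
)
SELECT id, NewMyCount, DENSE_RANK() OVER (ORDER BY GroupMinID) AS NewCountId
FROM cte_groups
ORDER BY id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # (id, NewMyCount, NewCountId)
```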

I agree with @Vladimir Baranov's analysis, so I won't repeat it here. I just want to make the query a bit simpler, as shown below (tested in SQL Server 2012):

--drop table #temp
create table  #temp  (id int, firstname varchar(15), lastname varchar(15), houseno varchar(5));
go
insert into #temp (id, firstname, lastname, houseno)
values
(1   , 'Imran'       ,'Khan'        ,'1-1')         
,(2   , 'Waseem'      ,'Khan'        ,'1-1')         
,(3   , 'Rihan'       ,'Khan'        ,'1-1')         
,(4   , 'Moiz'        ,'Shaikh'      ,'1-2')         
,(5   , 'Zbair'       ,'Shaikh'      ,'1-2')         
,(6   , 'Sultan'      ,'Shaikh'      ,'1-2')         
,(7   , 'Zaid'        ,'Khan'        , null)         
,(10  , 'Parvez'      ,'Patel'       ,'1-3')         
,(11  , 'Ahmed'       ,'Patel'       ,'1-3')         
,(12  , 'Rahat'       ,'Syed'        ,'1-4')         
,(13  , 'Talha'       ,'Khan'        ,null )         
,(14  , 'Zia'         ,'Khan'        ,null )         
,(15  , 'Arshad'      ,'Patel'       ,'1-3')         
,(16  , 'Samad'       ,'Patel'       ,'1-3')         
,(17  , 'Raees'       ,'Syed'        ,'1-4')         
,(18  , 'Azmat'       ,'Khan'        , null)      
,(19  , 'Imran'       ,'Khan'        , null)
 
-- query
-- note: grp = id - row_number() assumes the ids are consecutive within each island (true for this sample data)
; with c as (
select id, firstname, lastname, houseno=isnull(houseno, '')
, new_id=row_number() over (partition by lastname, isnull(houseno, '') order by id)
, grp = id - row_number() over (partition by lastname, isnull(houseno, '') order by id)
FROM #temp 
)
, d as (
select id, firstname, lastname, houseno, T.cnt, c.grp
-- order by id (grp is constant inside the partition, so ordering by it is nondeterministic);
-- lastname is in the partition so islands of different families never share one numbering
, row_id=id-row_number() over (partition by lastname, houseno, grp order by id)
from c
cross apply (select cnt=count(*) from c as c2 where c.grp = c2.grp and c.lastname=c2.lastname and c.houseno=c2.houseno) T(cnt)
)
select id, FirstName, LastName, Houseno, MyCount=cnt, CountId=DENSE_RANK() over (order by row_id)
from d
order by id;

The result is as follows:

Use a CTE, then update your table as follows:

;WITH T AS
(
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY ID) AS SrNo,
        -- ORDER BY ID (not HouseNo, which is constant in the partition) keeps the numbering deterministic
        ROW_NUMBER() OVER (PARTITION BY LastName, HouseNo ORDER BY ID) AS PartNo
    FROM MYTABLE
),
X AS
(
    SELECT
        T.LastName,
        T.HouseNo,
        COUNT(*) AS NoOfCount,
        ROW_NUMBER() OVER (ORDER BY MAX(ID)) AS RowNo,
        MIN(ID) AS StartID,
        MAX(ID) AS ID
    FROM T
    GROUP BY T.LastName, T.HouseNo, (T.SrNo - T.PartNo)
)
UPDATE MYTABLE
SET
    MyCount = X.NoOfCount,
    CountId = X.RowNo
FROM X
-- each island covers a contiguous ID range, so match on that range;
-- comparing HouseNo with = would never match the rows where HouseNo is NULL
WHERE MYTABLE.ID BETWEEN X.StartID AND X.ID;

SELECT * FROM MYTABLE;

Output:

First create a view that computes the count and the rank of each section.

CREATE VIEW cnt AS 
SELECT
    T.LastName,
    T.HouseNo, MIN(T.id) AS START, MAX(T.id) AS finish, 
    -- COUNT(*) rather than MAX(ID)-MIN(ID)+1, which overcounts if IDs have gaps inside a section
    COUNT(*) AS NoOfCount,        
    ROW_NUMBER() OVER (ORDER BY MAX(T.ID)) AS RowNo,
    MAX(T.ID) AS ID       
FROM (
SELECT
    *,      
    ROW_NUMBER() OVER (ORDER BY ID) AS SrNo,
    -- ORDER BY ID keeps the numbering within each (LastName, HouseNo) partition deterministic
    ROW_NUMBER() OVER (PARTITION BY LastName, HouseNo ORDER BY ID) AS PartNo        
FROM myTable
) T 
GROUP BY T.LastName, T.HouseNo, (T.SrNo - T.PartNo) 

Then use it for your purpose:

SELECT a.*,
       b.NoOfCount,
       b.RowNo
FROM   myTable         AS a
       INNER JOIN cnt  AS b
            ON  a.id BETWEEN b.start AND b.finish

The result is as follows:
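The view and the BETWEEN join above amount to two steps: collapse the rows into sections with a start and finish id, then look up each row's section by range. A minimal Python sketch, assuming the rows are sorted by id; itertools.groupby builds the sections and bisect does the range lookup that stands in for the join, so this mirrors the idea rather than translating the SQL. The helper lookup is hypothetical.

```python
import bisect
from itertools import groupby

# Bounty sample data (id, last_name, house_no); None = blank HouseNo.
ROWS = [
    (1, "Khan", "1-1"), (2, "Khan", "1-1"), (3, "Khan", "1-1"),
    (4, "Shaikh", "1-2"), (5, "Shaikh", "1-2"), (6, "Shaikh", "1-2"),
    (7, "Khan", None), (10, "Patel", "1-3"), (11, "Patel", "1-3"),
    (12, "Syed", "1-4"), (13, "Khan", None), (14, "Khan", None),
    (15, "Patel", "1-3"), (16, "Patel", "1-3"), (17, "Syed", "1-4"),
    (18, "Khan", None), (19, "Khan", None),
]

# Like the cnt view: one (start, finish, NoOfCount, RowNo) entry per section,
# where a section is a run of consecutive rows (by id) with the same key.
ordered = sorted(ROWS, key=lambda r: r[0])
sections = []
for row_no, (_, run) in enumerate(groupby(ordered, key=lambda r: (r[1], r[2])), start=1):
    ids = [r[0] for r in run]
    sections.append((ids[0], ids[-1], len(ids), row_no))

starts = [section[0] for section in sections]


def lookup(row_id):
    """Stand-in for 'a.id BETWEEN b.start AND b.finish': return (NoOfCount, RowNo)."""
    i = bisect.bisect_right(starts, row_id) - 1
    start, finish, count, row_no = sections[i]
    if not start <= row_id <= finish:
        raise KeyError(row_id)
    return count, row_no


joined = [(row_id, *lookup(row_id)) for row_id, _, _ in ROWS]
for row in joined:
    print(row)  # (id, NoOfCount, RowNo)
```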

Try this:

SELECT COUNT(ID) AS _COUNT FROM MYTABLE GROUP BY LASTNAME + ISNULL(HOUSENO,'')
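Grouping on `LASTNAME + ISNULL(HOUSENO,'')` groups on one concatenated string instead of two columns. A small Python sketch of that key (the helper concat_key and the rows are illustrative) shows it counts like the two-column grouping on this kind of data.

```python
from collections import Counter


def concat_key(last_name, house_no):
    """Mirror LASTNAME + ISNULL(HOUSENO, ''): a NULL house number becomes ''."""
    return last_name + (house_no or "")


rows = [("A", None), ("A", None), ("C", "1-1"), ("C", "1-1"), ("K", None)]
counts = Counter(concat_key(last, house) for last, house in rows)
print(counts)
```

One property of a plain concatenation is that different pairs can produce the same string, for example ("Khan", "1-1") and ("Khan1", "-1") both give "Khan1-1", whereas grouping by the two columns separately keeps them apart.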