Finding the most frequent value in SQL Server 2012
I want to find the product that each customer buys most frequently. My dataset looks like this:
CustomerID ProdID FavouriteProduct
1 A ?
1 A ?
1 A ?
1 B ?
1 A ?
1 A ?
1 A ?
1 B ?
2 A ?
2 AN ?
2 G ?
2 C ?
2 C ?
2 F ?
2 D ?
2 C ?
There are too many products, so I cannot put them in a pivot table.
The answer should look like this:
CustomerID ProdID FavouriteProduct
1 A A
1 A A
1 A A
1 B A
1 A A
1 A A
1 A A
1 B A
2 A C
2 AN C
2 G C
2 C C
2 C C
2 F C
2 D C
2 C C
The query might look something like this:
Update table
set FavouriteProduct = (Select
CustomerID, Product, Max(Count(Product))
From Table
group by CustomerID, Product) FP
Thanks to Nick, I found a way to get the most frequent value. I am sharing how it works:
Select CustomerID,ProductID,Count(*) as Number
from table A
group by CustomerID,ProductID
having Count(*)>= (Select Max(Number)
from (Select CustomerID,ProductID,Count(*) as Number
from table B
where B.CustomerID= A.CustomerID
group by CustomerID,ProductID)C)
In case your SQL does not run fast enough and your customers are also held in a separate, smaller table, this might work better:
select C.CustomerId, R.ProductID
from Customer C
outer apply (
Select top 1 ProductID,Count(*) as Number
from table A
where A.CustomerId = C.CustomerId
group by ProductId
order by Number desc
) R
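One thing to keep in mind with the APPLY approach: OUTER APPLY keeps customers who have no purchases (their ProductID comes back as NULL), while CROSS APPLY drops them. Below is a minimal sketch of the CROSS APPLY variant with an explicit tie-breaker. The table variables @Customer and @Purchases and their rows are made up for illustration.

```sql
-- Hypothetical sample data; customer 3 has no purchases.
DECLARE @Customer TABLE (CustomerID int);
DECLARE @Purchases TABLE (CustomerID int, ProductID varchar(10));

INSERT INTO @Customer (CustomerID) VALUES (1), (2), (3);
INSERT INTO @Purchases (CustomerID, ProductID)
VALUES (1, 'A'), (1, 'A'), (1, 'B'), (2, 'C');

-- CROSS APPLY returns only customers that have at least one purchase.
-- Ordering by ProductID as a second key makes ties deterministic.
SELECT C.CustomerID, R.ProductID
FROM @Customer AS C
CROSS APPLY (
    SELECT TOP (1) A.ProductID
    FROM @Purchases AS A
    WHERE A.CustomerID = C.CustomerID
    GROUP BY A.ProductID
    ORDER BY COUNT(*) DESC, A.ProductID
) AS R;
```

With this data the query returns customer 1 with A and customer 2 with C. Swapping CROSS APPLY for OUTER APPLY would add a row for customer 3 with a NULL ProductID.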
Another way to get the most frequent product is to use row_number():
select customerid, productid,
max(case when seqnum = 1 then productid end) over (partition by customerid) as favoriteproductid
from (select customerid, productid, count(*) as cnt,
row_number() over (partition by customerid order by count(*) desc) as seqnum
from customer c
group by customerid, productid
) cp;
This one, based on the example at the end of this page: http://www.sql-server-performance.com/2006/find-frequent-values/, may be faster:
SELECT CustomerID, ProdID, Cnt
FROM
(
SELECT CustomerID, ProdID, COUNT(*) as Cnt,
RANK() OVER (
PARTITION BY CustomerID
ORDER BY COUNT(*) DESC
) AS Rnk
FROM YourTransactionTable
GROUP BY CustomerID, ProdID
) x
WHERE Rnk = 1
This one uses the RANK() function. In this case you do not have to join back to the same table, which means much less work.
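Note that RANK() gives every tied product the same rank, so a customer whose top products have equal counts gets more than one row with Rnk = 1. If you need exactly one row per customer, a sketch like the following may help. It swaps in ROW_NUMBER() with ProdID as an arbitrary tie-breaker, and the table variable @Purchases is a hypothetical stand-in loaded with the rows from the question.

```sql
-- Hypothetical sample data mirroring the rows in the question.
DECLARE @Purchases TABLE (CustomerID int, ProdID varchar(10));

INSERT INTO @Purchases (CustomerID, ProdID) VALUES
(1, 'A'), (1, 'A'), (1, 'A'), (1, 'B'), (1, 'A'), (1, 'A'), (1, 'A'), (1, 'B'),
(2, 'A'), (2, 'AN'), (2, 'G'), (2, 'C'), (2, 'C'), (2, 'F'), (2, 'D'), (2, 'C');

-- ROW_NUMBER() assigns 1 to exactly one product per customer;
-- ties on the count are broken alphabetically by ProdID (an assumption).
SELECT CustomerID, ProdID, Cnt
FROM (
    SELECT CustomerID, ProdID, COUNT(*) AS Cnt,
           ROW_NUMBER() OVER (PARTITION BY CustomerID
                              ORDER BY COUNT(*) DESC, ProdID) AS Rn
    FROM @Purchases
    GROUP BY CustomerID, ProdID
) AS x
WHERE Rn = 1;
```

For the question's data this returns (1, A, 6) and (2, C, 3). Which product should win a tie is a business decision, so replace the second ORDER BY key with whatever rule fits.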
Now, to update your existing data, I like to wrap my dataset in a WITH to make debugging a bit easier and the final update a bit simpler:
;WITH SRC AS
(
SELECT CustomerID, ProdID, Cnt
FROM
(
SELECT CustomerID, ProdID, COUNT(*) as Cnt,
RANK() OVER (PARTITION BY CustomerID
ORDER BY COUNT(*) DESC) AS Rnk
FROM TransactionTable
GROUP BY CustomerID, ProdID
) x
WHERE Rnk = 1
)
UPDATE FavouriteTable
SET Favourite = SRC.ProdID
FROM SRC
WHERE SRC.CustomerID = FavouriteTable.CustomerID
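If the FavouriteProduct column lives in the same table as the purchases, as it does in the question, the update can be written against that table directly. This is only a sketch under that assumption: the temp table #Purchases is a hypothetical stand-in for the real table, and the ProdID tie-breaker is my own choice.

```sql
-- Hypothetical temp table standing in for the question's table.
CREATE TABLE #Purchases (
    CustomerID int NOT NULL,
    ProdID varchar(10) NOT NULL,
    FavouriteProduct varchar(10) NULL
);

INSERT INTO #Purchases (CustomerID, ProdID) VALUES
(1, 'A'), (1, 'A'), (1, 'A'), (1, 'B'), (1, 'A'), (1, 'A'), (1, 'A'), (1, 'B'),
(2, 'A'), (2, 'AN'), (2, 'G'), (2, 'C'), (2, 'C'), (2, 'F'), (2, 'D'), (2, 'C');

-- One favourite per customer: the highest count, ties broken by ProdID.
WITH Fav AS (
    SELECT CustomerID, ProdID,
           ROW_NUMBER() OVER (PARTITION BY CustomerID
                              ORDER BY COUNT(*) DESC, ProdID) AS Rn
    FROM #Purchases
    GROUP BY CustomerID, ProdID
)
UPDATE p
SET FavouriteProduct = f.ProdID
FROM #Purchases AS p
INNER JOIN Fav AS f
    ON f.CustomerID = p.CustomerID
   AND f.Rn = 1;

-- Inspect the result, then clean up.
SELECT CustomerID, ProdID, FavouriteProduct FROM #Purchases;
DROP TABLE #Purchases;
```

Using ROW_NUMBER() here guarantees that each row is matched by exactly one favourite, so the outcome of the UPDATE does not depend on which of several tied rows SQL Server happens to pick.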
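To return exactly the rows you described in the question, you can try using a table expression (I used a CTE in my example) to first return a popularity ranking: the higher the number, the more popular the product is with that customer.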
WITH RankTable AS (
SELECT
CustomerID, ProductID, COUNT(*) AS Popularity
FROM TableA
GROUP BY CustomerID, ProductID
)
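The full result table can then be returned by first performing an inner join between the original table (TableA) and the table expression (RankTable), and then using a window function to create the values in the FavoriteProduct column. The SELECT must directly follow the CTE as part of the same statement.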
SELECT
P.CustomerID
, P.ProductID
, FIRST_VALUE(P.ProductID) OVER(
PARTITION BY R.CustomerID
ORDER BY R.Popularity DESC, R.ProductID) AS FavoriteProduct
FROM TableA AS P
INNER JOIN RankTable AS R
ON P.CustomerID = R.CustomerID
AND P.ProductID= R.ProductID;
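As a possible variation, the join can be avoided by computing the per-product count with a windowed COUNT(*) in a derived table and then applying FIRST_VALUE on top of it. This is a sketch, not a benchmarked replacement. The table variable @Purchases is hypothetical and holds the rows from the question.

```sql
-- Hypothetical sample data mirroring the rows in the question.
DECLARE @Purchases TABLE (CustomerID int, ProdID varchar(10));

INSERT INTO @Purchases (CustomerID, ProdID) VALUES
(1, 'A'), (1, 'A'), (1, 'A'), (1, 'B'), (1, 'A'), (1, 'A'), (1, 'A'), (1, 'B'),
(2, 'A'), (2, 'AN'), (2, 'G'), (2, 'C'), (2, 'C'), (2, 'F'), (2, 'D'), (2, 'C');

-- Inner query: attach to every row the number of times that customer
-- bought that product. Outer query: pick the product with the highest
-- count per customer (ties broken by ProdID) and repeat it on every row.
SELECT CustomerID, ProdID,
       FIRST_VALUE(ProdID) OVER (PARTITION BY CustomerID
                                 ORDER BY Cnt DESC, ProdID) AS FavouriteProduct
FROM (
    SELECT CustomerID, ProdID,
           COUNT(*) OVER (PARTITION BY CustomerID, ProdID) AS Cnt
    FROM @Purchases
) AS t;
```

Every one of the 16 input rows is returned, with FavouriteProduct set to A for customer 1 and C for customer 2, which matches the layout asked for in the question.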