Compare multiple records of normalized data

Question

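I have data in the following format, and I am trying to compare it using SQL:

The data consists of ratings of "groups"; each rating holds up to 4 values: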

tblRatings

RatingID	GroupID	Value1	Value2	Value3	Value4
1	2222	13	19	(null)	(null)
2	2222	13	(null)	(null)	(null)
3	2223	1	(null)	(null)	(null)
4	2223	1	(null)	(null)	(null)
5	2224	5	(null)	(null)	(null)
6	2225	10	12	13	(null)
7	2225	12	13	10	(null)

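My goal is to compare the records and determine which GroupIDs have two matching ratings. A match is defined as the same list of values in any order, with all nulls ignored. In the sample data, GroupIDs 2223 and 2225 match, and the others do not.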
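To make the match definition concrete, here is a minimal Python sketch (not a SQL solution). It treats each rating as the sorted tuple of its non-null values and reports the groups in which two ratings share the same tuple. The function name `matching_groups` and the in-memory `rows` list are illustrative assumptions; only the sample data comes from the table above.

```python
from collections import defaultdict

# Sample rows from tblRatings: (RatingID, GroupID, Value1..Value4).
# None stands in for NULL.
rows = [
    (1, 2222, 13, 19, None, None),
    (2, 2222, 13, None, None, None),
    (3, 2223, 1, None, None, None),
    (4, 2223, 1, None, None, None),
    (5, 2224, 5, None, None, None),
    (6, 2225, 10, 12, 13, None),
    (7, 2225, 12, 13, 10, None),
]

def matching_groups(rows):
    """Return the sorted GroupIDs containing at least two ratings with the same values."""
    seen = defaultdict(set)  # GroupID -> signatures seen so far in that group
    matched = set()
    for rating_id, group_id, *values in rows:
        # Drop NULLs and sort so that order does not matter; duplicates are kept.
        signature = tuple(sorted(v for v in values if v is not None))
        if signature in seen[group_id]:
            matched.add(group_id)
        seen[group_id].add(signature)
    return sorted(matched)

print(matching_groups(rows))  # [2223, 2225]
```

Any SQL approach should reproduce this result on the sample data.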

How would you go about doing this comparison?

As a first step, I used a union query to normalize the data to one value per row, like this:

qryRatingsNormalized

RatingID	GroupID	Value
1	2222	13
1	2222	19
2	2222	13
3	2223	1
4	2223	1
5	2224	5
6	2225	10
6	2225	12
6	2225	13
7	2225	12
7	2225	13
7	2225	10
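The union query itself is not shown here. The following is a sketch of what it might look like, run from Python against an in-memory SQLite database purely so that the example is self-contained. SQLite stands in for Access/SQL Server, and the exact query text, the `Value` alias, and the `order by` are assumptions rather than the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tblRatings (RatingID integer, GroupID integer, "
    "Value1 integer, Value2 integer, Value3 integer, Value4 integer)"
)
conn.executemany(
    "insert into tblRatings values (?, ?, ?, ?, ?, ?)",
    [
        (1, 2222, 13, 19, None, None),
        (2, 2222, 13, None, None, None),
        (3, 2223, 1, None, None, None),
        (4, 2223, 1, None, None, None),
        (5, 2224, 5, None, None, None),
        (6, 2225, 10, 12, 13, None),
        (7, 2225, 12, 13, 10, None),
    ],
)

# One SELECT per value column. UNION ALL (rather than UNION) keeps
# duplicate values inside a rating, and the WHERE clauses drop NULLs.
normalize_sql = """
    select RatingID, GroupID, Value1 as Value from tblRatings where Value1 is not null
    union all
    select RatingID, GroupID, Value2 from tblRatings where Value2 is not null
    union all
    select RatingID, GroupID, Value3 from tblRatings where Value3 is not null
    union all
    select RatingID, GroupID, Value4 from tblRatings where Value4 is not null
    order by RatingID
"""
normalized = conn.execute(normalize_sql).fetchall()
for row in normalized:
    print(row)
```

On the sample data this yields the same 12 rows as the table above, although the order of values within a rating is not guaranteed.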

But I'm not sure how to proceed from there.

FYI, I am working in MS Access with tables linked from SQL Server.

Answer 1

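I haven't really kept up with the state of Access/Jet SQL, but I'm fairly sure this is a valid query. My suggestion is that you can get away with matching on a set of aggregate values that stands in for matching the individual values.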

select r1.RatingID, r2.RatingID, r1.GroupID
from
    (
    select
        GroupID, RatingID,
        count(Value) as cnt, sum(Value) as tot,
        avg(Value) as avg, min(Value) as lst, max(Value) as grt,
        /* assumes no zeroes */
        floor(sum(log(Value+10)) * 100000) as lg
    from qryRatingsNormalized
    group by GroupID, RatingID
    ) r1
        inner join
    (
    select
        GroupID, RatingID,
        count(Value) as cnt, sum(Value) as tot,
        avg(Value) as avg, min(Value) as lst, max(Value) as grt,
        floor(sum(log(Value+10)) * 100000) as lg
    from qryRatingsNormalized
    group by GroupID, RatingID
    ) r2 on     r2.GroupID = r1.GroupID and r2.RatingID > r1.RatingID
            and r2.cnt = r1.cnt and r2.lst = r1.lst and r2.grt = r1.grt
            and r2.tot = r1.tot and r2.avg = r1.avg and r2.lg = r1.lg

I repeated the same subquery twice, so you may want to define it once as a named Query/View and join that to itself instead of using inline subqueries.

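The maximum of four values probably works in your favor. The idea is that by comparing the sum, the product (via logarithms), the average, the count, the minimum, and the maximum, you can be reasonably confident that two sets are one and the same. It works much like a checksum.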
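As a rough illustration of the checksum idea, the Python sketch below computes the same six aggregates for a single rating and compares them. The function name `fingerprint` is hypothetical, and the sketch assumes at least one non-null value per rating and values greater than -10 so that the logarithm is defined. Note that Python's `/` returns a float, whereas `avg` over an integer column in SQL Server returns an integer. Because the logarithm term is rounded, equal fingerprints are best read as strong evidence of a match rather than proof.

```python
import math

def fingerprint(values):
    """Aggregate 'checksum' of one rating, mirroring the columns of the query above."""
    vals = [v for v in values if v is not None]  # ignore NULLs
    # Sum the logs in sorted order so the floating-point result
    # does not depend on the order of the input values.
    log_sum = sum(math.log(v + 10) for v in sorted(vals))
    return (
        len(vals),                      # cnt
        sum(vals),                      # tot
        sum(vals) / len(vals),          # avg
        min(vals),                      # lst
        max(vals),                      # grt
        math.floor(log_sum * 100000),   # lg
    )

# Ratings 6 and 7 (GroupID 2225) hold the same values in a different order.
print(fingerprint([10, 12, 13, None]) == fingerprint([12, 13, 10, None]))      # True
# Ratings 1 and 2 (GroupID 2222) differ.
print(fingerprint([13, 19, None, None]) == fingerprint([13, None, None, None]))  # False
```

With at most four values per rating, count, minimum, maximum, and sum already constrain the set tightly, which is one way to read the remark that the four-value limit helps.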

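Some of this may depend on whether you need to do this repeatedly or it's just a one-off, what the actual values are, how many ratings/groups there are in total, whether negative numbers are valid...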

https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=1b7b05c44d6935d43e54b012ebc0b486

Answer 2

You could try setting up a pass-through query with T-SQL to get around the Access limitations:

with r as (
    select GroupID, RatingID, val,
        row_number() over (partition by RatingID order by val) as rn,
        count(*) over (partition by RatingID) as cnt
    from tblRatings cross apply
        (values (Value1), (Value2), (Value3), (Value4)) as v (val)
    where val is not null
)
select r1.RatingID, r2.RatingID, min(r1.GroupID) as GroupID
from r r1 left outer join r r2 on
    r2.GroupID = r1.GroupID and r2.RatingID > r1.RatingID
    and r2.cnt = r1.cnt
    and r2.rn = r1.rn and r2.val = r1.val
group by r1.RatingID, r2.RatingID
having count(r2.val) = count(*) and count(*) = min(r2.cnt);

See a working example here:

https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=1b7b05c44d6935d43e54b012ebc0b486

sql

sql-server