Constructing a table of binary values indicating the existence of entries in another table

Question

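I have several tables that contain order information for different items. A customer may appear multiple times, in different tables. Each item appears in only one table. I want to create a new table that shows every item a customer purchased in a given year, with one column per item holding a binary value that indicates whether the customer bought that item in that year.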

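In other words, I want to turn the single-item orders listed across all the tables (for example, customer 1 bought item a in November 2007 and item c in May 2007) into yearly transactions (for example, customer 1's transaction for 2007 is {a,c}, or [1,0,1,0]). I want to merge individual orders into yearly transactions so that I can mine association rules.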

Minimal working example:

table1 contains orders for items a and b. table2 contains orders for items c and d.

CREATE TABLE table1
(
orderId INT,
customerId INT,
orderDate DATE,
item VARCHAR(1)
);

CREATE TABLE table2
(
orderId INT,
customerId INT,
orderDate DATE,
item VARCHAR(1)
);

INSERT INTO table1 (orderId, customerId, orderDate, item)
VALUES
(1, 1, '2007-11-11', 'a'),
(2, 2, '2008-03-20', 'b'),
(3, 3, '2009-07-11', 'a');

INSERT INTO table2 (orderId, customerId, orderDate, item)
VALUES
(4, 2, '2008-01-01', 'c'),
(5, 1, '2007-05-15', 'c'),
(6, 1, '2009-02-02', 'd');

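I'm combining the tables with a union (rather than joining on orderId), because some order IDs may overlap even though the orders are different.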

SELECT * 
INTO #table3
FROM
(
SELECT *
FROM table1 
UNION ALL 
SELECT * 
FROM table2
) a;
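The {a,c} item-set form mentioned in the question can be read directly off #table3. A minimal sketch, assuming SQL Server 2017 or later (for STRING_AGG ... WITHIN GROUP) and a hypothetical result table named #itemSets:

```sql
-- #itemSets is a hypothetical name; drop it so the script can be rerun
IF OBJECT_ID('tempdb..#itemSets') IS NOT NULL
    DROP TABLE #itemSets;

SELECT customerId,
       orderYear,
       -- items listed alphabetically, e.g. 'a,c' for customer 1 in 2007
       STRING_AGG(item, ',') WITHIN GROUP (ORDER BY item) AS items
INTO #itemSets
FROM (
    -- DISTINCT: an item bought twice in the same year is listed only once
    SELECT DISTINCT customerId, YEAR(orderDate) AS orderYear, item
    FROM #table3
) AS t
GROUP BY customerId, orderYear;

SELECT * FROM #itemSets ORDER BY customerId, orderYear;
```

Some association-rule tools take transactions as item lists rather than 0/1 vectors, so this form may be handy alongside the pivoted one.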

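Here is an attempt at a solution, but it isn't very elegant. More importantly, it doesn't apply the CASE expressions per year as needed: each subquery only checks whether the customer bought the item in any year, and the query returns one row per order rather than one row per customer and year.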

SELECT customerId, 
DATEPART(YEAR, orderDate) as orderYear,
    CASE
        WHEN customerId IN (
            SELECT DISTINCT customerId
            FROM #table3
            WHERE item = 'a')
            THEN 1
        ELSE 0 
    END AS itemA,
    CASE
        WHEN customerId IN (
            SELECT DISTINCT customerId
            FROM #table3
            WHERE item = 'b')
            THEN 1
        ELSE 0 
    END AS itemB,
    CASE
        WHEN customerId IN (
            SELECT DISTINCT customerId
            FROM #table3
            WHERE item = 'c')
            THEN 1
        ELSE 0 
    END AS itemC,
    CASE
        WHEN customerId IN (
            SELECT DISTINCT customerId
            FROM #table3
            WHERE item = 'd')
            THEN 1
        ELSE 0 
    END AS itemD
FROM #table3
ORDER BY customerId, orderDate;
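For comparison, a minimal sketch of the smallest change that makes the attempt work per year: correlate each subquery on both customer and year (EXISTS instead of IN), and use DISTINCT to collapse to one row per customer and year. It assumes #table3 from the union above; #perYear is a hypothetical result table name.

```sql
IF OBJECT_ID('tempdb..#perYear') IS NOT NULL
    DROP TABLE #perYear;

SELECT DISTINCT
       o.customerId,
       YEAR(o.orderDate) AS orderYear,
       -- each EXISTS checks the same customer AND the same year as the outer row
       CASE WHEN EXISTS (SELECT 1 FROM #table3 AS x
                         WHERE x.customerId = o.customerId
                           AND YEAR(x.orderDate) = YEAR(o.orderDate)
                           AND x.item = 'a')
            THEN 1 ELSE 0 END AS itemA,
       CASE WHEN EXISTS (SELECT 1 FROM #table3 AS x
                         WHERE x.customerId = o.customerId
                           AND YEAR(x.orderDate) = YEAR(o.orderDate)
                           AND x.item = 'b')
            THEN 1 ELSE 0 END AS itemB,
       CASE WHEN EXISTS (SELECT 1 FROM #table3 AS x
                         WHERE x.customerId = o.customerId
                           AND YEAR(x.orderDate) = YEAR(o.orderDate)
                           AND x.item = 'c')
            THEN 1 ELSE 0 END AS itemC,
       CASE WHEN EXISTS (SELECT 1 FROM #table3 AS x
                         WHERE x.customerId = o.customerId
                           AND YEAR(x.orderDate) = YEAR(o.orderDate)
                           AND x.item = 'd')
            THEN 1 ELSE 0 END AS itemD
INTO #perYear
FROM #table3 AS o;

SELECT * FROM #perYear ORDER BY customerId, orderYear;
```

This still scans #table3 once per item column, which is why a single GROUP BY pass is usually preferred.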

The desired result looks like this:

CREATE TABLE desiredResult
(
customerId INT,
orderYear INT,
itemA INT,
itemB INT,
itemC INT,
itemD INT
);

INSERT INTO desiredResult (customerId, orderYear, itemA, itemB, itemC, itemD)
VALUES
(1, 2007, 1, 0, 1, 0),
(1, 2009, 0, 0, 0, 1),
(2, 2008, 0, 1, 1, 0),
(3, 2009, 1, 0, 0, 0);

Is there a simpler way to get the result I want? Could PIVOT be useful here?

Answer 1

I would do this with conditional aggregation:

SELECT customerId, OrderYear,
       MAX(CASE WHEN item = 'a' THEN 1 ELSE 0 END) as itemA,
       MAX(CASE WHEN item = 'b' THEN 1 ELSE 0 END) as itemB,
       MAX(CASE WHEN item = 'c' THEN 1 ELSE 0 END) as itemC,
       MAX(CASE WHEN item = 'd' THEN 1 ELSE 0 END) as itemD
FROM ((SELECT customerId, year(OrderDate) as OrderYear, item FROM table1
      ) union all
      (SELECT customerId, year(OrderDate) as OrderYear, item FROM table2
      )
     ) t
GROUP BY customerId, orderYear;

This also eliminates the need for the temporary table.
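Since the question asks about PIVOT: here is a minimal sketch of the same result using SQL Server's PIVOT operator, writing it into a new table as the question describes. #yearlyBaskets is a hypothetical name, and the derived table deliberately keeps only the grouping columns, the item column and a constant 1, because PIVOT implicitly groups by every source column it is not pivoting or aggregating.

```sql
IF OBJECT_ID('tempdb..#yearlyBaskets') IS NOT NULL
    DROP TABLE #yearlyBaskets;

SELECT customerId,
       orderYear,
       -- PIVOT yields NULL where the customer did not buy the item that year
       ISNULL([a], 0) AS itemA,
       ISNULL([b], 0) AS itemB,
       ISNULL([c], 0) AS itemC,
       ISNULL([d], 0) AS itemD
INTO #yearlyBaskets
FROM (
    -- no orderId/orderDate here, otherwise PIVOT would group by them too
    SELECT customerId, YEAR(orderDate) AS orderYear, item, 1 AS bought
    FROM table1
    UNION ALL
    SELECT customerId, YEAR(orderDate) AS orderYear, item, 1 AS bought
    FROM table2
) AS src
PIVOT (
    MAX(bought) FOR item IN ([a], [b], [c], [d])
) AS p;

SELECT * FROM #yearlyBaskets ORDER BY customerId, orderYear;
```

PIVOT is essentially shorthand for the MAX(CASE ...) grouping above; the item list is still hard-coded in the IN clause, so adding an item means editing the query (or generating it with dynamic SQL).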

Tags: sql, sql-server, pivot, case, associations