Redshift: create a list and use it to search a different table
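I think there are several ways to solve this, but I'm not sure how to go about it.
I have two tables. The first has IDs and numbers. An ID and a number may each be listed more than once, so I created a result table that lists the unique numbers grouped by ID.
My second table has rows (100 million of them), again with IDs and numbers. I need to search that table for any ID that has a number not in that ID's number list in the result table.
Can Redshift query based on whether the ID matches and whether the number exists in the list in the other table? Can this all be done in memory / in a single statement?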
DROP TABLE IF EXISTS myTable;
-- Redshift syntax: no backticks, MEDIUMINT, UNSIGNED or AUTO_INCREMENT.
-- Surrogate key renamed from `id`, which clashes with `ID` (unquoted identifiers are case-insensitive).
CREATE TABLE myTable (
row_id integer IDENTITY(1,1) NOT NULL,
ID varchar(255),
Numbers integer,
PRIMARY KEY (row_id)
);
INSERT INTO myTable (ID, Numbers)
VALUES
('CRQ44MPX1SZ',1890),
('UHO21QQY3TW',4370),
('JTQ62CBP6ER',1825),
('RFD95MLC2MI',5014),
('URZ04HGG2YQ',2859),
('CRQ44MPX1SZ',1891),
('UHO21QQY3TW',4371),
('JTQ62CBP6ER',1826),
('RFD95MLC2MI',5015),
('URZ04HGG2YQ',2860),
('CRQ44MPX1SZ',1892),
('UHO21QQY3TW',4372),
('JTQ62CBP6ER',1827),
('RFD95MLC2MI',5016),
('URZ04HGG2YQ',2861);
CREATE TABLE result AS
SELECT ID, listagg(distinct Numbers, ',') as Number_List, count(Numbers) as Numbers_Count
FROM myTable
GROUP BY ID;
DROP TABLE IF EXISTS myTable2;
-- Redshift syntax; surrogate key renamed from `id`, which clashes with `ID`.
CREATE TABLE myTable2 (
row_id integer IDENTITY(1,1) NOT NULL,
ID varchar(255),
Numbers integer,
PRIMARY KEY (row_id)
);
INSERT INTO myTable2 (ID, Numbers)
VALUES
('CRQ44MPX1SZ',1870),
('UHO21QQY3TW',4350),
('JTQ62CBP6ER',1825),
('RFD95MLC2MI',5014),
('URZ04HGG2YQ',2859),
('CRQ44MPX1SZ',1891),
('UHO21QQY3TW',4371),
('JTQ62CBP6ER',1826),
('RFD95MLC2MI',5015),
('URZ04HGG2YQ',2860),
('CRQ44MPX1SZ',1882),
('UHO21QQY3TW',4372),
('JTQ62CBP6ER',1827),
('RFD95MLC2MI',5016),
('URZ04HGG2YQ',2861);
Pseudocode:
SELECT ID, listagg(distinct Numbers, ',') AS Violation
FROM myTable2
WHERE Numbers NOT IN result.Number_List
GROUP BY ID
or possibly: WHERE result.Number_List NOT LIKE '%' || Numbers || '%'
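The second idea compares a number against the comma-separated number list as text. Substring matching is fragile because a number can also occur inside a longer one. A small Python sketch, using the CRQ44MPX1SZ list built from the sample data and two hypothetical helper names, shows the difference between a substring test and an exact-token test:

```python
# Number list for CRQ44MPX1SZ as built from myTable in the sample data
number_list = "1890,1891,1892"


def in_list_by_substring(n: int, csv: str) -> bool:
    """Hypothetical helper mimicking: csv LIKE '%' || n || '%'."""
    return str(n) in csv


def in_list_by_token(n: int, csv: str) -> bool:
    """Hypothetical helper: split the list and compare whole values only."""
    return str(n) in csv.split(",")


print(in_list_by_substring(189, number_list))  # True: "189" occurs inside "1890"
print(in_list_by_token(189, number_list))      # False: 189 is not in the list
print(in_list_by_token(1891, number_list))     # True
```

Even the exact-token version has to split a string for every row, which is work a set-based join on the raw rows does not need.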
Desired output:
("CRQ44MPX1SZ", "1870,1882")
("UHO21QQY3TW", "4350")
Edit:
Going the JOIN route, I'm not getting the correct results... but I'm pretty sure my WHERE implementation is wrong.
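SELECT myTable.ID, listagg(distinct myTable2.Numbers, ',') as unauth_list, count(myTable2.Numbers) as unauth_count
FROM myTable
-- joins on ID only, so each myTable2 row is paired with every myTable row for that ID
LEFT JOIN myTable2 on myTable.ID = myTable2.ID
-- filtering on myTable2 columns here turns the LEFT JOIN into an inner join
WHERE (myTable.ID = myTable2.ID)
-- true whenever the ID has any other number, so matching numbers such as 1891 are still reported
AND (myTable.Numbers <> myTable2.Numbers)
GROUP BY myTable.ID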
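To see why a join on ID alone over-reports, here is a sketch that runs the same logic on the sample data. It assumes Python's built-in `sqlite3` module as a local stand-in for Redshift, uses an inner join (which is what a `WHERE` filter on the right-hand table effectively turns a `LEFT JOIN` into), and returns the distinct flagged pairs instead of a `listagg` string; the `t1`/`t2` aliases are illustrative.

```python
import sqlite3

# Sample data from the question, grouped by ID
TABLE1 = {
    "CRQ44MPX1SZ": [1890, 1891, 1892],
    "UHO21QQY3TW": [4370, 4371, 4372],
    "JTQ62CBP6ER": [1825, 1826, 1827],
    "RFD95MLC2MI": [5014, 5015, 5016],
    "URZ04HGG2YQ": [2859, 2860, 2861],
}
TABLE2 = {
    "CRQ44MPX1SZ": [1870, 1891, 1882],
    "UHO21QQY3TW": [4350, 4371, 4372],
    "JTQ62CBP6ER": [1825, 1826, 1827],
    "RFD95MLC2MI": [5014, 5015, 5016],
    "URZ04HGG2YQ": [2859, 2860, 2861],
}

conn = sqlite3.connect(":memory:")
for name, data in (("myTable", TABLE1), ("myTable2", TABLE2)):
    conn.execute(f"CREATE TABLE {name} (ID TEXT, Numbers INTEGER)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (?, ?)",
        [(id_, n) for id_, nums in data.items() for n in nums],
    )

# Join on ID only: every myTable2 row meets every myTable row with that ID,
# so "<>" holds whenever that ID has any other number in myTable.
flagged = conn.execute(
    """
    SELECT DISTINCT t2.ID, t2.Numbers
    FROM myTable AS t1
    JOIN myTable2 AS t2 ON t1.ID = t2.ID
    WHERE t1.Numbers <> t2.Numbers
    ORDER BY t2.ID, t2.Numbers
    """
).fetchall()

print(len(flagged))                      # 15: every row of myTable2 is flagged
print(("CRQ44MPX1SZ", 1891) in flagged)  # True, although 1891 is in myTable
```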
Expected output:
("CRQ44MPX1SZ", "1870,1882", 2)
("UHO21QQY3TW", "4350", 1)
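Just LEFT JOIN the two tables on both ID and Numbers, and check in the WHERE clause for rows where no match was found. There shouldn't be any need for listagg() or complicated comparisons. Or am I missing part of the question?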
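A minimal sketch of that suggestion, again assuming Python's built-in `sqlite3` as a local stand-in for Redshift so it can run on the sample data: `myTable2` is left-joined to `myTable` on both `ID` and `Numbers`, and rows with no partner (`t1.ID IS NULL`) are the violations. SQLite's `GROUP_CONCAT` stands in for Redshift's `LISTAGG`, and the `t1`/`t2` aliases are illustrative.

```python
import sqlite3

# Sample data from the question, grouped by ID
TABLE1 = {
    "CRQ44MPX1SZ": [1890, 1891, 1892],
    "UHO21QQY3TW": [4370, 4371, 4372],
    "JTQ62CBP6ER": [1825, 1826, 1827],
    "RFD95MLC2MI": [5014, 5015, 5016],
    "URZ04HGG2YQ": [2859, 2860, 2861],
}
TABLE2 = {
    "CRQ44MPX1SZ": [1870, 1891, 1882],
    "UHO21QQY3TW": [4350, 4371, 4372],
    "JTQ62CBP6ER": [1825, 1826, 1827],
    "RFD95MLC2MI": [5014, 5015, 5016],
    "URZ04HGG2YQ": [2859, 2860, 2861],
}

conn = sqlite3.connect(":memory:")
for name, data in (("myTable", TABLE1), ("myTable2", TABLE2)):
    conn.execute(f"CREATE TABLE {name} (ID TEXT, Numbers INTEGER)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (?, ?)",
        [(id_, n) for id_, nums in data.items() for n in nums],
    )

# Anti-join: keep myTable2 rows whose (ID, Numbers) pair has no partner in myTable.
rows = conn.execute(
    """
    SELECT t2.ID,
           GROUP_CONCAT(DISTINCT t2.Numbers) AS unauth_list,  -- LISTAGG on Redshift
           COUNT(DISTINCT t2.Numbers)        AS unauth_count
    FROM myTable2 AS t2
    LEFT JOIN myTable AS t1
           ON t1.ID = t2.ID AND t1.Numbers = t2.Numbers
    WHERE t1.ID IS NULL
    GROUP BY t2.ID
    ORDER BY t2.ID
    """
).fetchall()

# GROUP_CONCAT order is not guaranteed, so sort the numbers for a stable result
result = {
    id_: (sorted(int(n) for n in str(nums).split(",")), count)
    for id_, nums, count in rows
}
print(result)
# {'CRQ44MPX1SZ': ([1870, 1882], 2), 'UHO21QQY3TW': ([4350], 1)}
```

On Redshift the same query would use `LISTAGG(DISTINCT t2.Numbers, ',')` in place of `GROUP_CONCAT(DISTINCT t2.Numbers)`; no intermediate `result` table or string comparison is needed.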