How can I join tables using information from different rows?

I have two similar tables that I want to join. Please see the reproducible example below.

What needs to be done

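See the comments in the code: concatenating the values '2021-01-01' (column: Date), 'hat' (column: content), 'cat' (column: content) and 'A' (column: Tote) from first_table produces a unique key that can be joined with exactly the same data in second_table. The result would be the first row of 4 unique events (see desired_result: '#first tote'). In reality, the tables will contain a few million rows.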

Reproducible example:

CREATE OR REPLACE TABLE
`first_table` (
  `Date` string NOT NULL,
  `TotearrivalTimestamp` string  NOT NULL,
  `Tote` string NOT NULL,
  `content` string NOT NULL,
  `location` string NOT NULL,
);
INSERT INTO `first_table` (`Date`, `TotearrivalTimestamp`, `Tote`, `content`, `location`) VALUES
  ('2021-01-01', '13:00','A','hat','1'), #first tote
  ('2021-01-01', '13:00','A','cat','1'), #first tote
  ('2021-01-01', '14:00', 'B', 'toy', '1'),
  ('2021-01-01', '14:00', 'B', 'cat', '1'),
  ('2021-01-01', '15:00', 'A', 'toy', '1'),
  ('2021-01-01', '13:00', 'A', 'toy', '1'),
  ('2021-01-02', '13:00', 'A', 'hat', '1'),
  ('2021-01-02', '13:00', 'A', 'cat', '1');
  
CREATE OR REPLACE TABLE
`second_table` (
  `Date` string NOT NULL,
  `ToteendingTimestamp` string  NOT NULL,
  `Tote` string NOT NULL,
  `content` string NOT NULL,
  `location` string NOT NULL,
);
INSERT INTO `second_table` (`Date`, `ToteendingTimestamp`, `Tote`, `content`, `location`) VALUES
('2021-01-01', '20:00', 'B', 'cat', '2'),
('2021-01-01', '19:00', 'A', 'cat', '1'), #first tote
('2021-01-01', '19:00', 'A', 'hat', '1'), #first tote
('2021-01-01', '20:00', 'B', 'toy', '2'),
('2021-01-01', '14:00', 'A', 'toy', '1'),
('2021-01-02', '14:00', 'A', 'hat', '1'),
('2021-01-02', '14:00', 'A', 'cat', '1'),
('2021-01-01', '16:00', 'A', 'toy', '1');

CREATE OR REPLACE TABLE
`desired_result` (
  `Date` string NOT NULL,
  `Tote` string NOT NULL,
  `TotearrivalTimestamp` string  NOT NULL,
  `ToteendingTimestamp` string  NOT NULL,
  `location_first_table` string NOT NULL,
  `location_second_table` string NOT NULL,
 );
INSERT INTO `desired_result` (`Date`, `Tote`, `TotearrivalTimestamp`, `ToteendingTimestamp`, `location_first_table`, `location_second_table`) VALUES

('2021-01-01', 'A', '13:00', '19:00', '1', '1'), #first tote
('2021-01-01', 'B', '14:00', '20:00', '1', '1'),
('2021-01-01', 'A', '15:00', '16:00', '1', '2'),
('2021-01-02', 'A', '13:00', '14:00', '1', '1');


#### this does not give what I want####
select first.date as Date, first.tote, first.totearrivaltimestamp, second.toteendingtimestamp, first.location as location_first_table, second.location as location_second_table
from `first_table` first 
inner join `second_table` second 
on first.tote = second.tote 
and first.content = second.content;
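
To make the problem with the attempted query concrete, here is a minimal sketch that loads the sample rows into an in-memory SQLite database and counts what the join returns. SQLite and the helper names (`load_tables`, `attempted_join_row_count`) are assumptions made only so the example runs anywhere; they are not part of the original setup. Because the join matches on Tote and content alone, every content row pairs with every same-named row on any date.

```python
import sqlite3

# Sample rows copied from the question: (Date, timestamp, Tote, content, location).
FIRST_ROWS = [
    ("2021-01-01", "13:00", "A", "hat", "1"),
    ("2021-01-01", "13:00", "A", "cat", "1"),
    ("2021-01-01", "14:00", "B", "toy", "1"),
    ("2021-01-01", "14:00", "B", "cat", "1"),
    ("2021-01-01", "15:00", "A", "toy", "1"),
    ("2021-01-01", "13:00", "A", "toy", "1"),
    ("2021-01-02", "13:00", "A", "hat", "1"),
    ("2021-01-02", "13:00", "A", "cat", "1"),
]
SECOND_ROWS = [
    ("2021-01-01", "20:00", "B", "cat", "2"),
    ("2021-01-01", "19:00", "A", "cat", "1"),
    ("2021-01-01", "19:00", "A", "hat", "1"),
    ("2021-01-01", "20:00", "B", "toy", "2"),
    ("2021-01-01", "14:00", "A", "toy", "1"),
    ("2021-01-02", "14:00", "A", "hat", "1"),
    ("2021-01-02", "14:00", "A", "cat", "1"),
    ("2021-01-01", "16:00", "A", "toy", "1"),
]


def load_tables():
    """Create both tables in an in-memory SQLite database (a stand-in engine)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE first_table (Date TEXT, TotearrivalTimestamp TEXT, "
        "Tote TEXT, content TEXT, location TEXT)"
    )
    conn.execute(
        "CREATE TABLE second_table (Date TEXT, ToteendingTimestamp TEXT, "
        "Tote TEXT, content TEXT, location TEXT)"
    )
    conn.executemany("INSERT INTO first_table VALUES (?, ?, ?, ?, ?)", FIRST_ROWS)
    conn.executemany("INSERT INTO second_table VALUES (?, ?, ?, ?, ?)", SECOND_ROWS)
    return conn


# Same join condition as the attempted query: Tote and content only, no Date.
ATTEMPTED_JOIN_COUNT = """
SELECT COUNT(*)
FROM first_table AS f
INNER JOIN second_table AS s
  ON f.Tote = s.Tote
 AND f.content = s.content
"""


def attempted_join_row_count():
    """Return how many rows the attempted join produces on the sample data."""
    conn = load_tables()
    try:
        return conn.execute(ATTEMPTED_JOIN_COUNT).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(attempted_join_row_count())
```

On the sample data this returns 14 rows, whereas desired_result has 4, which shows that the join works at the level of content rows rather than tote events.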

This answer should work. I think your problem may be related to some of your references to the tables.

select f.`Date`
, f.tote
, f.totearrivaltimestamp
, s.toteendingtimestamp
, f.location as location_first_table
, s.location as location_second_table
from `first_table` f
inner join `second_table` s on f.`Date` = s.`Date`
and f.tote = s.tote
and f.content = s.content;
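
As a rough check of what adding the date to the join condition changes, the sketch below runs that join against the question's sample rows in an in-memory SQLite database. SQLite and the helper names (`load_tables`, `dated_join_counts`) are assumptions for illustration only. It reports the raw row count and the count after `SELECT DISTINCT` on the output columns.

```python
import sqlite3

# Sample rows copied from the question: (Date, timestamp, Tote, content, location).
FIRST_ROWS = [
    ("2021-01-01", "13:00", "A", "hat", "1"),
    ("2021-01-01", "13:00", "A", "cat", "1"),
    ("2021-01-01", "14:00", "B", "toy", "1"),
    ("2021-01-01", "14:00", "B", "cat", "1"),
    ("2021-01-01", "15:00", "A", "toy", "1"),
    ("2021-01-01", "13:00", "A", "toy", "1"),
    ("2021-01-02", "13:00", "A", "hat", "1"),
    ("2021-01-02", "13:00", "A", "cat", "1"),
]
SECOND_ROWS = [
    ("2021-01-01", "20:00", "B", "cat", "2"),
    ("2021-01-01", "19:00", "A", "cat", "1"),
    ("2021-01-01", "19:00", "A", "hat", "1"),
    ("2021-01-01", "20:00", "B", "toy", "2"),
    ("2021-01-01", "14:00", "A", "toy", "1"),
    ("2021-01-02", "14:00", "A", "hat", "1"),
    ("2021-01-02", "14:00", "A", "cat", "1"),
    ("2021-01-01", "16:00", "A", "toy", "1"),
]

# Join on Date, Tote and content, selecting the columns of desired_result.
DATED_JOIN = """
SELECT f.Date, f.Tote, f.TotearrivalTimestamp, s.ToteendingTimestamp,
       f.location AS location_first_table,
       s.location AS location_second_table
FROM first_table AS f
INNER JOIN second_table AS s
  ON f.Date = s.Date
 AND f.Tote = s.Tote
 AND f.content = s.content
"""


def load_tables():
    """Create both tables in an in-memory SQLite database (a stand-in engine)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE first_table (Date TEXT, TotearrivalTimestamp TEXT, "
        "Tote TEXT, content TEXT, location TEXT)"
    )
    conn.execute(
        "CREATE TABLE second_table (Date TEXT, ToteendingTimestamp TEXT, "
        "Tote TEXT, content TEXT, location TEXT)"
    )
    conn.executemany("INSERT INTO first_table VALUES (?, ?, ?, ?, ?)", FIRST_ROWS)
    conn.executemany("INSERT INTO second_table VALUES (?, ?, ?, ?, ?)", SECOND_ROWS)
    return conn


def dated_join_counts():
    """Return (raw row count, distinct row count) for the date-aware join."""
    conn = load_tables()
    try:
        rows = conn.execute(DATED_JOIN).fetchall()
        return len(rows), len(set(rows))
    finally:
        conn.close()


if __name__ == "__main__":
    print(dated_join_counts())
```

On the sample data the join yields 10 rows, and 7 remain after removing exact duplicates. The extra rows come from the 'toy' rows of tote A on 2021-01-01, which pair with both the 14:00 and the 16:00 rows of second_table, so a row-level join still needs a further step to reach one row per tote event.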

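I was able to reproduce the 'desired_result' table (mostly) with the SQL below. I believe there are some typos in the 'insert into' statements. However, I think this matches the intent.

Query: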

select  
first_table.date as Date, 
first_table.tote, 
first_table.totearrivaltimestamp, 
second_table.toteendingtimestamp, 
first_table.location as location_first_table, 
second_table.location as location_second_table
from first_table
inner join `second_table` 
on first_table.Date = second_table.Date 
and first_table.tote = second_table.tote
group by first_table.Date, first_table.TotearrivalTimestamp, first_table.tote;

Result:

2021-01-01|A|13:00|19:00|1|1
2021-01-01|B|14:00|20:00|1|2
2021-01-01|A|15:00|19:00|1|1
2021-01-02|A|13:00|14:00|1|1

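This result assumes that the dates in your first table always match the totes/timestamps. The GROUP BY then merges the duplicate results. The second table's information is matched on the first table's date and tote and appended to each row.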
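
The grouping idea can be taken one step further, toward the key the question describes: collapse each table to one entry per tote event, use the sorted set of contents as part of the key, and match events on (Date, Tote, contents). The sketch below does this in plain Python on the question's sample rows. The function names (`collapse_events`, `match_events`) are hypothetical, and it assumes that every row of one event carries the same location.

```python
from collections import defaultdict

# Sample rows copied from the question: (Date, timestamp, Tote, content, location).
FIRST_ROWS = [
    ("2021-01-01", "13:00", "A", "hat", "1"),
    ("2021-01-01", "13:00", "A", "cat", "1"),
    ("2021-01-01", "14:00", "B", "toy", "1"),
    ("2021-01-01", "14:00", "B", "cat", "1"),
    ("2021-01-01", "15:00", "A", "toy", "1"),
    ("2021-01-01", "13:00", "A", "toy", "1"),
    ("2021-01-02", "13:00", "A", "hat", "1"),
    ("2021-01-02", "13:00", "A", "cat", "1"),
]
SECOND_ROWS = [
    ("2021-01-01", "20:00", "B", "cat", "2"),
    ("2021-01-01", "19:00", "A", "cat", "1"),
    ("2021-01-01", "19:00", "A", "hat", "1"),
    ("2021-01-01", "20:00", "B", "toy", "2"),
    ("2021-01-01", "14:00", "A", "toy", "1"),
    ("2021-01-02", "14:00", "A", "hat", "1"),
    ("2021-01-02", "14:00", "A", "cat", "1"),
    ("2021-01-01", "16:00", "A", "toy", "1"),
]


def collapse_events(rows):
    """Collapse content rows into one entry per (Date, Tote, timestamp) event.

    Returns {event: (sorted tuple of contents, location)}.
    Assumption: all rows of one event share the same location.
    """
    contents = defaultdict(set)
    locations = {}
    for date, timestamp, tote, content, location in rows:
        event = (date, tote, timestamp)
        contents[event].add(content)
        locations.setdefault(event, location)
    return {
        event: (tuple(sorted(items)), locations[event])
        for event, items in contents.items()
    }


def match_events(first_rows, second_rows):
    """Match events whose Date, Tote and full set of contents are identical."""
    second_by_key = defaultdict(list)
    for (date, tote, ending), (key, location) in collapse_events(second_rows).items():
        second_by_key[(date, tote, key)].append((ending, location))

    matches = []
    for (date, tote, arrival), (key, location) in collapse_events(first_rows).items():
        for ending, second_location in second_by_key.get((date, tote, key), []):
            matches.append((date, tote, arrival, ending, location, second_location))
    return sorted(matches)


if __name__ == "__main__":
    for row in match_events(FIRST_ROWS, SECOND_ROWS):
        print("|".join(row))
```

On the sample data this matches tote B and the 2021-01-02 event exactly once each, pairs the 15:00 'toy' event of tote A with both the 14:00 and 16:00 rows of the second table, and finds no partner for the 13:00 event of tote A, because the sample also lists a 'toy' row at 13:00 that the 19:00 event lacks. That is consistent with the sample inserts containing typos. In a SQL engine the same key could probably be built with an ordered string aggregation grouped by date, tote and timestamp, for example `STRING_AGG(content ORDER BY content)` in BigQuery.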