Full outer join not giving the answer I need
I am using PostgreSQL, but I am struggling to write a query that combines data from two tables (t1, t2).
t1 is:
studyida | gender | age
:------- | :----- | ---:
a | M | 1
a | M | 2
a | M | 3
b | F | 4
b | F | 5
b | F | 6
c | M | 13
c | M | 14
c | M | 15
t2 is:
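studyida | studyidb | gender | age
:------- | :------- | :----- | ---:
a | z | M | 3
a | z | M | 4
a | z | M | 5
NULL | y | F | 7
NULL | y | F | 8
NULL | y | F | 9
c | x | M | 10
c | x | M | 11
c | x | M | 12
NULL | w | F | 7
NULL | w | F | 8
NULL | w | F | 9
NULL | u | M | 7
NULL | u | M | 8
NULL | u | M | 9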
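t1 and t2 are related through StudyIDA and gender. What I need is a combined list from both tables, including age. Sometimes the age in t1 equals the age in t2 (for example, StudyIDA=a, age=3), but most of the time it does not.
I would like to create a table like this: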
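StudyIDA | StudyIDB | gender | ageA | ageB
:------- | :------- | :----- | ---: | ---:
a | z | M | 1 |
a | z | M | 2 |
a | z | M | 3 | 3
a | z | M | | 4
a | z | M | | 5
b | NULL | F | 4 |
b | NULL | F | 5 |
b | NULL | F | 6 |
NULL | y | F | | 7
NULL | y | F | | 8
NULL | y | F | | 9
c | x | F | 13 |
c | x | F | 14 |
c | x | F | 15 |
c | x | F | | 10
c | x | F | | 11
c | x | F | | 12
NULL | w | F | | 7
NULL | w | F | | 8
NULL | w | F | | 9
NULL | u | M | | 7
NULL | u | M | | 8
NULL | u | M | | 9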
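At first I thought a full outer join of t1 and t2 would give me what I want, but it does not.
Then I thought I needed a list of all individuals (call it t3), followed by a series of inserts (e.g. t1+t3 and t2+t3) into a new table to "construct" what I need. I get really stuck on the odd cases where the age in t1 equals the age in t2 (for example, StudyIDA=a, age=3).
I am still not getting what I need. Here is my code so far: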
DROP TABLE IF EXISTS t1, t2, t3;
CREATE TEMPORARY TABLE t1 (StudyIDA VARCHAR, gender VARCHAR, age int);
INSERT INTO t1 VALUES
('a','M', 1),('a','M', 2),('a','M', 3),
('b','F', 4),('b','F', 5),('b','F', 6),
('c','M', 13),('c','M', 14),('c','M', 15);
SELECT * FROM t1;
CREATE TEMPORARY TABLE t2 (StudyIDA VARCHAR, StudyIDB varchar, gender VARCHAR, age int);
INSERT INTO t2 VALUES
('a','z','M', 3), ('a','z','M', 4), ('a','z','M', 5),
(NULL,'y','F', 7),(NULL,'y','F', 8),(NULL,'y','F', 9),
('c','x','M', 10),('c','x','M', 11),('c','x','M', 12),
(NULL,'w','F', 7),(NULL,'w','F', 8),(NULL,'w','F', 9),
(NULL,'u','M', 7),(NULL,'u','M', 8),(NULL,'u','M', 9);
SELECT * FROM t2;
CREATE TEMPORARY TABLE t3 (StudyIDA_t1 VARCHAR, gender_t1 VARCHAR, StudyIDA_t2 VARCHAR,StudyIDB varchar,
gender_t2 VARCHAR);
INSERT INTO t3
SELECT * FROM (SELECT DISTINCT StudyIDA, gender FROM t1) a FULL OUTER JOIN
(SELECT DISTINCT StudyIDA, StudyIDB, gender FROM t2) b
ON a.StudyIDA=b.StudyIDA AND a.gender=b.gender
ORDER BY a.StudyIDA;
SELECT * FROM t3 ORDER BY StudyIDA_t1;
SELECT 'IN t1', *
FROM t3 JOIN t1 on t1.StudyIDA=t3.StudyIDA_t1 AND t1.gender=t3.gender_t1
ORDER BY StudyIDA_t1, StudyIDB;
SELECT 'In t2',*
FROM t3 JOIN t2 on t3.StudyIDA_t1=t2.StudyIDA AND t3.gender_t1=t2.gender
ORDER BY StudyIDA_t1, t3.StudyIDB;
DROP TABLE IF EXISTS t1, t2, t3;
Perhaps a full join that also includes age?
Plus some COALESCE for the common fields.
SELECT DISTINCT
COALESCE(t1.StudyIDA, t2.StudyIDA) AS StudyIDA
, t2.StudyIDB
, COALESCE(t1.gender, t2.gender) AS gender
, t1.age as ageA
, t2.age as ageB
FROM t1
FULL JOIN t2
ON t2.StudyIDA is not distinct from t1.StudyIDA
AND t2.gender = t1.gender
AND t2.age = t1.age
ORDER BY StudyIDA, gender, ageA, ageB;
studyida | studyidb | gender | agea | ageb
:------- | :------- | :----- | ---: | ---:
a | null | M | 1 | null
a | null | M | 2 | null
a | z | M | 3 | 3
a | z | M | null | 4
a | z | M | null | 5
b | null | F | 4 | null
b | null | F | 5 | null
b | null | F | 6 | null
c | null | M | 13 | null
c | null | M | 14 | null
c | null | M | 15 | null
c | x | M | null | 10
c | x | M | null | 11
c | x | M | null | 12
null | w | F | null | 7
null | y | F | null | 7
null | w | F | null | 8
null | y | F | null | 8
null | w | F | null | 9
null | y | F | null | 9
null | u | M | null | 7
null | u | M | null | 8
null | u | M | null | 9
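The join condition above compares StudyIDA with IS NOT DISTINCT FROM rather than =. As a minimal sketch of the difference (the column aliases are only illustrative), plain equality yields NULL whenever either side is NULL, while IS NOT DISTINCT FROM treats two NULLs as equal:

```sql
-- NULL-safe comparison versus plain equality in PostgreSQL.
-- The casts only give the bare NULL literals a concrete type.
SELECT NULL::varchar = NULL::varchar                    AS eq_both_null  -- NULL (unknown)
     , NULL::varchar IS NOT DISTINCT FROM NULL::varchar AS nd_both_null  -- true
     , 'a'::varchar = NULL::varchar                     AS eq_one_null   -- NULL (unknown)
     , 'a'::varchar IS NOT DISTINCT FROM NULL::varchar  AS nd_one_null;  -- false
```

With the sample data, where t1.StudyIDA is never NULL, both forms should match the same rows. The NULL-safe form only makes a difference if both tables can hold NULL in that column.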
Your sample data suggests that only t2.studyida can be NULL; all other columns should really be declared NOT NULL.
If so, I suggest this simpler query:
SELECT studyida, b.studyidb, gender, age
, CASE WHEN a.age IS NULL THEN 'b'
WHEN b.age IS NULL THEN 'a'
ELSE 'a and b' END as source
FROM t1 a
FULL JOIN t2 b USING (studyida, gender, age)
ORDER BY studyida, gender, age;
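The USING clause is convenient for join columns with identical names. Each join column appears only once in the result set, and its value is effectively what COALESCE(a.col, b.col) would give you. (You could even just use SELECT *.)
You can still reference the source columns with table qualification, such as a.age.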
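If you would rather keep the two separate age columns from the question, a sketch along these lines should work. It assumes the temporary tables t1 and t2 from the question exist; the aliases agea and ageb are chosen to mirror the desired output:

```sql
-- Same FULL JOIN ... USING as above, but exposing each table's own age
-- through table qualification instead of the merged join column.
SELECT studyida             -- merged join column
     , b.studyidb
     , gender               -- merged join column
     , a.age AS agea        -- NULL when the row exists only in t2
     , b.age AS ageb        -- NULL when the row exists only in t1
FROM   t1 a
FULL   JOIN t2 b USING (studyida, gender, age)
ORDER  BY studyida, gender, age;  -- unqualified age is the merged column
```

Note that studyidb still comes back as NULL for rows that exist only in t1, because no t2 row contributes a value there.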
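I reduced the output to a single age column and added source. You may or may not want that.
Either way, "age" is subject to bit rot and is almost always a poor choice for a table column. It should typically be replaced with "birthday" or similar.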
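As a rough sketch of that idea, assuming a hypothetical table t1_birthday with a birthday column (neither the table name nor the sample dates come from the question), the columns can be declared NOT NULL and the age derived at query time:

```sql
-- Hypothetical variant of t1: table name, column "birthday" and the
-- sample dates are illustrative only.
CREATE TEMPORARY TABLE t1_birthday (
   studyida varchar NOT NULL
 , gender   varchar NOT NULL
 , birthday date    NOT NULL
);

INSERT INTO t1_birthday VALUES
   ('a', 'M', '2010-05-17')
 , ('b', 'F', '2008-11-02');

-- age() returns an interval relative to the current date;
-- date_part('year', ...) extracts the completed years from it.
SELECT studyida
     , gender
     , date_part('year', age(birthday))::int AS age
FROM   t1_birthday;
```

Stored this way, the value never goes stale, and the age as of a specific date can be computed with the two-argument form of age() if needed.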