How to combine duplicated rows into a single row?

Question

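I have a table where each row represents a person. The table contains many duplicates that I want to get rid of. I only want to deduplicate based on name and age. However, the information in the other columns can be spread across several rows for the same employee. For example: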

name	age	height	eye_color	weight
John	32	null	green	null
John	32	null	null	75
John	32	180	null	null
John	32	null	null	74

In this example, the expected output is:

name	age	height	eye_color	weight
John	32	180	green	75

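Note that it doesn't matter whether weight ends up as 75 or 74. Order is not important for my use case; I just want to fill in as many nulls as possible while removing the duplicates.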

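Some columns have uniqueness constraints, so unfortunately simply updating all rows with the desired values and then keeping one row per group is not an option. In other words, updating the table to look like this:

name	age	height	eye_color	weight
John	32	180	green	75
John	32	180	green	75
John	32	180	green	75
John	32	180	green	75

before deduplicating is not possible.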

If age or name is null for an employee, that row should not be deduplicated at all.

There is a similar question on Stack Overflow, but there they only keep the row with the fewest nulls, so it doesn't really solve my problem.

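Maybe the solution involves some kind of aggregation like the one shown in "PostgreSQL: get first non null value per group", but I haven't managed to get anything working so far.

Any ideas?

Thanks.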

Answer 1

A simple GROUP BY with max as the aggregate function should do the trick:

SELECT
 name,
 age,
 max(height) AS height,
 max(eye_color) AS eye_color,
 max(weight) AS weight
FROM
 employees
WHERE
 name IS NOT NULL AND age IS NOT NULL
GROUP BY
 name, age;

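max skips nulls, so each column ends up with a non-null value whenever the group has at least one. When a group has several non-null values, the largest one wins (75 rather than 74 for weight in the example).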
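To see the null handling in isolation, here is a minimal sketch that runs the same aggregation over the four example rows from the question. The inline VALUES list is only a stand-in for the real employees table, and the integer casts are an assumption about the column types.

```sql
-- Inline copy of the question's example rows (stand-in for the real table).
-- The casts on the first row fix the column types for the whole VALUES list.
SELECT
 name,
 age,
 max(height) AS height,
 max(eye_color) AS eye_color,
 max(weight) AS weight
FROM (
 VALUES
  ('John', 32, NULL::integer, 'green', NULL::integer),
  ('John', 32, NULL, NULL, 75),
  ('John', 32, 180, NULL, NULL),
  ('John', 32, NULL, NULL, 74)
) AS employees (name, age, height, eye_color, weight)
GROUP BY
 name, age;
```

This should return a single row: John, 32, 180, green, 75.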

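To deduplicate the table itself, one approach is to insert the desired result into a temporary table, delete the old data from the employees table, and then insert the data back from the temporary table: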

create table temporary_employee (
 name varchar,
 age integer,
 height integer,
 eye_color varchar,
 weight integer
);

INSERT INTO temporary_employee (name, age, height, eye_color, weight)
(
SELECT
 name,
 age,
 max(height),
 max(eye_color),
 max(weight)
FROM
 employees
WHERE
 name IS NOT NULL AND age IS NOT NULL
GROUP BY
 name, age
);

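-- Delete only the rows that were merged above. Rows with a null name or age
-- were left out of the merge and, as the question requires, are kept as they are.
DELETE FROM employees WHERE name IS NOT NULL AND age IS NOT NULL;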

INSERT INTO employees (name, age, height, eye_color, weight) (
SELECT name, age, height, eye_color, weight FROM temporary_employee);

DROP table temporary_employee;
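If the delete and the re-insert should succeed or fail together, the same steps could be wrapped in a transaction. This is only a sketch under a few assumptions: the table is named employees, rows with a null name or age should survive untouched, and it is acceptable to let PostgreSQL create the temporary table from the query and drop it at commit.

```sql
BEGIN;

-- Build the merged rows; the temporary table disappears at COMMIT.
CREATE TEMPORARY TABLE temporary_employee ON COMMIT DROP AS
SELECT
 name,
 age,
 max(height) AS height,
 max(eye_color) AS eye_color,
 max(weight) AS weight
FROM
 employees
WHERE
 name IS NOT NULL AND age IS NOT NULL
GROUP BY
 name, age;

-- Remove only the rows that took part in the merge.
DELETE FROM employees WHERE name IS NOT NULL AND age IS NOT NULL;

-- Put one merged row per (name, age) back.
INSERT INTO employees (name, age, height, eye_color, weight)
SELECT name, age, height, eye_color, weight FROM temporary_employee;

COMMIT;
```

If any statement fails, for example on one of the uniqueness constraints, issuing ROLLBACK instead of COMMIT leaves the employees table as it was.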

In some database engines the insert command has an option to overwrite all existing data, but I haven't found such an option in PostgreSQL.

Another option is to add a temporary column to the table, insert the desired data into the same table, and then delete the old data:

ALTER TABLE employees ADD COLUMN newdata bool;

-- Insert the merged rows into the employees table and mark them as new data
INSERT INTO employees (name, age, height, eye_color, weight, newdata)
(
SELECT
 name,
 age,
 max(height),
 max(eye_color),
 max(weight),
 true AS newdata
FROM
 employees
WHERE
 name IS NOT NULL AND age IS NOT NULL
GROUP BY
 name, age
);

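-- Delete the old rows that were merged. Old rows have newdata = NULL, so the
-- comparison newdata != 't' would match none of them; IS NOT TRUE matches NULL too.
-- Rows with a null name or age were never merged and are kept.
DELETE FROM employees
WHERE newdata IS NOT TRUE
 AND name IS NOT NULL AND age IS NOT NULL;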

---Remove temporary column
ALTER TABLE employees DROP COLUMN newdata;
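Whichever variant is used, a quick check like the following can confirm that no (name, age) pair still occurs more than once. It is a sketch that assumes the table is named employees; the alias copies is made up for illustration.

```sql
-- List every (name, age) pair that still has more than one row.
SELECT
 name,
 age,
 count(*) AS copies
FROM
 employees
WHERE
 name IS NOT NULL AND age IS NOT NULL
GROUP BY
 name, age
HAVING
 count(*) > 1;
```

After a successful deduplication this query should return no rows.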

postgresql