Postgres GROUP BY an array column
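I have a list of students and parents, and I want to group them into families using the student ID. Parents who share the same student ID can be considered one family, and students who share the same parent ID can also be considered one family. Here is an example table: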
p_id | parent_name | s_id | student_name
-----+-------------+------+-------------
1    | John Doe    | 100  | Mike Doe
3    | Jane Doe    | 100  | Mike Doe
3    | Jane Doe    | 105  | Lisa Doe
5    | Will Willy  | 108  | William Son
I'd like to end up with something like this:
parents            | students
-------------------+-------------------
John Doe, Jane Doe | Mike Doe, Lisa Doe
Will Willy         | William Son
To do this, I'm currently using:
SELECT array_agg(parents) AS parents FROM (
    SELECT array_agg(p_id) AS par_ids, array_agg(parent_name) AS parents, student_name, s_id
    FROM (
        /* sub query */
    ) b
    GROUP BY s_id, student_name
    ORDER BY parents ASC
) c
GROUP BY unnest(par_ids)
ORDER BY parents ASC
But I get an error: ERROR: cannot accumulate arrays of different dimensionality. SQL state: 2202E
How can I achieve the expected result?
The inner query of the statement above returns:
par_ids | parents              | student_name | s_id
--------+----------------------+--------------+-----
{1,3}   | {John Doe, Jane Doe} | Mike Doe     | 100
{3}     | {Jane Doe}           | Lisa Doe     | 105
{5}     | {Will Willy}         | William Son  | 108
Grouping these students by their parents is the part I'm stuck on.
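For reference, the error comes from the outer array_agg(parents). When array_agg receives arrays as input, Postgres concatenates them into an array of one higher dimension, and the documentation requires all inputs to have the same dimensionality. In the p_id = 3 group, {John Doe, Jane Doe} (two elements) meets {Jane Doe} (one element), so the result cannot be rectangular. The sketch below is a plain-Python model of that rule, not Postgres itself. It assumes GROUP BY unnest(par_ids) sends each row into one group per element of par_ids, and the helper name array_agg_arrays is hypothetical.

```python
# Plain-Python model (not Postgres) of array_agg over array inputs:
# the inputs are stacked into an array one dimension higher, so they
# must all be non-empty and of the same length.


def array_agg_arrays(arrays):
    """Hypothetical helper: stack 1-D arrays into a 2-D array or raise."""
    if not arrays:
        return None  # array_agg over zero input rows returns NULL
    width = len(arrays[0])
    for arr in arrays:
        if len(arr) == 0:
            raise ValueError("cannot accumulate empty arrays")
        if len(arr) != width:
            raise ValueError("cannot accumulate arrays of different dimensionality")
    return [list(arr) for arr in arrays]


# Inner query result from the question: (par_ids, parents, student_name, s_id)
INNER = [
    ([1, 3], ["John Doe", "Jane Doe"], "Mike Doe", 100),
    ([3], ["Jane Doe"], "Lisa Doe", 105),
    ([5], ["Will Willy"], "William Son", 108),
]

# Assumed model of GROUP BY unnest(par_ids): each row joins one group per p_id.
groups = {}
for par_ids, parents, _student_name, _s_id in INNER:
    for p_id in par_ids:
        groups.setdefault(p_id, []).append(parents)

# Aggregate each group, recording the error message where stacking fails.
results = {}
for p_id, arrays in groups.items():
    try:
        results[p_id] = array_agg_arrays(arrays)
    except ValueError as exc:
        results[p_id] = f"ERROR: {exc}"

for p_id in sorted(results):
    print(p_id, results[p_id])
```

Only the p_id = 3 group fails, which matches the error reported in the question.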
Answer:
I have done something similar here (though a bit more complicated):
SELECT
    array_agg(parent_name) as parents,              -- 4
    array_agg(student_name) as students
FROM (
    SELECT DISTINCT ON (t.s_id)                     -- 3
        *
    FROM (
        SELECT
            s_id,
            array_agg(p_id) as parents              -- 1
        FROM mytable
        GROUP BY s_id
    ) s JOIN mytable t ON t.p_id = ANY(s.parents)   -- 2
    ORDER BY t.s_id, CARDINALITY(parents) DESC      -- 3
) s
GROUP BY parents  -- 4: resolves to the input column (the p_id array), not the output alias
1. Aggregate the p_id values into an array:
s_id | parents
-----+--------
108  | {5}
105  | {3}
100  | {1,3}
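Step 1 can be modeled in plain Python as a group-by on s_id. This is only a sketch: the table is hard-coded from the question, the name parents_per_student is hypothetical, and the list order simply follows the input, whereas array_agg without an ORDER BY inside the call makes no ordering promise.

```python
from collections import defaultdict

# The sample table from the question: (p_id, parent_name, s_id, student_name)
ROWS = [
    (1, "John Doe", 100, "Mike Doe"),
    (3, "Jane Doe", 100, "Mike Doe"),
    (3, "Jane Doe", 105, "Lisa Doe"),
    (5, "Will Willy", 108, "William Son"),
]


def parents_per_student(rows):
    """Model of: SELECT s_id, array_agg(p_id) AS parents FROM mytable GROUP BY s_id."""
    groups = defaultdict(list)
    for p_id, _parent_name, s_id, _student_name in rows:
        groups[s_id].append(p_id)  # collect every parent id seen for this student
    return dict(groups)


step1 = parents_per_student(ROWS)
print(step1)
```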
2. Self-join the original table on this array:
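s_id | parents | p_id | parent_name | s_id | student_name
-----+---------+------+-------------+------+-------------
100  | {1,3}   | 1    | John Doe    | 100  | Mike Doe
105  | {3}     | 3    | Jane Doe    | 100  | Mike Doe
100  | {1,3}   | 3    | Jane Doe    | 100  | Mike Doe
105  | {3}     | 3    | Jane Doe    | 105  | Lisa Doe
100  | {1,3}   | 3    | Jane Doe    | 105  | Lisa Doe
108  | {5}     | 5    | Will Willy  | 108  | William Son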
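A plain-Python sketch of the self-join in step 2, with t.p_id = ANY(s.parents) modeled as a list membership test. The step-1 result is hard-coded, self_join is a hypothetical name, and the row order is not meant to match Postgres, which does not guarantee join output order without an ORDER BY.

```python
# The sample table from the question: (p_id, parent_name, s_id, student_name)
ROWS = [
    (1, "John Doe", 100, "Mike Doe"),
    (3, "Jane Doe", 100, "Mike Doe"),
    (3, "Jane Doe", 105, "Lisa Doe"),
    (5, "Will Willy", 108, "William Son"),
]

# Result of step 1: s_id -> array of p_id values
STEP1 = {100: [1, 3], 105: [3], 108: [5]}


def self_join(step1, rows):
    """Model of: (step-1 subquery) s JOIN mytable t ON t.p_id = ANY(s.parents)."""
    joined = []
    for s_s_id, parents in step1.items():
        for p_id, parent_name, t_s_id, student_name in rows:
            if p_id in parents:  # t.p_id = ANY(s.parents)
                joined.append((s_s_id, parents, p_id, parent_name, t_s_id, student_name))
    return joined


step2 = self_join(STEP1, ROWS)
for row in step2:
    print(row)
```

Every student row is repeated once for each step-1 array that contains one of its parents, which is why Mike Doe and Lisa Doe each appear more than once.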
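3. Remove the duplicate student records, keeping for each student the record with the most complete p_id array. This can be done with DISTINCT ON (t.s_id), ordering by the array length (CARDINALITY(parents)) in descending order: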
s_id | parents | p_id | parent_name | s_id | student_name
-----+---------+------+-------------+------+-------------
100  | {1,3}   | 1    | John Doe    | 100  | Mike Doe
100  | {1,3}   | 3    | Jane Doe    | 105  | Lisa Doe
108  | {5}     | 5    | Will Willy  | 108  | William Son
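A plain-Python model of the DISTINCT ON step: sort by t.s_id, then by array length descending, and keep the first row for each t.s_id. The step-2 rows are hard-coded in the order shown above, and distinct_on_student is a hypothetical name.

```python
# Result of step 2, in the row order shown in the answer:
# (s.s_id, s.parents, t.p_id, t.parent_name, t.s_id, t.student_name)
STEP2 = [
    (100, [1, 3], 1, "John Doe", 100, "Mike Doe"),
    (105, [3], 3, "Jane Doe", 100, "Mike Doe"),
    (100, [1, 3], 3, "Jane Doe", 100, "Mike Doe"),
    (105, [3], 3, "Jane Doe", 105, "Lisa Doe"),
    (100, [1, 3], 3, "Jane Doe", 105, "Lisa Doe"),
    (108, [5], 5, "Will Willy", 108, "William Son"),
]


def distinct_on_student(rows):
    """Model of DISTINCT ON (t.s_id) ... ORDER BY t.s_id, CARDINALITY(parents) DESC.

    Python's sort is stable, so tied rows keep their input order;
    Postgres makes no promise about which tied row it keeps.
    """
    ordered = sorted(rows, key=lambda r: (r[4], -len(r[1])))
    kept = {}
    for row in ordered:
        kept.setdefault(row[4], row)  # first row per t.s_id wins
    return list(kept.values())


step3 = distinct_on_student(STEP2)
for row in step3:
    print(row)
```

With this input the model reproduces the table above. For Mike Doe, however, the John Doe row and the Jane Doe row tie on CARDINALITY(parents), so Postgres may keep either one; adding a tiebreaker such as t.p_id to the ORDER BY would make the choice deterministic.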
4. Finally, group by the p_id array and aggregate the two name columns:
parents                 | students
------------------------+------------------------
{"John Doe","Jane Doe"} | {"Mike Doe","Lisa Doe"}
{"Will Willy"}          | {"William Son"}
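A plain-Python model of the final GROUP BY parents, using the step-3 rows shown above as hard-coded input. Python lists are unhashable, so the p_id array is converted to a tuple to serve as the group key; group_by_parent_array is a hypothetical name.

```python
from collections import defaultdict

# Result of step 3:
# (s.s_id, s.parents, t.p_id, t.parent_name, t.s_id, t.student_name)
STEP3 = [
    (100, [1, 3], 1, "John Doe", 100, "Mike Doe"),
    (100, [1, 3], 3, "Jane Doe", 105, "Lisa Doe"),
    (108, [5], 5, "Will Willy", 108, "William Son"),
]


def group_by_parent_array(rows):
    """Model of: SELECT array_agg(parent_name), array_agg(student_name) ... GROUP BY parents."""
    groups = defaultdict(lambda: ([], []))
    for _s_id, parents, _p_id, parent_name, _t_s_id, student_name in rows:
        names, students = groups[tuple(parents)]  # tuple key stands in for the int[] value
        names.append(parent_name)
        students.append(student_name)
    return dict(groups)


step4 = group_by_parent_array(STEP3)
for key, (names, students) in step4.items():
    print(key, names, students)
```

This reproduces the final table for the sample data. Note that step 4 only sees the one row per student that survived step 3, so a parent name appears once per surviving row rather than once per parent: in a family with two parents and a single student only one parent name would remain, and a single parent with two students would be listed twice.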
If you want a comma-separated string instead of an array, use string_agg(name_column, ', ') in place of array_agg(name_column).
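A plain-Python model of that substitution, applied to the same group contents that array_agg collected in step 4, with ', ' as the delimiter so the result matches the output asked for in the question. The Python function string_agg here is a hypothetical stand-in for the SQL aggregate, which skips NULL inputs and returns NULL when no non-NULL values remain.

```python
# Group contents from step 4: (parent names, student names) per family.
STEP4 = [
    (["John Doe", "Jane Doe"], ["Mike Doe", "Lisa Doe"]),
    (["Will Willy"], ["William Son"]),
]


def string_agg(values, delimiter):
    """Model of string_agg(value, delimiter): join non-NULL values, NULL if none."""
    present = [v for v in values if v is not None]
    return delimiter.join(present) if present else None


table = [(string_agg(parents, ", "), string_agg(students, ", ")) for parents, students in STEP4]
for parents, students in table:
    print(f"{parents} | {students}")
```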