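Calculating overlap in MySQL
I'm trying to find out which classes overlap the most with each other. The data is stored in MySQL, and each student has a completely separate row in the database for every class he/she takes (I didn't set it up and I can't change it). I've pasted a simplified version of the table below. In reality there are about 20 different classes.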
CREATE TABLE classes
(`student_id` int, `class` varchar(13));
INSERT INTO classes
(`student_id`, `class`)
VALUES
(55421, 'algebra'),
(27494, 'algebra'),
(64934, 'algebra'),
(65364, 'algebra'),
(21102, 'algebra'),
(90734, 'algebra'),
(20103, 'algebra'),
(57450, 'gym'),
(76411, 'gym'),
(24918, 'gym'),
(65364, 'gym'),
(55421, 'gym'),
(89607, 'world_history'),
(54522, 'world_history'),
(49581, 'world_history'),
(84155, 'world_history'),
(55421, 'world_history'),
(57450, 'world_history');
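Ultimately I'd like to use Circos (background here), but I'd be happy with any approach that lets me understand, and show people, which classes overlap the most and the least. This is off the top of my head, but I was thinking I could use an output table with one row and one column per class, listing the number of overlapping students where different classes intersect. Where a class intersects with itself, it could show the number of people who don't overlap with any other class.
Just use a self-join and aggregation: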
select c1.class, c2.class, count(*)
from classes c1 join
classes c2
on c1.student_id = c2.student_id
group by c1.class, c2.class;
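This does not produce the result in exactly the format you described: it returns one row per pair of classes rather than a matrix, and a row that pairs a class with itself counts every student in that class.
You can do this by generating a result that represents the links: src -> dst = nb
1) Get the matrix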
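To see what the self-join returns on the sample data, here is a minimal sketch that runs the same query from Python. It assumes an in-memory SQLite database as a stand-in for MySQL (the query uses only standard SQL), and the names `ENROLLED` and `self_join_overlap` are made up for this illustration.

```python
import sqlite3

# Sample data copied from the question, grouped by class for brevity.
ENROLLED = {
    "algebra": [55421, 27494, 64934, 65364, 21102, 90734, 20103],
    "gym": [57450, 76411, 24918, 65364, 55421],
    "world_history": [89607, 54522, 49581, 84155, 55421, 57450],
}


def self_join_overlap(enrolled):
    """Return {(class_a, class_b): shared_student_count} using a self-join."""
    # Assumption: SQLite stands in for MySQL here.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE classes (student_id INTEGER, class TEXT)")
    rows = [(sid, cls) for cls, ids in enrolled.items() for sid in ids]
    conn.executemany("INSERT INTO classes VALUES (?, ?)", rows)
    query = """
        SELECT c1.class, c2.class, COUNT(*)
        FROM classes c1
        JOIN classes c2 ON c1.student_id = c2.student_id
        GROUP BY c1.class, c2.class
    """
    result = {(a, b): n for a, b, n in conn.execute(query)}
    conn.close()
    return result


overlap = self_join_overlap(ENROLLED)
for pair, count in sorted(overlap.items()):
    print(pair, count)
```

Note that a pair of classes with no shared students would simply be missing from this result, which is what the left-join approach further down addresses.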
select c1.class src_class, c2.class dst_class
from (select distinct class from classes) c1
join (select distinct class from classes) c2
order by src_class, dst_class
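The "select distinct class" subqueries are not strictly needed to generate the matrix; you could select the class directly and use GROUP BY. In step 2, however, we need that distinct result.
Result: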
src_class dst_class
-----------------------------
algebra algebra
algebra gym
algebra world_history
gym algebra
gym gym
gym world_history
world_history algebra
world_history gym
world_history world_history
2) Join the list of students that match both the source and the destination
select c1.class src_class, c2.class dst_class, count(v.student_id) overlap
from (select distinct class from classes) c1
join (select distinct class from classes) c2
left join classes v on
(
v.class = c1.class
and v.student_id in (select student_id from classes
where class = c2.class)
)
group by src_class, dst_class
order by src_class, dst_class
The distinct values (step 1) let us get all the classes, even pairs that are not linked (they show 0 instead).
Result:
src_class dst_class overlap
-------------------------------------
algebra algebra 7
algebra gym 2
algebra world_history 1
gym algebra 2
gym gym 5
gym world_history 2
world_history algebra 1
world_history gym 2
world_history world_history 6
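The question asked for one row and one column per class, while the result above is a list of pairs. As a rough sketch of that reshaping, the snippet below builds the matrix directly from the sample data with Python sets. The names `ENROLLED` and `overlap_matrix` are hypothetical, and in practice the pairs would come from the query rather than from a hard-coded dictionary.

```python
# Sample data copied from the question, one set of student ids per class.
ENROLLED = {
    "algebra": {55421, 27494, 64934, 65364, 21102, 90734, 20103},
    "gym": {57450, 76411, 24918, 65364, 55421},
    "world_history": {89607, 54522, 49581, 84155, 55421, 57450},
}


def overlap_matrix(enrolled):
    """Return (class_names, matrix) where matrix[i][j] counts shared students."""
    names = sorted(enrolled)
    matrix = [
        [len(enrolled[row] & enrolled[col]) for col in names]
        for row in names
    ]
    return names, matrix


names, matrix = overlap_matrix(ENROLLED)

# Print a plain-text grid: header row first, then one row per class.
width = max(len(name) for name in names) + 2
print(" " * width + "".join(name.ljust(width) for name in names))
for name, row in zip(names, matrix):
    print(name.ljust(width) + "".join(str(n).ljust(width) for n in row))
```

On the sample data this gives the same counts as the step 2 result, with each class's total enrollment on the diagonal.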
3) Make a different calculation when the two classes are equal
select c1.class src_class, c2.class dst_class, count(v.student_id) overlap
from (select distinct class from classes) c1
join (select distinct class from classes) c2
left join classes v on
(
v.class = c1.class and
(
-- When the classes are equal:
-- count students present only in that class
(c1.class = c2.class
and 1 = (select count(*) from classes
where student_id = v.student_id))
or
-- When the classes are different:
-- count students present in both classes
(c1.class != c2.class
and v.student_id in (select student_id from classes
where class = c2.class))
)
)
group by src_class, dst_class
order by src_class, dst_class
Result:
src_class dst_class overlap
-------------------------------------
algebra algebra 5
algebra gym 2
algebra world_history 1
gym algebra 2
gym gym 2
gym world_history 2
world_history algebra 1
world_history gym 2
world_history world_history 4
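As a cross-check on the diagonal values, and to answer the original "most and least overlap" question directly, here is a small sketch in plain Python. It is not part of the SQL answer; `ENROLLED`, `exclusive_counts`, and `ranked_pairs` are names invented for this example.

```python
from collections import Counter
from itertools import combinations

# Sample data copied from the question, one set of student ids per class.
ENROLLED = {
    "algebra": {55421, 27494, 64934, 65364, 21102, 90734, 20103},
    "gym": {57450, 76411, 24918, 65364, 55421},
    "world_history": {89607, 54522, 49581, 84155, 55421, 57450},
}


def exclusive_counts(enrolled):
    """Count, per class, the students who take no other class."""
    classes_per_student = Counter(
        sid for ids in enrolled.values() for sid in ids
    )
    return {
        cls: sum(1 for sid in ids if classes_per_student[sid] == 1)
        for cls, ids in enrolled.items()
    }


def ranked_pairs(enrolled):
    """List unordered class pairs, most shared students first."""
    pairs = {
        (a, b): len(enrolled[a] & enrolled[b])
        for a, b in combinations(sorted(enrolled), 2)
    }
    # Sort by descending count, then by class names to break ties.
    return sorted(pairs.items(), key=lambda item: (-item[1], item[0]))


exclusive = exclusive_counts(ENROLLED)
ranking = ranked_pairs(ENROLLED)

print(exclusive)
for (a, b), shared in ranking:
    print(f"{a} <-> {b}: {shared}")
```

The exclusive counts match the diagonal of the step 3 result (5, 2, and 4), and the ranking treats each pair once instead of listing both directions.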