Join 4 tables with group by, 2 having and where clause
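My database consists of 4 tables:
- users(id, "name", surname, birthdate)
- friendships(userid1, userid2, "timestamp")
- posts(id, userid, "text", "timestamp")
- likes(postid, userid, "timestamp")
I need a result set of unique user names for users who have more than 3 friendships within January 2018 and whose average number of "likes" per "post" is in the range [10; 35).
I wrote this statement for the first step: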
select distinct u."name"
from users u
join friendships f on u.id = f.userid1
where f."timestamp" between '2018-01-01'::timestamp and '2018-01-31'::timestamp
group by u.id
having count(f.userid1) > 3;
This works fine and returns 3 rows. But when I add the second part this way:
select distinct u."name"
from users u
join friendships f on u.id = f.userid1
join posts p on p.userid = u.id
join likes l on p.id = l.postid
where f."timestamp" between '2018-01-01'::timestamp and '2018-01-31'::timestamp
group by u.id
having count(f.userid1) > 3
and ((count(l.postid) / count(distinct l.postid)) >= 10
and (count(l.postid) / count(distinct l.postid)) < 35);
I get a crazy 94 rows, and I don't know why.
I would be thankful for any help.
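Try the query below! The problem with using "count(f.userid1) > 3" is that the joins multiply rows: if a user has, e.g., 2 friends and 6 posts, joining friendships and posts already produces 2 x 6 = 12 rows, so 12 records have a non-null f.userid1 even though there are only 2 friends. By counting distinct f.userid2 you count distinct friends instead. The other counts used for filtering have the same problem.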
select u."name"
from users u
join friendships f on u.id = f.userid1
join posts p on p.userid = u.id
left join likes l on p.id = l.postid
where f."timestamp" >= '2018-01-01'::timestamp and f."timestamp" < '2018-02-01'::timestamp
group by u.id, u."name"
having
--more than three distinct friends
count( distinct f.userid2) > 3
--distinct likes / distinct posts
--we use l.* to count distinct likes since there's no primary key
and (count(distinct l.*) / count(distinct p.id)) >= 10
and (count(distinct l.*) / count(distinct p.id)) < 35;
You don't need distinct on u."name", because the group by already collapses the rows to one per user.
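The row multiplication described above can be reproduced on a tiny data set. The following is only a sketch for PostgreSQL: the CTEs stand in for the real tables, and every value in them is made up for illustration (one user with 2 friends, 3 posts, and 1 like on each post).

```sql
-- Hypothetical sample data; the CTE names shadow the real tables.
with users(id, "name") as (
    values (1, 'alice')
),
friendships(userid1, userid2) as (
    values (1, 2), (1, 3)
),
posts(id, userid) as (
    values (10, 1), (11, 1), (12, 1)
),
likes(postid, userid) as (
    values (10, 2), (11, 2), (12, 3)
)
select
    count(*)                  as joined_rows,      -- 2 friends x 3 posts x 1 like each = 6
    count(f.userid1)          as plain_count,      -- 6: counts joined rows, not friends
    count(distinct f.userid2) as distinct_friends, -- 2
    count(l.postid)           as plain_likes,      -- 6: each of the 3 likes appears once per friend
    count(distinct p.id)      as distinct_posts    -- 3
from users u
join friendships f on u.id = f.userid1
join posts p on p.userid = u.id
left join likes l on p.id = l.postid;
```

The plain counts follow the size of the joined result, while the distinct counts recover the real number of friends and posts.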
select
u."name"
from
users u
inner join friendships f on u.id = f.userid1
inner join posts p on u.id = p.userid
inner join likes l on p.id = l.postid
where
f."timestamp" >= '2018-01-01'::timestamp
and f."timestamp" < '2018-02-01'::timestamp
group by
u."name"
having
-- count distinct friends and distinct likes, because the joins repeat each of them
count(distinct f.userid2) > 3
and (count(distinct l.*) / count(distinct l.postid)) >= 10
and (count(distinct l.*) / count(distinct l.postid)) < 35;
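Another way to sidestep the row multiplication is to aggregate friendships and likes separately and only then join the per-user results. This is a sketch of that different technique, not the approach used in the answers; it assumes the tables from the question, that each row in likes is one like, that the average is taken over all of a user's posts (including posts with no likes), and that only friendships are restricted to January 2018. The CTE and column names (friend_counts, like_stats, friends, likes_per_post) are made up for illustration.

```sql
with friend_counts as (
    -- one row per user: distinct friends added in January 2018
    select f.userid1 as userid,
           count(distinct f.userid2) as friends
    from friendships f
    where f."timestamp" >= '2018-01-01'::timestamp
      and f."timestamp" <  '2018-02-01'::timestamp
    group by f.userid1
),
like_stats as (
    -- one row per user: total likes divided by number of posts
    select p.userid,
           count(l.postid)::numeric / count(distinct p.id) as likes_per_post
    from posts p
    left join likes l on l.postid = p.id
    group by p.userid
)
select distinct u."name"
from users u
join friend_counts fc on fc.userid = u.id
join like_stats ls on ls.userid = u.id
where fc.friends > 3
  and ls.likes_per_post >= 10
  and ls.likes_per_post < 35;
```

Because each CTE is grouped on its own, neither count is inflated by the other table, and the cast to numeric keeps the fractional part of the average.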
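As noted in the comments, using between for a timestamp range is not a good idea: an upper bound of '2018-01-31'::timestamp means midnight at the start of January 31, so the rest of that day is excluded. Use a half-open range instead: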
f."timestamp" >= '2018-01-01'::timestamp
and f."timestamp" < '2018-02-01'::timestamp
This half-open range gives you the whole of January.
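The difference between the two range conditions shows up on a few timestamps around the end of the month. This is a small PostgreSQL sketch; the sample values and the column names in_between and in_half_open are made up for illustration.

```sql
-- Hypothetical sample timestamps around the end of January 2018.
select
    ts,
    (ts between '2018-01-01'::timestamp and '2018-01-31'::timestamp) as in_between,
    (ts >= '2018-01-01'::timestamp and ts < '2018-02-01'::timestamp) as in_half_open
from (values
    ('2018-01-31 00:00:00'::timestamp),  -- in_between: true,  in_half_open: true
    ('2018-01-31 15:30:00'::timestamp),  -- in_between: false, in_half_open: true
    ('2018-02-01 00:00:00'::timestamp)   -- in_between: false, in_half_open: false
) as t(ts);
```

Only the half-open condition keeps the afternoon of January 31 while still excluding the first instant of February.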