Get the first row of each group in SQL
I have the following data, and I want to get the top choice for each gender from it.
subjectID <- c("1", "2", "1", "0", "1", "0", "0", "1", "0", "2",
"0", "0", "2", "2","2","1","2","1","0","2")
gender <- c("M", "M", "F", "M", "M", "F", "M", "M", "M", "F",
"M", "F", "M", "M", "F","M", "F", "M", "F", "F")
selection <- data.frame(subjectID, gender)
subjectID <- c("1", "2", "0")
subject <- c("Maths", "Music", "English")
subjects <- data.frame(subjectID, subject)
I tried ordering the selections in descending order as follows:
favourite <- sqldf("SELECT a.gender, b.subject, COUNT(a.subjectID) as `no of selections`
FROM selection a
JOIN subjects b
ON (a.subjectID = b.subjectID )
GROUP BY a.subjectID, a.gender
ORDER BY a.gender, `no of selections` DESC
")
However, I would like to get the following table, which contains only the top choice for each gender:
gender <- c("F", "M")
subjects <- c("Music", "Maths")
mostfav <- data.frame(gender, subjects)
If I understand correctly, you can use window functions in SQL:
SELECT gs.*
FROM (SELECT s.gender, su.subject, COUNT(*) as cnt,
ROW_NUMBER() OVER (PARTITION BY s.gender ORDER BY COUNT(*) DESC) as seqnum
FROM selection s JOIN
subjects su
ON su.subjectID = s.subjectID
GROUP BY s.gender, su.subject
) gs
WHERE seqnum = 1;
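To check the query above against the data in the question, here is a minimal sketch that loads the same rows into an in-memory SQLite database from Python. The Python harness and the variable names (subject_ids, genders, most_fav) are assumptions for illustration, not part of the original answer, and window functions need SQLite 3.25 or later. The outer SELECT lists the columns explicitly and adds an ORDER BY so the output is stable.

```python
import sqlite3

# Data copied from the question, one character per observation.
subject_ids = "12101001020022212102"
genders = "MMFMMFMMMFMFMMFMFMFF"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE selection (subjectID TEXT, gender TEXT)")
conn.execute("CREATE TABLE subjects (subjectID TEXT, subject TEXT)")
conn.executemany("INSERT INTO selection VALUES (?, ?)", zip(subject_ids, genders))
conn.executemany(
    "INSERT INTO subjects VALUES (?, ?)",
    [("1", "Maths"), ("2", "Music"), ("0", "English")],
)

# Count selections per (gender, subject), number the rows within each
# gender from most to least selected, and keep only row number 1.
query = """
SELECT gs.gender, gs.subject, gs.cnt
FROM (SELECT s.gender, su.subject, COUNT(*) AS cnt,
             ROW_NUMBER() OVER (PARTITION BY s.gender ORDER BY COUNT(*) DESC) AS seqnum
      FROM selection s
      JOIN subjects su ON su.subjectID = s.subjectID
      GROUP BY s.gender, su.subject
     ) gs
WHERE gs.seqnum = 1
ORDER BY gs.gender
"""
most_fav = conn.execute(query).fetchall()
print(most_fav)  # [('F', 'Music', 4), ('M', 'Maths', 5)]
```

With this data there are no ties, so ROW_NUMBER() returns exactly the table asked for in the question.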
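If you are running MySQL 8.0, you can use RANK() in a subquery to rank the records by subject count within each gender, and then filter on the top record of each group in the outer query (if there are ties for first place, RANK() keeps all of them):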
SELECT gender, subject, no_of_selections
FROM (
SELECT
se.gender,
su.subject,
COUNT(*) as no_of_selections,
RANK() OVER(PARTITION BY se.gender ORDER BY COUNT(*) DESC) rn
FROM selection se
JOIN subjects su ON se.subjectID = su.subjectID
GROUP BY se.subjectID, se.gender, su.subject
) t
WHERE rn = 1
ORDER BY gender DESC
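The difference between RANK() and ROW_NUMBER() only shows up when two subjects tie for first place, which does not happen in the question's data. The sketch below uses a small table of hypothetical, already aggregated counts (the counts table and its values are invented for illustration) and runs on SQLite 3.25 or later through Python rather than on MySQL; the window function semantics are the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated counts: Maths and Music tie for F.
conn.execute("CREATE TABLE counts (gender TEXT, subject TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [("F", "Maths", 3), ("F", "Music", 3), ("F", "English", 1),
     ("M", "Maths", 5), ("M", "Music", 2)],
)

rows = conn.execute("""
    SELECT gender, subject,
           RANK()       OVER (PARTITION BY gender ORDER BY cnt DESC)          AS rnk,
           ROW_NUMBER() OVER (PARTITION BY gender ORDER BY cnt DESC, subject) AS seqnum
    FROM counts
""").fetchall()

# RANK() gives every tied row rank 1; ROW_NUMBER() numbers them 1, 2, ...
top_by_rank = sorted((g, s) for g, s, rnk, _ in rows if rnk == 1)
top_by_row_number = sorted((g, s) for g, s, _, seqnum in rows if seqnum == 1)

print(top_by_rank)        # [('F', 'Maths'), ('F', 'Music'), ('M', 'Maths')]
print(top_by_row_number)  # [('F', 'Maths'), ('M', 'Maths')]
```

Which one to use depends on whether a tie should produce several rows per group or exactly one. With ROW_NUMBER(), adding a tie-breaker column to the ORDER BY (here subject) makes the choice deterministic.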
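In earlier versions, where window functions are not available, one option is to filter with a HAVING clause that keeps only the rows whose count is the highest for each gender: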
SELECT
se.gender,
su.subject,
COUNT(*) as no_of_selections
FROM selection se
JOIN subjects su ON se.subjectID = su.subjectID
GROUP BY se.subjectID, se.gender, su.subject
HAVING COUNT(*) = (
SELECT COUNT(*)
FROM selection se1
WHERE se1.gender = se.gender
GROUP BY se1.subjectID, se1.gender
ORDER BY COUNT(*) DESC
LIMIT 1
)
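As a rough check of the HAVING approach, the sketch below runs the same query on the question's data in an in-memory SQLite database from Python. The harness and variable names are assumptions for illustration; the answer targets MySQL, and behaviour on other engines may differ. The result is sorted in Python because the query itself has no ORDER BY.

```python
import sqlite3

# Data copied from the question, one character per observation.
subject_ids = "12101001020022212102"
genders = "MMFMMFMMMFMFMMFMFMFF"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE selection (subjectID TEXT, gender TEXT)")
conn.execute("CREATE TABLE subjects (subjectID TEXT, subject TEXT)")
conn.executemany("INSERT INTO selection VALUES (?, ?)", zip(subject_ids, genders))
conn.executemany(
    "INSERT INTO subjects VALUES (?, ?)",
    [("1", "Maths"), ("2", "Music"), ("0", "English")],
)

# Keep a (gender, subject) group only if its count equals the largest
# per-subject count found for the same gender by the correlated subquery.
query = """
SELECT se.gender, su.subject, COUNT(*) AS no_of_selections
FROM selection se
JOIN subjects su ON se.subjectID = su.subjectID
GROUP BY se.subjectID, se.gender, su.subject
HAVING COUNT(*) = (
    SELECT COUNT(*)
    FROM selection se1
    WHERE se1.gender = se.gender
    GROUP BY se1.subjectID, se1.gender
    ORDER BY COUNT(*) DESC
    LIMIT 1
)
"""
top_choices = sorted(conn.execute(query).fetchall())
print(top_choices)  # [('F', 'Music', 4), ('M', 'Maths', 5)]
```

Like RANK(), this filter keeps every subject that ties for the highest count within a gender.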
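Notes:
I changed the table aliases to make them more meaningful.
You should add the subject column to the GROUP BY clause so that your query runs under the SQL mode ONLY_FULL_GROUP_BY, which is enabled by default starting with MySQL 5.7.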