Merge rows and values in R based on a condition
I have the following sample data:
# data
school = c('ABC University','ABC Uni','DFG University','DFG U')
applicant = c(2000,3100,210,2000)
students = c(100,2000,300,4000)
df = data.frame(school,applicant,students)
I would like to merge it into this:
|school         |applicant| students |
-----------------------------------
|ABC University | 5100    | 2100     |
|DFG University | 2210    | 4300     |
I ran this code:
df$school[df$school == 'ABC Uni'] = 'ABC University'
But it gives me ABC University twice instead of merging the two rows together.
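It really depends on what the rest of your strings look like, but you could take a look at grep and use ^ to anchor the pattern to the start of the string.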
df[grep('^ABC U', df$school), 'school'] <- 'ABC University'
df[grep('^DFG U', df$school), 'school'] <- 'DFG University'
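To see what the pattern selects before assigning, it can help to inspect what grep returns. A minimal sketch using the school names from the question (the variable names `idx` and `is_dfg` are only for illustration):

```r
school <- c('ABC University', 'ABC Uni', 'DFG University', 'DFG U')

# grep() returns the positions of the elements that match the pattern;
# '^' anchors the match to the start of the string.
idx <- grep('^ABC U', school)
idx
# [1] 1 2

# grepl() returns a logical vector instead, which also works for subsetting.
is_dfg <- grepl('^DFG U', school)
is_dfg
# [1] FALSE FALSE  TRUE  TRUE
```

Both spellings of each school match, so the assignment gives them one common name.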
Then use aggregate as usual.
aggregate(cbind(applicant, students) ~ school, df, sum)
# school applicant students
# 1 ABC University 5100 2100
# 2 DFG University 2210 4300
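If the variants are known in advance, an exact-match lookup may be safer than a prefix pattern, since it cannot accidentally catch an unrelated school that happens to start the same way. A rough sketch in base R, assuming a hand-written lookup vector (`canonical` and `hit` are hypothetical names, not part of the original answer):

```r
df <- data.frame(
  school    = c('ABC University', 'ABC Uni', 'DFG University', 'DFG U'),
  applicant = c(2000, 3100, 210, 2000),
  students  = c(100, 2000, 300, 4000),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: names are the variants, values are the canonical names.
canonical <- c('ABC Uni' = 'ABC University',
               'DFG U'   = 'DFG University')

# Rename only the schools found in the lookup; leave all others untouched.
hit <- df$school %in% names(canonical)
df$school[hit] <- unname(canonical[df$school[hit]])

# Renaming alone keeps four rows; summing per school collapses them to two.
result <- aggregate(cbind(applicant, students) ~ school, df, sum)
result
```

The renaming step only relabels rows, which is why the code in the question still showed ABC University twice; the aggregate call is what actually merges them.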
Here is a solution using dplyr and stringr:
library(dplyr)
library(stringr)
df %>%
  mutate(school = str_replace_all(school, c(
    "^ABC Uni$" = "ABC University",
    "^DFG U$" = "DFG University"))) %>%
  group_by(school) %>%
  summarise(across(c(applicant, students), sum))
Output:
school applicant students
<chr> <dbl> <dbl>
1 ABC University 5100 2100
2 DFG University 2210 4300