R:根据几个条件求和行
R: sum row based on several conditions
我正在写我的论文,对 r 知之甚少,所以这个问题的答案可能很明显。
我的数据集如下所示:
county<-c('1001','1001','1001','1202','1202','1303','1303')
naics<-c('423620','423630','423720','423620','423720','423550','423720')
employment<-c(5,6,5,5,5,6,5)
data<-data.frame(county,naics,employment)
对于每个县,我想对具有 naics“423620”和“423720”的行的就业值求和。 (所以两个条件:1.相同的县代码2.那两个naics代码)添加它们的行应该是第一个('423620'),第二个('423720')应该被删除
最终数据集应如下所示:
county2<-c('1001','1001','1202','1303','1303')
naics2<-c('423620','423630','423620','423550','423720')
employment2<-c(10,6,10,6,5)
data2<-data.frame(county2,naics2,employment2)
我尝试过自己用aggregate和rowSum来做,但是因为这两个条件,到目前为止我都失败了。非常感谢。
我们可以做到
library(dplyr)
data$naics <- as.character(data$naics)
data %>%
filter(naics %in% c(423620, 423720)) %>% group_by(county) %>%
summarise(naics = "423620", employment = sum(employment)) %>%
bind_rows(., filter(data, !naics %in% c(423620, 423720)))
# A tibble: 5 x 3
# county naics employment
# <fctr> <chr> <dbl>
#1 1001 423620 10
#2 1202 423620 10
#3 1303 423620 5
#4 1001 423630 6
#5 1303 423550 6
这样的条件,我先写一个小帮手,然后传给dplyr mutate:
# replace 423720 by 423620 only if both exist
onlyThoseNAICS <- function(v){
if( ("423620" %in% v) & ("423720" %in% v) ) v[v == "423720"] <- "423620"
v
}
data %>%
dplyr::group_by(county) %>%
dplyr::mutate(naics = onlyThoseNAICS(naics)) %>%
dplyr::group_by(county, naics) %>%
dplyr::summarise(employment = sum(employment)) %>%
dplyr::ungroup()
我正在写我的论文,对 r 知之甚少,所以这个问题的答案可能很明显。
我的数据集如下所示:
county<-c('1001','1001','1001','1202','1202','1303','1303')
naics<-c('423620','423630','423720','423620','423720','423550','423720')
employment<-c(5,6,5,5,5,6,5)
data<-data.frame(county,naics,employment)
对于每个县,我想对具有 naics“423620”和“423720”的行的就业值求和。 (所以两个条件:1.相同的县代码2.那两个naics代码)添加它们的行应该是第一个('423620'),第二个('423720')应该被删除
最终数据集应如下所示:
county2<-c('1001','1001','1202','1303','1303')
naics2<-c('423620','423630','423620','423550','423720')
employment2<-c(10,6,10,6,5)
data2<-data.frame(county2,naics2,employment2)
我尝试过自己用aggregate和rowSum来做,但是因为这两个条件,到目前为止我都失败了。非常感谢。
我们可以做到
library(dplyr)
data$naics <- as.character(data$naics)
data %>%
filter(naics %in% c(423620, 423720)) %>% group_by(county) %>%
summarise(naics = "423620", employment = sum(employment)) %>%
bind_rows(., filter(data, !naics %in% c(423620, 423720)))
# A tibble: 5 x 3
# county naics employment
# <fctr> <chr> <dbl>
#1 1001 423620 10
#2 1202 423620 10
#3 1303 423620 5
#4 1001 423630 6
#5 1303 423550 6
这样的条件,我先写一个小帮手,然后传给dplyr mutate:
# replace 423720 by 423620 only if both exist
onlyThoseNAICS <- function(v){
if( ("423620" %in% v) & ("423720" %in% v) ) v[v == "423720"] <- "423620"
v
}
data %>%
dplyr::group_by(county) %>%
dplyr::mutate(naics = onlyThoseNAICS(naics)) %>%
dplyr::group_by(county, naics) %>%
dplyr::summarise(employment = sum(employment)) %>%
dplyr::ungroup()