How to account for NAs in R dplyr
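Below are the package list, the sample data, and the script I am running. Below that is the schema. You will notice that two of the values are above 500 and therefore do not fit the schema. The desired result would only consider the firms that fit the schema (those employing fewer than 500 people). When I run this on my larger dataset (not the sample dataset below), I get a result similar to the one shown at the bottom. In short, how would I modify the script so that it ignores entries above 500 and therefore does not return the fifth row with NA?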
library(dplyr)
library(data.table)
library(odbc)
library(DBI)
library(stringr)
firm <- c("firm1","firm2","firm3","firm4","firm5","firm6","firm7","firm8","firm9","firm10","firm11")
employment <- c(1,50,90,249,499,115,145,261,210,874,1140)
small <- c(1,1,1,3,4,2,2,4,3,NA,NA)
smbtest <- data.frame(firm,employment,small)
smbsummary2 <- smbtest %>%
  select(employment, small) %>%
  group_by(small) %>%
  summarise(employment = sum(employment), worksites = n(),
            .groups = 'drop') %>%
  mutate(employment = cumsum(employment),
         worksites = cumsum(worksites))
smb1 >= 0 and <100
smb2 >= 0 and <150
smb3 >= 0 and <250
smb4 >= 0 and <500
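The question does not say how the small column was assigned, so the following is only a sketch under an assumption: each firm gets the smallest class whose upper bound it fits under. With that assumption, base R's cut() reproduces the small vector in the sample data and shows where the NA values come from: any firm with 500 or more employees falls outside every interval. The name bounds is made up for this illustration.

```r
# Sample employment figures from the question
employment <- c(1, 50, 90, 249, 499, 115, 145, 261, 210, 874, 1140)

# Assumed upper bounds of smb1..smb4 (hypothetical helper vector)
bounds <- c(0, 100, 150, 250, 500)

# right = FALSE makes the intervals [0,100), [100,150), [150,250), [250,500);
# labels = FALSE returns the class number, and values of 500 or more become NA
small <- cut(employment, breaks = bounds, right = FALSE, labels = FALSE)

print(small)
# 1 1 1 3 4 2 2 4 3 NA NA
```

Because group_by() treats NA as a group of its own, those two firms end up in an extra summary row unless they are removed first.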
smb  employment  worksites
1          1000         20
2          1500         22
3          2500         25
4         10000         29
5         25000         NA
I believe this will help.
firm <- c("firm1","firm2","firm3","firm4","firm5","firm6","firm7","firm8","firm9","firm10","firm11")
employment <- c(1,50,90,249,499,115,145,261,210,874,1140)
small <- c(1,1,1,3,4,2,2,4,3,NA,NA)
smbtest <- data.frame(firm,employment,small)
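library(tidyr)  # drop_na() comes from tidyr, not dplyr
smbtest %>%
  select(employment, small) %>%
  drop_na(small) %>%             # remove firms that have no size class
  filter(employment < 500) %>%   # keep only firms that fit the schema
  group_by(small) %>%
  summarise(employment = sum(employment), worksites = n(),
            .groups = 'drop') %>%
  mutate(employment = cumsum(employment),
         worksites = cumsum(worksites))
I just added two lines:
- drop_na(small), which needs library(tidyr)
- filter(employment < 500)
Both go before group_by(). If the filter ran after cumsum(), it would compare the running total with 500 instead of each firm's own employment, and it would drop size classes that should be kept.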
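For comparison, here is a minimal sketch that uses dplyr alone, assuming the goal is to exclude firms outside the schema before any grouping happens. It is one possible arrangement, not the only one: filter() with !is.na(small) stands in for drop_na(), and the name smbsummary is made up for this example.

```r
library(dplyr)

# Sample data from the question
smbtest <- data.frame(
  firm = paste0("firm", 1:11),
  employment = c(1, 50, 90, 249, 499, 115, 145, 261, 210, 874, 1140),
  small = c(1, 1, 1, 3, 4, 2, 2, 4, 3, NA, NA)
)

smbsummary <- smbtest %>%
  # Drop firms outside the schema before grouping, so no NA group is formed
  filter(!is.na(small), employment < 500) %>%
  group_by(small) %>%
  summarise(employment = sum(employment), worksites = n(), .groups = "drop") %>%
  # Running totals across size classes 1 to 4
  mutate(employment = cumsum(employment), worksites = cumsum(worksites))

print(smbsummary)
# On the sample data: employment 141, 401, 860, 1620; worksites 3, 5, 7, 9
```

The order matters here. Filtering on employment after cumsum() would test the cumulative totals, and on the sample data only the first two classes (141 and 401) would pass a < 500 check.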