Count number of NAs in a row in specified columns in R
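I would like to count the number of NAs that occur in each row, looking only at specified columns. With the data below, I want to count the NAs per row in the first, last, address, phone, and state columns (m_initial and customer should not be included in the count).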
first   m_initial  last      address          phone     state  customer
Bob     L          Turner    123 Turner Lane  410-3141  Iowa   NA
Will    P          Williams  456 Williams Rd  491-2359  NA     Y
Amanda  C          Jones     789 Haggerty     NA        NA     Y
Lisa    NA         Evans     NA               NA        NA     N
Desired output:
first   m_initial  last      address          phone     state  customer  na_count
Bob     L          Turner    123 Turner Lane  410-3141  Iowa   NA        0
Will    P          Williams  456 Williams Rd  491-2359  NA     Y         1
Amanda  C          Jones     789 Haggerty     NA        NA     Y         2
Lisa    NA         Evans     NA               NA        NA     N         3
df$na_count <- rowSums(is.na(df[c('first', 'last', 'address', 'phone', 'state')]))
df
   first m_initial     last         address    phone state customer na_count
1    Bob         L   Turner 123 Turner Lane 410-3141  Iowa     <NA>        0
2   Will         P Williams 456 Williams Rd 491-2359  <NA>        Y        1
3 Amanda         C    Jones    789 Haggerty     <NA>  <NA>        Y        2
4   Lisa      <NA>    Evans            <NA>     <NA>  <NA>        N        3
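To make the mechanics of this answer easier to follow, here is a minimal, self-contained sketch. It rebuilds the question's data as a plain data frame and splits the one-liner into two steps. The names `cols` and `na_flags` are introduced only for this illustration and are not part of the original answer.

```r
# Rebuild the example data (same values as in the question).
df <- data.frame(
  first     = c("Bob", "Will", "Amanda", "Lisa"),
  m_initial = c("L", "P", "C", NA),
  last      = c("Turner", "Williams", "Jones", "Evans"),
  address   = c("123 Turner Lane", "456 Williams Rd", "789 Haggerty", NA),
  phone     = c("410-3141", "491-2359", NA, NA),
  state     = c("Iowa", NA, NA, NA),
  customer  = c(NA, "Y", "Y", "N")
)

# Columns to include in the count (hypothetical helper name).
cols <- c("first", "last", "address", "phone", "state")

# is.na() on a data frame returns a logical matrix: TRUE where a value is missing.
na_flags <- is.na(df[cols])

# rowSums() treats TRUE as 1, so each row total is the number of NAs in that row.
df$na_count <- rowSums(na_flags)
```

Because m_initial and customer are never selected, the NA in Lisa's m_initial and the NA in Bob's customer do not affect the count.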
Base R:
Similar to Onyambu's solution, but using apply instead of rowSums: subset the columns with df[, c(1, 3:6)] and then apply sum(is.na(x)) to each row.
df$na_count <- apply(df[,c(1,3:6)], 1, function(x) sum(is.na(x)))
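The positions c(1, 3:6) correspond to first, last, address, phone, and state in the example data. As a hedged sketch, the snippet below checks that selecting by position and selecting by name give the same counts. The variable names `cols`, `by_position`, and `by_name` are made up for this illustration.

```r
# Rebuild the example data (same values as in the question).
df <- data.frame(
  first     = c("Bob", "Will", "Amanda", "Lisa"),
  m_initial = c("L", "P", "C", NA),
  last      = c("Turner", "Williams", "Jones", "Evans"),
  address   = c("123 Turner Lane", "456 Williams Rd", "789 Haggerty", NA),
  phone     = c("410-3141", "491-2359", NA, NA),
  state     = c("Iowa", NA, NA, NA),
  customer  = c(NA, "Y", "Y", "N")
)

cols <- c("first", "last", "address", "phone", "state")

# MARGIN = 1 makes apply() call the function once per row.
by_position <- apply(df[, c(1, 3:6)], 1, function(x) sum(is.na(x)))
by_name     <- apply(df[, cols],      1, function(x) sum(is.na(x)))

# Both selections pick the same five columns, so the counts agree.
all(by_position == by_name)
```

Selecting by name is usually the safer choice if the column order might change, since the positional version would then silently count different columns.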
dplyr
library(dplyr)
df %>%
  mutate(na_count = rowSums(is.na(select(., -c(m_initial, customer)))))
Output:
   first m_initial     last         address    phone state customer na_count
1    Bob         L   Turner 123 Turner Lane 410-3141  Iowa     <NA>        0
2   Will         P Williams 456 Williams Rd 491-2359  <NA>        Y        1
3 Amanda         C    Jones    789 Haggerty     <NA>  <NA>        Y        2
4   Lisa      <NA>    Evans            <NA>     <NA>  <NA>        N        3
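The dplyr version above selects columns by exclusion, so any column added to df later would also be counted. A possible variation, sketched here under the assumption that the wanted columns are known by name, is to select them explicitly with all_of(). The names `cols` and `result` are introduced only for this example.

```r
library(dplyr)

# Rebuild the example data as a tibble (same values as in the question).
df <- tibble(
  first     = c("Bob", "Will", "Amanda", "Lisa"),
  m_initial = c("L", "P", "C", NA),
  last      = c("Turner", "Williams", "Jones", "Evans"),
  address   = c("123 Turner Lane", "456 Williams Rd", "789 Haggerty", NA),
  phone     = c("410-3141", "491-2359", NA, NA),
  state     = c("Iowa", NA, NA, NA),
  customer  = c(NA, "Y", "Y", "N")
)

cols <- c("first", "last", "address", "phone", "state")

# Same pattern as above, but with positive selection of the named columns.
result <- df %>%
  mutate(na_count = rowSums(is.na(select(., all_of(cols)))))
```

all_of() raises an error if one of the listed columns is missing, which tends to surface typos early instead of producing a wrong count.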
library(tidyverse)
df %>%
  rowwise() %>%
  mutate(na_count = sum(is.na(c_across(all_of(c("first", "last", "address", "phone", "state"))))))
#> # A tibble: 4 × 8
#> # Rowwise:
#>   first  m_initial last     address         phone    state customer na_count
#>   <chr>  <chr>     <chr>    <chr>           <chr>    <chr> <chr>       <int>
#> 1 Bob    L         Turner   123 Turner Lane 410-3141 Iowa  <NA>            0
#> 2 Will   P         Williams 456 Williams Rd 491-2359 <NA>  Y               1
#> 3 Amanda C         Jones    789 Haggerty    <NA>     <NA>  Y               2
#> 4 Lisa   <NA>      Evans    <NA>            <NA>     <NA>  N               3
Created on 2022-01-04 by the reprex package (v2.0.1)
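Note that the result above is still a rowwise tibble (see the "# Rowwise:" line in the output), so later verbs would keep operating row by row. If that is not wanted, one option is to finish with ungroup(), as in this sketch. The names `cols` and `result` are assumptions made for the example.

```r
library(dplyr)

# Rebuild the example data as a tibble (same values as in the question).
df <- tibble(
  first     = c("Bob", "Will", "Amanda", "Lisa"),
  m_initial = c("L", "P", "C", NA),
  last      = c("Turner", "Williams", "Jones", "Evans"),
  address   = c("123 Turner Lane", "456 Williams Rd", "789 Haggerty", NA),
  phone     = c("410-3141", "491-2359", NA, NA),
  state     = c("Iowa", NA, NA, NA),
  customer  = c(NA, "Y", "Y", "N")
)

cols <- c("first", "last", "address", "phone", "state")

result <- df %>%
  rowwise() %>%
  mutate(na_count = sum(is.na(c_across(all_of(cols))))) %>%
  ungroup()  # drop the rowwise grouping so later verbs act on whole columns
```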
Data:
structure(list(first = c("Bob", "Will", "Amanda", "Lisa"), m_initial = c("L",
"P", "C", NA), last = c("Turner", "Williams", "Jones", "Evans"
), address = c("123 Turner Lane", "456 Williams Rd", "789 Haggerty",
NA), phone = c("410-3141", "491-2359", NA, NA), state = c("Iowa",
NA, NA, NA), customer = c(NA, "Y", "Y", "N")), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))