Count the number of NAs per row in specified columns in R

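I would like to count the number of NAs that occur in each row across a specified set of columns. Given my data below, I want a per-row count of the NAs that appear in the first, last, address, phone, and state columns (m_initial and customer are excluded from the count).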

    first   m_initial     last         address            phone      state  customer 
    Bob         L         Turner       123 Turner Lane    410-3141   Iowa   NA        
    Will        P         Williams     456 Williams Rd    491-2359   NA     Y        
    Amanda      C         Jones        789 Haggerty       NA         NA     Y        
    Lisa        NA        Evans        NA                 NA         NA     N        

Desired output:

    first   m_initial   last       address            phone      state  customer na_count 
    Bob     L           Turner     123 Turner Lane    410-3141   Iowa   NA       0 
    Will    P           Williams   456 Williams Rd    491-2359   NA     Y        1
    Amanda  C           Jones      789 Haggerty       NA         NA     Y        2
    Lisa    NA          Evans      NA                 NA         NA     N        3  

# Count NAs per row across the selected columns only
df$na_count <- rowSums(is.na(df[c('first', 'last', 'address', 'phone', 'state')]))

df
   first m_initial     last         address    phone state customer na_count
1    Bob         L   Turner 123 Turner Lane 410-3141  Iowa     <NA>        0
2   Will         P Williams 456 Williams Rd 491-2359  <NA>        Y        1
3 Amanda         C    Jones    789 Haggerty     <NA>  <NA>        Y        2
4   Lisa      <NA>    Evans            <NA>     <NA>  <NA>        N        3
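
To see why this one-liner works, here is a minimal base R sketch. The data frame `toy` and the vector `cols` are made up for illustration and are not part of the question's data. `is.na()` on a data frame returns a logical matrix, and `rowSums()` treats each TRUE as 1.

```r
# Toy data, hypothetical and only for illustration
toy <- data.frame(
  first = c("Bob", "Lisa"),
  phone = c("410-3141", NA),
  state = c("Iowa", NA),
  customer = c(NA, "N"),
  stringsAsFactors = FALSE
)

# Columns that should take part in the count (customer is left out)
cols <- c("first", "phone", "state")

# is.na() on a data frame yields a logical matrix of the same shape
na_matrix <- is.na(toy[cols])

# TRUE counts as 1, so rowSums() gives the number of NAs in each row
toy$na_count <- rowSums(na_matrix)
toy$na_count
```

Because `customer` is not in `cols`, its NA in the first row does not affect the count.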

Base R:

Similar to Onyambu's solution, but using apply instead of rowSums: subset the columns with df[, c(1, 3:6)], then apply sum(is.na(x)) to each row.

df$na_count <- apply(df[,c(1,3:6)], 1, function(x) sum(is.na(x)))
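
Positional indices such as `c(1, 3:6)` stop pointing at the right columns if the column order changes. A possible variant, sketched here on a hypothetical cut-down version of the data, selects the columns by name instead. Note that `apply()` converts the data frame to a matrix before looping over the rows.

```r
# Hypothetical subset of the question's data, for illustration only
df <- data.frame(
  first = c("Bob", "Will", "Amanda", "Lisa"),
  m_initial = c("L", "P", "C", NA),
  phone = c("410-3141", "491-2359", NA, NA),
  state = c("Iowa", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Name the columns to count instead of relying on their positions
cols <- c("first", "phone", "state")

# MARGIN = 1 applies the function to each row of the subset
df$na_count <- apply(df[, cols], 1, function(x) sum(is.na(x)))
df
```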

dplyr:

library(dplyr)
df %>%  
  mutate(na_count = rowSums(is.na(select(., -c(m_initial, customer)))))

Output:

   first m_initial     last         address    phone state customer na_count
1    Bob         L   Turner 123 Turner Lane 410-3141  Iowa     <NA>        0
2   Will         P Williams 456 Williams Rd 491-2359  <NA>        Y        1
3 Amanda         C    Jones    789 Haggerty     <NA>  <NA>        Y        2
4   Lisa      <NA>    Evans            <NA>     <NA>  <NA>        N        3
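
The dplyr version above excludes columns, so any column added later would silently join the count. A sketch of the inverse approach, naming the columns to include, is shown here. It assumes dplyr 1.1.0 or later, where `pick()` is available, and uses a hypothetical cut-down version of the data.

```r
library(dplyr)

# Hypothetical subset of the question's data, for illustration only
df <- data.frame(
  first = c("Bob", "Will", "Amanda", "Lisa"),
  m_initial = c("L", "P", "C", NA),
  phone = c("410-3141", "491-2359", NA, NA),
  state = c("Iowa", NA, NA, NA),
  customer = c(NA, "Y", "Y", "N"),
  stringsAsFactors = FALSE
)

# Columns that should take part in the count
cols <- c("first", "phone", "state")

# pick() returns the selected columns as a data frame inside mutate()
result <- df %>%
  mutate(na_count = rowSums(is.na(pick(all_of(cols)))))
result
```

On older dplyr versions, `select(., all_of(cols))` in place of `pick(all_of(cols))` should behave the same way with the `%>%` pipe.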

library(tidyverse)

df %>%
  rowwise() %>%
  mutate(na_count = sum(is.na(c_across(all_of(c("first", "last", "address", "phone", "state"))))))
#> # A tibble: 4 × 8
#> # Rowwise: 
#>   first  m_initial last     address         phone    state customer na_count
#>   <chr>  <chr>     <chr>    <chr>           <chr>    <chr> <chr>       <int>
#> 1 Bob    L         Turner   123 Turner Lane 410-3141 Iowa  <NA>            0
#> 2 Will   P         Williams 456 Williams Rd 491-2359 <NA>  Y               1
#> 3 Amanda C         Jones    789 Haggerty    <NA>     <NA>  Y               2
#> 4 Lisa   <NA>      Evans    <NA>            <NA>     <NA>  N               3

Created on 2022-01-04 by the reprex package (v2.0.1)

Data:

df <- structure(list(first = c("Bob", "Will", "Amanda", "Lisa"), m_initial = c("L", 
"P", "C", NA), last = c("Turner", "Williams", "Jones", "Evans"
), address = c("123 Turner Lane", "456 Williams Rd", "789 Haggerty", 
NA), phone = c("410-3141", "491-2359", NA, NA), state = c("Iowa", 
NA, NA, NA), customer = c(NA, "Y", "Y", "N")), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))
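
One thing to watch with the rowwise() approach: as the `# Rowwise:` line in its output shows, the result stays grouped by row, so later verbs keep operating one row at a time. A minimal sketch that drops the row grouping with `ungroup()`, again on a hypothetical cut-down version of the data:

```r
library(dplyr)

# Hypothetical subset of the question's data, for illustration only
df <- data.frame(
  first = c("Bob", "Will", "Amanda", "Lisa"),
  phone = c("410-3141", "491-2359", NA, NA),
  state = c("Iowa", NA, NA, NA),
  customer = c(NA, "Y", "Y", "N"),
  stringsAsFactors = FALSE
)

cols <- c("first", "phone", "state")

result <- df %>%
  rowwise() %>%
  # c_across() collects the selected columns of the current row into a vector
  mutate(na_count = sum(is.na(c_across(all_of(cols))))) %>%
  # Drop the rowwise grouping so later steps work on whole columns again
  ungroup()
result
```

For large data frames the vectorised `rowSums(is.na(...))` forms are usually faster than going row by row.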