Count non-null values in each column of a dataframe in R

I am working with the following data frame:

cell_a  cell_b  cell_c  group
N/A     2.5     5       A 
1.2     3.6     N/A     A
3       2.1     3.2     A
N/A     N/A     1       B
1.2     N/A     N/A     B
2       N/A     N/A     B  

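I want to count the number of rows with non-null values in each column, summarized by group.

The result should be stored in a new data frame, for example: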

cell_a  cell_b  cell_c  group
2       3       2       A 
2       0       1       B 

I tried:

df_2 <- aggregate(df[1:3], list(df$group), length)

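But this gives me the total number of rows per group for every column, N/A values included. I also tried adding na.action = na.omit or na.rm = TRUE after length, but neither works.

What else can I use in this code to ignore the N/A values?

Thank you very much for your help!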

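With aggregate, use the sum of the non-NA elements (assuming the missing values are NA), because length returns the total number of elements (per group, since we are grouping by group):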

aggregate(. ~ group, df, FUN = function(x) sum(!is.na(x)), na.action = NULL)
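The na.action = NULL argument matters here: the formula interface of aggregate drops every row that contains an NA before FUN is called (its default is na.omit). A minimal sketch of the difference, assuming the missing values are real NA values; df_na and count_non_na are hypothetical names used only for this illustration:

```r
# Hypothetical copy of the example data with real NA values (not "N/A" strings)
df_na <- data.frame(
  cell_a = c(NA, 1.2, 3, NA, 1.2, 2),
  cell_b = c(2.5, 3.6, 2.1, NA, NA, NA),
  cell_c = c(5, NA, 3.2, 1, NA, NA),
  group  = c("A", "A", "A", "B", "B", "B")
)

# Count the non-missing elements of a vector
count_non_na <- function(x) sum(!is.na(x))

# Default na.action (na.omit): rows containing any NA are removed first,
# so only the single complete row (row 3, group A) is ever counted
dropped <- aggregate(. ~ group, df_na, FUN = count_non_na)

# na.action = NULL: every row reaches FUN, so the per-group counts are correct
kept <- aggregate(. ~ group, df_na, FUN = count_non_na, na.action = NULL)

print(dropped)
print(kept)
```

With the default, group B disappears from the result because none of its rows is complete.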

If the missing values are the string "N/A":

aggregate(. ~ group, df, FUN = function(x) sum(x != "N/A"), na.action = NULL)
   group cell_a cell_b cell_c
1     A      2      3      2
2     B      2      0      1
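As a possible alternative for the string case, the same counts can be sketched without aggregate by turning the comparison into a 0/1 matrix and summing it per group with base R's rowsum. This is only an illustration, not part of the answer above; df_chr and counts are hypothetical names:

```r
# Hypothetical copy of the example data, with missing values stored as "N/A"
df_chr <- data.frame(
  cell_a = c("N/A", "1.2", "3", "N/A", "1.2", "2"),
  cell_b = c("2.5", "3.6", "2.1", "N/A", "N/A", "N/A"),
  cell_c = c("5", "N/A", "3.2", "1", "N/A", "N/A"),
  group  = c("A", "A", "A", "B", "B", "B")
)

# TRUE where a cell holds a value; unary + turns the logical matrix into 0/1
has_value <- +(df_chr[1:3] != "N/A")

# Sum the 0/1 indicators within each group; the result is a matrix
# with one row per group and one column per cell_* variable
counts <- rowsum(has_value, df_chr$group)
print(counts)
```

Note that this returns a matrix with the groups as row names rather than a data frame with a group column.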

Data

df <- structure(list(cell_a = c("N/A", "1.2", "3", "N/A", "1.2", "2"
), cell_b = c("2.5", "3.6", "2.1", "N/A", "N/A", "N/A"), cell_c = c("5", 
"N/A", "3.2", "1", "N/A", "N/A"), group = c("A", "A", "A", "B", 
"B", "B")), class = "data.frame", row.names = c(NA, -6L))

Here is how we could do it with dplyr:

  1. Replace "N/A" with NA using na_if inside across
  2. Summarise with across
library(dplyr)

df %>% 
  mutate(across(starts_with("cell"), ~na_if(., "N/A"))) %>% 
  group_by(group) %>% 
  summarise(across(starts_with("cell"), ~sum(!is.na(.))))

 group cell_a cell_b cell_c
  <chr>  <int>  <int>  <int>
1 A          2      3      2
2 B          2      0      1