Count non-null values in each column of a dataframe in R
I am working with the following data frame:
cell_a cell_b cell_c group
N/A 2.5 5 A
1.2 3.6 N/A A
3 2.1 3.2 A
N/A N/A 1 B
1.2 N/A N/A B
2 N/A N/A B
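I want to count the rows with non-null values in each column, summarized by group.
The result should be stored in a new data frame, for example: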
cell_a cell_b cell_c group
2 3 2 A
2 0 1 B
I tried:
df_2 <- aggregate(df[1:3], list(df$group), length)
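but that just gives me the total number of rows per group, missing values included. I also tried adding na.action = na.omit or na.rm=TRUE after length, but that did not work.
What else can I use in this code to ignore the N/A values?
Thank you very much for your help!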
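With aggregate, count the non-NA elements with sum(!is.na(x)) (assuming the missing values are real NA), because length returns the total number of elements (per group, since we are grouping by group):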
aggregate(. ~ group, df, FUN = function(x) sum(!is.na(x)), na.action = NULL)
If the missing values are the string "N/A" instead:
aggregate(. ~ group, df, FUN = function(x) sum(x != "N/A"), na.action = NULL)
group cell_a cell_b cell_c
1 A 2 3 2
2 B 2 0 1
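To see why length did not help, and how the call from the question could be adapted while keeping its original shape, here is a minimal sketch. It assumes the missing entries are the literal string "N/A" (as in the Data section); the names n_all, n_valid, df_na and df_3, and the group = label inside list(), are illustrative choices and not part of the original code.

```r
# Same data as in the question, with missing entries stored as the string "N/A"
df <- data.frame(
  cell_a = c("N/A", "1.2", "3", "N/A", "1.2", "2"),
  cell_b = c("2.5", "3.6", "2.1", "N/A", "N/A", "N/A"),
  cell_c = c("5", "N/A", "3.2", "1", "N/A", "N/A"),
  group  = c("A", "A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# length() counts every element, whether it holds a value or not
x <- df$cell_a[df$group == "A"]
n_all   <- length(x)         # all rows of group A
n_valid <- sum(x != "N/A")   # only the rows that hold a value

# The call from the question, with length swapped for a counting function
df_2 <- aggregate(df[1:3], list(group = df$group),
                  FUN = function(x) sum(x != "N/A"))

# Alternative: turn the strings into real NA first, then count with is.na()
df_na <- df
df_na[df_na == "N/A"] <- NA
df_3 <- aggregate(df_na[1:3], list(group = df_na$group),
                  FUN = function(x) sum(!is.na(x)))
```

Both df_2 and df_3 should hold one row per group with the per-column counts. Converting to real NA first is probably the more robust choice if the columns are later turned into numbers with as.numeric.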
Data
df <- structure(list(cell_a = c("N/A", "1.2", "3", "N/A", "1.2", "2"
), cell_b = c("2.5", "3.6", "2.1", "N/A", "N/A", "N/A"), cell_c = c("5",
"N/A", "3.2", "1", "N/A", "N/A"), group = c("A", "A", "A", "B",
"B", "B")), class = "data.frame", row.names = c(NA, -6L))
Here is how we could do it with dplyr:
- Change "N/A" to NA using na_if and across
- Group by group
- Summarise using across
library(dplyr)
df %>%
mutate(across(starts_with("cell"), ~na_if(., "N/A"))) %>%
group_by(group) %>%
summarise(across(starts_with("cell"), ~sum(!is.na(.))))
group cell_a cell_b cell_c
<chr> <int> <int> <int>
1 A 2 3 2
2 B 2 0 1
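For comparison, the same three steps (replace, group, count) can be sketched in base R without any packages. This is an illustrative alternative, not the method from the answer above; the names df_na, by_group and counts are made up for the example.

```r
# Same data as in the question, with missing entries stored as the string "N/A"
df <- data.frame(
  cell_a = c("N/A", "1.2", "3", "N/A", "1.2", "2"),
  cell_b = c("2.5", "3.6", "2.1", "N/A", "N/A", "N/A"),
  cell_c = c("5", "N/A", "3.2", "1", "N/A", "N/A"),
  group  = c("A", "A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Step 1: replace the "N/A" strings with real NA
df_na <- df
df_na[df_na == "N/A"] <- NA

# Step 2: split the value columns into one data frame per group
by_group <- split(df_na[c("cell_a", "cell_b", "cell_c")], df_na$group)

# Step 3: count the non-NA entries per column within each group;
# sapply() returns columns per group, so transpose to get one row per group
counts <- t(sapply(by_group, function(g) colSums(!is.na(g))))
```

Here counts is a matrix with one row per group and one column per cell_ column, rather than a data frame; as.data.frame(counts) would convert it if needed.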