R: Counting Overall Percentage of 0's in Data
I am working with the R programming language.
At the following link (https://www.geeksforgeeks.org/how-to-find-the-percentage-of-missing-values-in-a-dataframe-in-r/), I found a way to calculate the overall percentage of NAs in a data frame:
# declaring a data frame in R
data_frame = data.frame(C1= c(1, 2, NA, 0),
C2= c( NA, NA, 3, 8),
C3= c("A", "V", "j", "y"),
C4=c(NA,NA,NA,NA))
percentage = mean(is.na(data_frame)) * 100
[1] 43.75
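My question: is there a way to extend this to calculate the percentage of any given element in the data frame?
For example, could it be used to calculate the percentage of 0's in the data set? Or the percentage of times "j" appears in the data? Or the percentage of times "2" appears in the data set?
I can do this manually: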
# count percentage of "j" in the data
v1 = nrow(subset(data_frame, C1 == "j"))
v2 = nrow(subset(data_frame, C2 == "j"))
v3 = nrow(subset(data_frame, C3== "j"))
v4 = nrow(subset(data_frame, C4 == "j"))
percentage = ((v1 + v2 + v3 + v4) / ((nrow(data_frame) * ncol(data_frame)))) * 100
[1] 6.25
# count percentage of 0 in the data
# (the comparison must use "==": with a single "=", subset() receives "C1 = 0" as a
# named argument instead of a filter condition, so every row is returned)
v1 = nrow(subset(data_frame, C1 == 0))
v2 = nrow(subset(data_frame, C2 == 0))
v3 = nrow(subset(data_frame, C3 == 0))
v4 = nrow(subset(data_frame, C4 == 0))
percentage = ((v1 + v2 + v3 + v4) / ((nrow(data_frame) * ncol(data_frame)))) * 100
But is there a faster way?
Thanks!
You can try unlist to convert your data frame to a vector:
vec = unlist(data_frame)
mean(vec %in% "j") * 100 # 6.25
mean(vec %in% "0") * 100 # 6.25
mean(vec %in% NA) * 100 # 43.75
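One detail worth knowing about this approach: because the columns have mixed types, unlist() coerces every value to character, which is why the string "0" matches the numeric 0. Below is a minimal sketch that wraps the idea in a helper. The name pct_of is hypothetical (it is not part of the answer above), and stringsAsFactors = FALSE is set explicitly on the assumption that the strings should stay character rather than become factors, as they would by default in R versions before 4.0.

```r
# Same sample data as in the question; strings kept as character
data_frame <- data.frame(C1 = c(1, 2, NA, 0),
                         C2 = c(NA, NA, 3, 8),
                         C3 = c("A", "V", "j", "y"),
                         C4 = c(NA, NA, NA, NA),
                         stringsAsFactors = FALSE)

# pct_of() is a hypothetical helper name used only for illustration
pct_of <- function(df, value) {
  vec <- unlist(df, use.names = FALSE)  # flatten all columns into one vector
  mean(vec %in% value) * 100            # share of cells that match `value`
}

class(unlist(data_frame))  # "character": the mixed columns are coerced
pct_of(data_frame, "j")    # 6.25
pct_of(data_frame, 0)      # 6.25, since %in% compares 0 as "0"
pct_of(data_frame, NA)     # 43.75
```

Since %in% never returns NA, no na.rm argument is needed here.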
Here is a tidyverse + base R solution.
library(tidyverse)
data_frame %>%
mutate(across(everything(), ~ .x %in% "j")) %>%
unlist() %>%
mean() * 100
Output
[1] 6.25
This is easily turned into a function, though.
calc <- function(df, val) {
df %>%
mutate(across(everything(), ~ .x %in% val)) %>%
unlist() %>%
mean() * 100
}
Output
calc(data_frame, "j") # 6.25
calc(data_frame, "0") # 6.25
calc(data_frame, NA) # 43.75
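If a per-column breakdown is also of interest, the same %in% test can be applied column by column without the tidyverse. This is only a sketch under the assumption that the question's sample data is used. The name pct_by_column is hypothetical.

```r
# Same sample data as in the question; strings kept as character
data_frame <- data.frame(C1 = c(1, 2, NA, 0),
                         C2 = c(NA, NA, 3, 8),
                         C3 = c("A", "V", "j", "y"),
                         C4 = c(NA, NA, NA, NA),
                         stringsAsFactors = FALSE)

# pct_by_column() is a hypothetical helper name used only for illustration
pct_by_column <- function(df, val) {
  # vapply() returns one percentage per column, named after the column
  vapply(df, function(col) mean(col %in% val) * 100, numeric(1))
}

by_col <- pct_by_column(data_frame, NA)
by_col        # C1 = 25, C2 = 50, C3 = 0, C4 = 100
mean(by_col)  # 43.75, the overall figure
```

Averaging the per-column values reproduces the overall percentage only because every column of a data frame has the same number of rows.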
Assuming no lists are embedded in the cells of the data frame, you don't have to unlist it:
data_frame = data.frame(C1= c(1, 2, NA, 0),
C2= c( NA, NA, 3, 8),
C3= c("A", "V", "j", "y"),
C4=c(NA,NA,NA,NA))
sum(data_frame == 'j', na.rm = TRUE) / prod(dim(data_frame)) * 100
[1] 6.25
sum(data_frame == 0, na.rm = TRUE) / prod(dim(data_frame)) * 100
[1] 6.25
sum(is.na(data_frame)) / prod(dim(data_frame)) * 100
[1] 43.75
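The choice of denominator matters with this approach. Comparing a data frame with == yields NA wherever the cell itself is NA, so sum(..., na.rm = TRUE) / prod(dim(...)) still divides by all 16 cells, whereas mean(..., na.rm = TRUE) would drop the NA cells from the denominator too. A small sketch of the difference, assuming the same sample data (the variable names are illustrative):

```r
# Same sample data as in the question; strings kept as character
data_frame <- data.frame(C1 = c(1, 2, NA, 0),
                         C2 = c(NA, NA, 3, 8),
                         C3 = c("A", "V", "j", "y"),
                         C4 = c(NA, NA, NA, NA),
                         stringsAsFactors = FALSE)

hits <- data_frame == 0  # logical matrix; NA where the cell itself is NA

# Share of zeros among ALL cells (1 of 16)
pct_all <- sum(hits, na.rm = TRUE) / prod(dim(data_frame)) * 100
pct_all     # 6.25

# Share of zeros among NON-MISSING cells only (1 of 9)
pct_non_na <- mean(hits, na.rm = TRUE) * 100
pct_non_na  # about 11.11
```

Which of the two is appropriate depends on whether missing cells should count toward the total.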