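How to summarise a categorical variable with missing data?
I am trying to perform a group_by summary on a categorical variable, a frailty score. The data is structured so that each subject has multiple observations, some of which contain missing data, e.g.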
Subject Frailty
1 Managing well
1 NA
1 NA
2 NA
2 NA
2 Vulnerable
3 NA
3 NA
3 NA
I would like to summarise the data so that the frailty description is shown where one is available, and NA where none is, e.g.
Subject Frailty
1 Managing well
2 Vulnerable
3 NA
I tried the following two approaches, both of which return errors:
Mode <- function(x) {
ux <- na.omit(unique(x[!is.na(x)]))
tab <- tabulate(match(x, ux)); ux[tab == max(tab)]
}
data %>%
group_by(Subject) %>%
summarise(frailty = Mode(frailty)) %>%
Error: Expecting a single value: [extent=2].
condense <- function(x){unique(x[!is.na(x)])}
data %>%
group_by(subject) %>%
summarise(frailty = condense(frailty))
Error: Column frailty must be length 1 (a summary value), not 0
If there is only one non-NA element per group, group by 'Subject' and take the first non-NA element:
library(dplyr)
data %>%
group_by(Subject) %>%
summarise(Frailty = Frailty[which(!is.na(Frailty))[1]])
# A tibble: 3 x 2
# Subject Frailty
# <int> <chr>
#1 1 Managing well
#2 2 Vulnerable
#3 3 <NA>
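The indexing above relies on a base R rule: asking for position 1 of an empty vector returns NA of the same type, which is why Subject 3 ends up as NA instead of raising a length error. A minimal sketch of the same idea as a standalone helper follows; the name `first_non_na` is hypothetical and is not part of dplyr.

```r
# Hypothetical helper: first non-missing value of x, or NA if every value is missing.
first_non_na <- function(x) {
  # x[!is.na(x)] drops the missing values; [1] then takes the first survivor.
  # If nothing survives, indexing position 1 of an empty vector yields NA.
  x[!is.na(x)][1]
}

first_non_na(c(NA, NA, "Vulnerable"))          # a group with one recorded value
first_non_na(c(NA_character_, NA_character_))  # a group with no recorded value
```

Inside the pipeline this could be used as `summarise(Frailty = first_non_na(Frailty))`, assuming the column is named `Frailty` as in the data below.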
If there are multiple unique non-NA elements, we can either paste them together or return them as a list:
data %>%
group_by(Subject) %>%
summarise(Frailty = na_if(toString(unique(na.omit(Frailty))), ""))
# A tibble: 3 x 2
# Subject Frailty
# <int> <chr>
#1 1 Managing well
#2 2 Vulnerable
#3 3 <NA>
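The code above shows the paste variant only. A possible sketch of the list variant, assuming the same column names, wraps each group's unique non-NA values in `list()` so that `summarise()` still receives one element per group. The `data` object is rebuilt here only to keep the example self-contained, and `res` is an illustrative name.

```r
library(dplyr)

# Same values as the example data, rebuilt so the block runs on its own
data <- data.frame(
  Subject = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  Frailty = c("Managing well", NA, NA, NA, NA, "Vulnerable", NA, NA, NA),
  stringsAsFactors = FALSE
)

res <- data %>%
  group_by(Subject) %>%
  # list() makes the result length 1 per group, however many values it holds
  summarise(Frailty = list(unique(Frailty[!is.na(Frailty)])))

# Number of distinct recorded values per subject; a subject with only
# missing values gets an empty character vector, not NA
lengths(res$Frailty)
```

With a list column, a subject with no recorded value is represented by an empty vector, so any downstream code that expects NA would need to handle that case itself.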
Data
data <- structure(list(Subject = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L
), Frailty = c("Managing well", NA, NA, NA, NA, "Vulnerable",
NA, NA, NA)), class = "data.frame", row.names = c(NA, -9L))
One solution involving dplyr could be:
df %>%
group_by(Subject) %>%
slice(which.min(is.na(Frailty)))
Subject Frailty
<int> <chr>
1 1 Managing_well
2 2 Vulnerable
3 3 <NA>
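The `slice()` call works because `which.min()` treats a logical vector as 0/1 and returns the position of the first minimum. A short base R sketch of that behaviour on two made-up groups (the variable names are illustrative only):

```r
# is.na() returns TRUE for missing values; which.min() treats FALSE as 0
# and TRUE as 1, and returns the position of the first minimum.
some_recorded <- which.min(is.na(c(NA, NA, "Vulnerable")))  # position of the first FALSE
none_recorded <- which.min(is.na(c(NA, NA, NA)))            # all TRUE, so the first position

some_recorded
none_recorded
```

So a group with at least one recorded value keeps the row holding its first non-NA value, and a group with only missing values keeps its first row, whose value is NA. Because `slice()` keeps whole rows, any other columns in the data frame are carried along too.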