dplyr: correlations with NA
xx <- data.frame(group = rep(1:4, each=100), a = rnorm(100) , b = rnorm(100))
xx[c(1,14,33), 'b'] = NA
I am trying to compute correlations by group, but I get an error when NA values are present.
library(dplyr)
xx %>% group_by(group) %>% summarize(COR=cor(a,b,na.rm=TRUE))
Error: Problem with `summarise()` column `COR`.
i `COR = cor(a, b, na.rm = TRUE)`.
x unused argument (na.rm = TRUE)
i The error occurred in group 1: group = 1.
Run `rlang::last_error()` to see where the error occurred.
cor() has no na.rm argument; missing values are handled through the use argument instead. According to ?cor, the usage is:
cor(x, y = NULL, use = "everything",
method = c("pearson", "kendall", "spearman"))
use - an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs".
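To make the effect of the use argument concrete, here is a minimal base R sketch. The vectors x and y are made-up illustration data, not the data from the question, and only three of the documented options are shown.

```r
# Illustrative vectors (an assumption, not from the question): one NA in y
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, NA, 8, 10)

# Default use = "everything": a single NA makes the result NA
r_default <- cor(x, y)

# "complete.obs": drop the pair containing NA, then correlate the rest
r_complete <- cor(x, y, use = "complete.obs")

# "all.obs": missing values are treated as an error, so catch it here
r_all_obs <- tryCatch(
  cor(x, y, use = "all.obs"),
  error = function(e) conditionMessage(e)
)

print(r_default)
print(r_complete)
print(r_all_obs)
```

The remaining complete pairs lie exactly on a line, so the "complete.obs" result is 1 up to floating-point error, while the default call returns NA.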
library(dplyr)
xx %>%
group_by(group) %>%
summarize(COR=cor(a,b, use = "complete.obs"))
-output
# A tibble: 4 × 2
group COR
<int> <dbl>
1 1 0.166
2 2 0.190
3 3 0.190
4 4 0.190
If some groups contain only NA values, use "na.or.complete" instead
(data updated from the comments, where one group contains only NA)
xx %>%
group_by(group) %>%
summarize(COR=cor(a,b, use = "na.or.complete"))
# A tibble: 5 × 2
group COR
<int> <dbl>
1 1 0.0345
2 2 -0.397
3 3 0.150
4 4 0.376
5 5 NA
This returns the same result as an if/else condition combined with "complete.obs":
xx %>%
group_by(group) %>%
summarize(COR= if(any(complete.cases(a, b)))
cor(a,b, use = "complete.obs") else NA_real_)
# A tibble: 5 × 2
group COR
<int> <dbl>
1 1 0.0345
2 2 -0.397
3 3 0.150
4 4 0.376
5 5 NA
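The updated data from the comments is not shown, so the following is only a sketch under assumptions: xx5 is a hypothetical data frame with a fifth group whose b values are all NA, and the seed is arbitrary. It uses base R (split and sapply) rather than dplyr so it runs without extra packages, and it checks that "complete.obs" alone fails on the all-NA group while the two approaches above agree.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical data: five groups of 20 rows; group 5 has no complete pairs
xx5 <- data.frame(group = rep(1:5, each = 20), a = rnorm(100), b = rnorm(100))
xx5[c(1, 14, 33), "b"] <- NA      # a few scattered NAs
xx5[xx5$group == 5, "b"] <- NA    # every b in group 5 is missing

by_group <- split(xx5, xx5$group)

# "complete.obs" alone raises an error when a group has no complete pairs
strict <- tryCatch(
  with(by_group[["5"]], cor(a, b, use = "complete.obs")),
  error = function(e) "error"
)

# "na.or.complete" returns NA for that group instead of failing
lenient <- sapply(by_group, function(d) cor(d$a, d$b, use = "na.or.complete"))

# The if/else guard around "complete.obs" gives the same values
guarded <- sapply(by_group, function(d) {
  if (any(complete.cases(d$a, d$b))) {
    cor(d$a, d$b, use = "complete.obs")
  } else {
    NA_real_
  }
})

print(strict)
print(lenient)
print(guarded)
```

The if/else form is more verbose, but the guard condition can be adapted, for example to require a minimum number of complete pairs before computing a correlation.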