Row-wise cor() on subset of columns using dplyr::mutate()
set.seed(8)
df <- data.frame(
  A = sample(1:3, 10, replace = TRUE),
  B = sample(1:3, 10, replace = TRUE),
  C = sample(1:3, 10, replace = TRUE),
  D = sample(1:3, 10, replace = TRUE),
  E = sample(1:3, 10, replace = TRUE),
  F = sample(1:3, 10, replace = TRUE))
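I want to pass subsets of columns to dplyr's mutate() and do a row-wise calculation, for example cor(), to get the correlation between columns A-C and D-F for each row, but I cannot figure out how. I found several related Stack Overflow questions for inspiration, but still could not produce working code. For example, I tried this: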
require(plyr)
require(dplyr)
df %>%
rowwise() %>%
mutate(c=cor(.[[1:3]],.[[4:6]]))
You can try:
df %>%
rowwise() %>%
do(data.frame(., Cor=cor(unlist(.[1:3]), unlist(.[4:6]))))
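As an alternative sketch, assuming dplyr 1.0.0 or later (where do() is superseded), the same row-wise correlation can probably be written with rowwise() and c_across(). The output column name Cor and the object name res are only illustrative. Rows in which A-C or D-F hold three identical values have zero variance, so cor() returns NA for them and emits a warning.

```r
library(dplyr)

# Same example data as in the question
set.seed(8)
df <- data.frame(
  A = sample(1:3, 10, replace = TRUE),
  B = sample(1:3, 10, replace = TRUE),
  C = sample(1:3, 10, replace = TRUE),
  D = sample(1:3, 10, replace = TRUE),
  E = sample(1:3, 10, replace = TRUE),
  F = sample(1:3, 10, replace = TRUE))

res <- df %>%
  rowwise() %>%
  # c_across() gathers the selected columns of the current row into one vector
  mutate(Cor = cor(c_across(A:C), c_across(D:F))) %>%
  ungroup()  # drop the row-wise grouping once the calculation is done

res
```

Calling ungroup() at the end keeps later verbs from silently operating row by row.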
Here is another solution, from Fay (2017).
> library(tidystringdist)
> library(purrr)  # pmap(), map_df()
> library(dplyr)  # %>%, bind_cols()
> comb <- tidy_comb_all(names(airquality))
> comb
# A tibble: 15 x 2
   V1      V2
 * <chr>   <chr>
 1 Ozone   Solar.R
 2 Ozone   Wind
 3 Ozone   Temp
 4 Ozone   Month
 5 Ozone   Day
 6 Solar.R Wind
 7 Solar.R Temp
 8 Solar.R Month
 9 Solar.R Day
10 Wind    Temp
11 Wind    Month
12 Wind    Day
13 Temp    Month
14 Temp    Day
15 Month   Day
This gives us every pairwise combination of the column names.
> bulk_cor <-
+ comb %>%
+ pmap(~ cor.test(airquality[[.x]], airquality[[.y]])) %>%
+ map_df(broom::tidy) %>%
+ bind_cols(comb, .)
> bulk_cor
# A tibble: 15 x 10
   V1      V2      estimate statistic  p.value parameter conf.low conf.high method       alternative
   <chr>   <chr>      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <fct>        <fct>
 1 Ozone   Solar.R    0.348      3.88 1.79e- 4       109    0.173     0.502 Pearson's p~ two.sided
 2 Ozone   Wind      -0.602     -8.04 9.27e-13       114   -0.706    -0.471 Pearson's p~ two.sided
 3 Ozone   Temp       0.698      10.4 2.93e-18       114    0.591     0.781 Pearson's p~ two.sided
 4 Ozone   Month      0.165      1.78 7.76e- 2       114  -0.0183     0.337 Pearson's p~ two.sided
 5 Ozone   Day      -0.0132    -0.141 8.88e- 1       114   -0.195     0.169 Pearson's p~ two.sided
 6 Solar.R Wind     -0.0568    -0.683 4.96e- 1       144   -0.217     0.107 Pearson's p~ two.sided
 7 Solar.R Temp       0.276      3.44 7.52e- 4       144    0.119     0.419 Pearson's p~ two.sided
 8 Solar.R Month    -0.0753    -0.906 3.66e- 1       144   -0.235    0.0882 Pearson's p~ two.sided
 9 Solar.R Day       -0.150     -1.82 7.02e- 2       144   -0.305    0.0125 Pearson's p~ two.sided
10 Wind    Temp      -0.458     -6.33 2.64e- 9       151   -0.575    -0.323 Pearson's p~ two.sided
11 Wind    Month     -0.178     -2.23 2.75e- 2       151   -0.328   -0.0202 Pearson's p~ two.sided
12 Wind    Day       0.0272     0.334 7.39e- 1       151   -0.132     0.185 Pearson's p~ two.sided
13 Temp    Month      0.421      5.70 6.03e- 8       151    0.281     0.543 Pearson's p~ two.sided
14 Temp    Day       -0.131     -1.62 1.08e- 1       151   -0.283    0.0287 Pearson's p~ two.sided
15 Month   Day     -0.00796   -0.0978 9.22e- 1       151   -0.166     0.151 Pearson's p~ two.sided
You can now use dplyr::filter() to subset the results you want.
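A minimal sketch of that filtering step follows. To stay self-contained it rebuilds a reduced version of bulk_cor with base R only (combn() for the pairs and cor.test() for the tests), so tidystringdist and broom are not required. The name strong and the thresholds 0.05 and 0.4 are arbitrary assumptions chosen for illustration.

```r
library(dplyr)

# All unordered pairs of column names, one pair per row
pairs <- t(combn(names(airquality), 2))
bulk_cor <- data.frame(V1 = pairs[, 1], V2 = pairs[, 2],
                       stringsAsFactors = FALSE)

# One cor.test() per pair; missing values are dropped pairwise by cor.test()
tests <- Map(function(x, y) cor.test(airquality[[x]], airquality[[y]]),
             bulk_cor$V1, bulk_cor$V2)

# Keep only the estimate and the p-value of each test
bulk_cor$estimate <- vapply(tests, function(t) unname(t$estimate),
                            numeric(1), USE.NAMES = FALSE)
bulk_cor$p.value <- vapply(tests, function(t) t$p.value,
                           numeric(1), USE.NAMES = FALSE)

# Hypothetical thresholds: significant at 5% and |r| above 0.4
strong <- bulk_cor %>%
  filter(p.value < 0.05, abs(estimate) > 0.4)

strong
```

Judging by the table above, these thresholds should keep the Ozone/Wind, Ozone/Temp, Wind/Temp and Temp/Month pairs.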
References
Fay, Colin. 2017. “A Crazy Little Thing Called purrr - Part 6: Doing Statistics.” https://colinfay.me/purrr-statistics/.