Row-wise cor() on subset of columns using dplyr::mutate()

set.seed(8)
df <- data.frame(
  A = sample(1:3, 10, replace = TRUE),
  B = sample(1:3, 10, replace = TRUE),
  C = sample(1:3, 10, replace = TRUE),
  D = sample(1:3, 10, replace = TRUE),
  E = sample(1:3, 10, replace = TRUE),
  F = sample(1:3, 10, replace = TRUE))

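I want to pass subsets of columns to dplyr's mutate() and do a row-wise calculation, for example cor() to get the correlation between columns A-C and D-F within each row, but I can't figure out how. I found several related SO questions for inspiration, but still failed to produce acceptable code. For example, I tried this: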

require(plyr)
require(dplyr)
df %>%
  rowwise() %>%
  mutate(c=cor(.[[1:3]],.[[4:6]]))

You can try:

df %>% 
   rowwise() %>% 
   do(data.frame(., Cor=cor(unlist(.[1:3]), unlist(.[4:6]))))
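On dplyr 1.0.0 or later, where do() is superseded, the same row-wise correlation can probably be written with c_across(). The following is only a sketch under that version assumption; it rebuilds the example data so it runs on its own, and the names res and base_cor are mine, not from the question. Note that cor() gives NA for any row where A-C or D-F are all equal, since the standard deviation is then zero.

```r
library(dplyr)

# Same example data as in the question
set.seed(8)
df <- data.frame(
  A = sample(1:3, 10, replace = TRUE),
  B = sample(1:3, 10, replace = TRUE),
  C = sample(1:3, 10, replace = TRUE),
  D = sample(1:3, 10, replace = TRUE),
  E = sample(1:3, 10, replace = TRUE),
  F = sample(1:3, 10, replace = TRUE))

# Row-wise correlation between (A, B, C) and (D, E, F).
# suppressWarnings() hides the "standard deviation is zero" warning
# raised for rows where one of the triples is constant.
res <- suppressWarnings(
  df %>%
    rowwise() %>%
    mutate(Cor = cor(c_across(A:C), c_across(D:F))) %>%
    ungroup()
)

# Base R cross-check: apply() over rows, indexing columns by position
base_cor <- suppressWarnings(
  apply(df, 1, function(r) cor(r[1:3], r[4:6]))
)
```

Both versions should agree row by row; the apply() form avoids the dplyr dependency at the cost of addressing columns by position.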

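Here is another approach, from Fay (2017). Note that it computes the correlation between every pair of columns rather than within each row.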

> library(tidystringdist)
> comb <- tidy_comb_all(names(airquality))
> comb
# A tibble: 15 x 2
   V1      V2     
 * <chr>   <chr>  
 1 Ozone   Solar.R
 2 Ozone   Wind   
 3 Ozone   Temp   
 4 Ozone   Month  
 5 Ozone   Day    
 6 Solar.R Wind   
 7 Solar.R Temp   
 8 Solar.R Month  
 9 Solar.R Day    
10 Wind    Temp   
11 Wind    Month  
12 Wind    Day    
13 Temp    Month  
14 Temp    Day    
15 Month   Day    

This gives us every pairwise combination of column names.

> library(purrr)
> library(dplyr)
> bulk_cor <-
+   comb %>%
+   pmap(~ cor.test(airquality[[.x]], airquality[[.y]])) %>%
+   map_df(broom::tidy) %>%
+   bind_cols(comb, .)
> bulk_cor
# A tibble: 15 x 10
   V1      V2      estimate statistic  p.value parameter conf.low conf.high method       alternative
   <chr>   <chr>      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <fct>        <fct>      
 1 Ozone   Solar.R  0.348      3.88   1.79e- 4       109   0.173     0.502  Pearson's p~ two.sided  
 2 Ozone   Wind    -0.602     -8.04   9.27e-13       114  -0.706    -0.471  Pearson's p~ two.sided  
 3 Ozone   Temp     0.698     10.4    2.93e-18       114   0.591     0.781  Pearson's p~ two.sided  
 4 Ozone   Month    0.165      1.78   7.76e- 2       114  -0.0183    0.337  Pearson's p~ two.sided  
 5 Ozone   Day     -0.0132    -0.141  8.88e- 1       114  -0.195     0.169  Pearson's p~ two.sided  
 6 Solar.R Wind    -0.0568    -0.683  4.96e- 1       144  -0.217     0.107  Pearson's p~ two.sided  
 7 Solar.R Temp     0.276      3.44   7.52e- 4       144   0.119     0.419  Pearson's p~ two.sided  
 8 Solar.R Month   -0.0753    -0.906  3.66e- 1       144  -0.235     0.0882 Pearson's p~ two.sided  
 9 Solar.R Day     -0.150     -1.82   7.02e- 2       144  -0.305     0.0125 Pearson's p~ two.sided  
10 Wind    Temp    -0.458     -6.33   2.64e- 9       151  -0.575    -0.323  Pearson's p~ two.sided  
11 Wind    Month   -0.178     -2.23   2.75e- 2       151  -0.328    -0.0202 Pearson's p~ two.sided  
12 Wind    Day      0.0272     0.334  7.39e- 1       151  -0.132     0.185  Pearson's p~ two.sided  
13 Temp    Month    0.421      5.70   6.03e- 8       151   0.281     0.543  Pearson's p~ two.sided  
14 Temp    Day     -0.131     -1.62   1.08e- 1       151  -0.283     0.0287 Pearson's p~ two.sided  
15 Month   Day     -0.00796   -0.0978 9.22e- 1       151  -0.166     0.151  Pearson's p~ two.sided  

You can now use dplyr::filter to subset the results you want.
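As an illustration of that filtering step, here is a minimal sketch. To keep it self-contained it rebuilds the pairs with base combn() instead of tidystringdist and keeps only the estimate and p-value, so it is a stand-in for bulk_cor rather than the exact object above. The thresholds (p.value < 0.05, absolute estimate above 0.4) and the name strong are arbitrary choices of mine.

```r
library(dplyr)
library(purrr)

# All pairs of column names, in the same order as the table above
pairs <- t(combn(names(airquality), 2))
comb <- tibble(V1 = pairs[, 1], V2 = pairs[, 2])

# One cor.test() per pair; pull out the estimate and the p-value
bulk_cor <- comb %>%
  mutate(
    test     = map2(V1, V2, ~ cor.test(airquality[[.x]], airquality[[.y]])),
    estimate = map_dbl(test, "estimate"),
    p.value  = map_dbl(test, "p.value")
  ) %>%
  select(-test)

# Keep only significant and reasonably strong correlations
strong <- bulk_cor %>%
  filter(p.value < 0.05, abs(estimate) > 0.4)
```

Going by the estimates printed above, this should keep the Ozone/Wind, Ozone/Temp, Wind/Temp and Temp/Month pairs.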

References

Fay, Colin. 2017. "A Crazy Little Thing Called {purrr} - Part 6: Doing Statistics." https://colinfay.me/purrr-statistics/.