Calculate grouped pairwise correlation between two data frames in R

I have two data frames with the same structure:

df1 <- data.frame(group1=c("A","A","A","B","B","C","C","C"), 
group2 = c(1,1,2,1,1,2,2,1), 
col1 = c(1,2,3,4,5,6,7,8), 
col2 = c(3,5,7,4,3,7,2,7))

df2 <- data.frame(group1=c("A","A","A","B","B","C","C","C"), 
group2 = c(1,1,2,1,1,2,2,1), 
col1 = c(6,2,7,5,2,5,7,7), 
col2 = c(7,2,5,21,6,9,4,2))

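The first two columns are identical in both data frames. I want to calculate the correlation between columns that share the same name (i.e. between col1 in df1 and col1 in df2), within each group.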

Expected result (the values are only illustrative):

group1  group2  col1_correlation  col2_correlation
A       1       0.1               0.5
A       2       0.05              0.04
B       1       0.46              0.2

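The code below should do the job. However, the real data frames have many more than two columns to correlate, and typing out all of those column names is painful. Is there a smarter way to do this? Thanks in advance!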

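library(dplyr)

# Naming the arguments gives the combined columns the df1./df2. prefixes used below
df <- data.frame(df1 = df1, df2 = df2) %>%
  group_by(df1.group1, df1.group2) %>%
  mutate(col1_cor = cor(df1.col1, df2.col1), col2_cor = cor(df1.col2, df2.col2)) %>%
  select(df1.group1, df1.group2, col1_cor, col2_cor)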

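You can row-bind the two data frames with an id variable that tells them apart, then compute the correlation for each col column within every group. This relies on the rows of df1 and df2 being in the same order, so that they pair up within each group.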

library(dplyr)

bind_rows(df1, df2, .id = 'id') %>%
  group_by(group1, group2) %>%
  summarise(across(starts_with('col'), 
       ~cor(.x[id == 1], .x[id == 2]), .names = '{col}_cor'), .groups = 'drop')
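
As a quick check, here is a self-contained sketch of the same idea run on the sample data from the question. A few things in it are assumptions rather than part of the answer above: the two data frames are passed to bind_rows() by name so the id column holds "df1" and "df2", the helper vector `cols` is a hypothetical name for the non-grouping columns common to both data frames, and suppressWarnings() is added only to silence any warning cor() may raise for groups that are too small.

```r
library(dplyr)

# Sample data copied from the question
df1 <- data.frame(group1 = c("A","A","A","B","B","C","C","C"),
                  group2 = c(1,1,2,1,1,2,2,1),
                  col1 = c(1,2,3,4,5,6,7,8),
                  col2 = c(3,5,7,4,3,7,2,7))

df2 <- data.frame(group1 = c("A","A","A","B","B","C","C","C"),
                  group2 = c(1,1,2,1,1,2,2,1),
                  col1 = c(6,2,7,5,2,5,7,7),
                  col2 = c(7,2,5,21,6,9,4,2))

# Hypothetical helper: every column shared by both data frames
# except the grouping columns, so no column names are typed by hand
group_cols <- c("group1", "group2")
cols <- setdiff(intersect(names(df1), names(df2)), group_cols)

# Stack the two data frames; the id column records the source ("df1" or "df2").
# Assumes rows are in the same order in both data frames.
result <- bind_rows(df1 = df1, df2 = df2, .id = "id") %>%
  group_by(across(all_of(group_cols))) %>%
  summarise(across(all_of(cols),
                   ~ suppressWarnings(cor(.x[id == "df1"], .x[id == "df2"])),
                   .names = "{.col}_cor"),
            .groups = "drop")

print(result)
```

The result has one row per group1/group2 combination. With this sample data the groups A/2 and C/1 contain a single row each, so their correlations come out as NA, and every other group has exactly two rows, so its correlations can only be +1 or -1. Real data with more rows per group will give less degenerate values.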