在 r 中总结两个数据帧
Summarize two dataframes in r
我有两个数据框
df1
# var1 var2
# 1 X01 Red
# 2 X02 Green
# 3 X03 Red
# 4 X04 Yellow
# 5 X05 Red
# 6 X06 Green
df2
# X01 X02 X03 ...
# 1 1 0.1 2.1
# 2 2 0.2 2.2
# 3 3 0.3 2.3
# 4 4 0.4 2.4
# 5 5 0.5 2.5
# 6 6 0.6 2.6
我想得到这样的东西
mean green val1
mean red val2
mean yellow val3
当用相应的变量计算平均值时。
我们可以更改 'df2' 中的列名与 'df1' 中的键值对的匹配,以替换颜色值,rep
通过 col
um 索引, unlist
'df2' 并使用分组方法得到 mean
tapply(unlist(df2), setNames(df1$var2, df1$var1)[names(df2)][col(df2)],
FUN = mean, na.rm = TRUE)
或使用 tidyverse
,重塑为 'long' 并在执行 group_by
mean
之前进行连接
library(dplyr)
library(tidyr)
df2 %>%
pivot_longer(cols = everything(), names_to = 'var1') %>%
left_join(df1) %>%
group_by(var2) %>%
summarise(value = mean(value, na.rm = TRUE), .groups = 'drop')
数据
df1 <- structure(list(var1 = c("X01", "X02", "X03", "X04", "X05", "X06"
), var2 = c("Red", "Green", "Red", "Yellow", "Red", "Green")),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
df2 <- structure(list(X01 = 1:6, X02 = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6
), X03 = c(2.1, 2.2, 2.3, 2.4, 2.5, 2.6)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
我有两个数据框
df1
# var1 var2
# 1 X01 Red
# 2 X02 Green
# 3 X03 Red
# 4 X04 Yellow
# 5 X05 Red
# 6 X06 Green
df2
# X01 X02 X03 ...
# 1 1 0.1 2.1
# 2 2 0.2 2.2
# 3 3 0.3 2.3
# 4 4 0.4 2.4
# 5 5 0.5 2.5
# 6 6 0.6 2.6
我想得到这样的东西
mean green val1
mean red val2
mean yellow val3
当用相应的变量计算平均值时。
我们可以更改 'df2' 中的列名与 'df1' 中的键值对的匹配,以替换颜色值,rep
通过 col
um 索引, unlist
'df2' 并使用分组方法得到 mean
tapply(unlist(df2), setNames(df1$var2, df1$var1)[names(df2)][col(df2)],
FUN = mean, na.rm = TRUE)
或使用 tidyverse
,重塑为 'long' 并在执行 group_by
mean
library(dplyr)
library(tidyr)
df2 %>%
pivot_longer(cols = everything(), names_to = 'var1') %>%
left_join(df1) %>%
group_by(var2) %>%
summarise(value = mean(value, na.rm = TRUE), .groups = 'drop')
数据
df1 <- structure(list(var1 = c("X01", "X02", "X03", "X04", "X05", "X06"
), var2 = c("Red", "Green", "Red", "Yellow", "Red", "Green")),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
df2 <- structure(list(X01 = 1:6, X02 = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6
), X03 = c(2.1, 2.2, 2.3, 2.4, 2.5, 2.6)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))