Merge two tables based on 2 conditions and output the average as result column
I have the following two tables:
Table_1
ID Interval
1 10
1 11
2 11
and
Table_2
ID Interval Rating
1 10 0.5
1 10 0.3
1 11 0.1
2 11 0.1
2 11 0.2
The output table should look like this:
ID Interval Mean Ratings
1 10 0.4
1 11 0.1
2 11 0.15
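My goal is to join the two tables on two conditions/columns, ID and Interval. Because I have multiple ratings for the same ID and Interval, I want to compute the mean of those ratings. Although the IDs are unique (~9500), the intervals repeat across different IDs (as the tables above show). My current approach is a join function with 2 arguments. How can I create the final table, in which Table_1 and Table_2 are joined on ID and Interval and the result column holds the mean rating?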
left_join(Table_1, Table_2, by = c("ID" = "ID", "Interval" = "Interval"))
You don't need a join. Instead, bind your tables and use grouping and summarising from dplyr. The following does what you ask:
library(dplyr)
table_1 <- data.frame("ID"= c(1,1,2),"Interval"=c (10,11,11),"Rating"= c(NA,NA,NA))
table_2 <- data.frame("ID"= c(1,1,1,2,2),"Interval"= c(10,10,11,11,11),"Rating"= c(0.5,0.3,0.1,0.1,0.2))
df1 <- bind_rows(table_1,table_2) %>% group_by(ID,Interval) %>% summarise("Mean Ratings" = mean(Rating,na.rm = TRUE))
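One caveat with binding instead of joining: the result contains every (ID, Interval) pair found in either table, and a pair that has no ratings at all comes out as NaN rather than NA. The sketch below illustrates this with hypothetical rows that are not in the question's data; the `.groups` argument assumes dplyr 1.0.0 or later, and the `semi_join` step is just one possible way to restrict the result to the keys of table_1.

```r
library(dplyr)

# Hypothetical data: key (3, 12) occurs only in table_1, key (4, 10) only in table_2.
table_1 <- data.frame(ID = c(1, 3), Interval = c(10, 12), Rating = NA_real_)
table_2 <- data.frame(ID = c(1, 1, 4), Interval = c(10, 10, 10),
                      Rating = c(0.5, 0.3, 0.9))

# Same bind-then-summarise pattern as above; .groups = "drop" returns an ungrouped tibble.
bound <- bind_rows(table_1, table_2) %>%
  group_by(ID, Interval) %>%
  summarise(`Mean Ratings` = mean(Rating, na.rm = TRUE), .groups = "drop")

# Keep only the keys that actually occur in table_1.
restricted <- semi_join(bound, table_1, by = c("ID", "Interval"))
```

Here `bound` has three rows, including the (4, 10) key that table_1 never asked for, while `restricted` keeps only the two table_1 keys, with NaN for (3, 12).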
You can do this with dplyr's left_join, group_by and summarise.
library(dplyr)
table1 %>%
left_join(table2, by = c("ID", "Interval")) %>%
group_by(ID, Interval) %>%
summarise("Mean Ratings" = mean(Rating))
## A tibble: 3 x 3
## Groups: ID [?]
# ID Interval `Mean Ratings`
# <int> <int> <dbl>
#1 1 10 0.4
#2 1 11 0.1
#3 2 11 0.15
Data
table1 <- read.table(header = T, text="ID Interval
1 10
1 11
2 11")
table2 <- read.table(header = T, text = "ID Interval Rating
1 10 0.5
1 10 0.3
1 11 0.1
2 11 0.1
2 11 0.2")
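With around 9500 IDs, it may also be worth collapsing table2 to one row per (ID, Interval) before joining, so the join never produces the duplicated intermediate rows. This is a minimal sketch of that ordering, not a benchmarked recommendation; `mean_ratings` and `result` are names introduced here for illustration, and `.groups` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

table1 <- data.frame(ID = c(1L, 1L, 2L), Interval = c(10L, 11L, 11L))
table2 <- data.frame(
  ID       = c(1L, 1L, 1L, 2L, 2L),
  Interval = c(10L, 10L, 11L, 11L, 11L),
  Rating   = c(0.5, 0.3, 0.1, 0.1, 0.2)
)

# Collapse table2 to one row per (ID, Interval) before joining.
mean_ratings <- table2 %>%
  group_by(ID, Interval) %>%
  summarise(`Mean Ratings` = mean(Rating), .groups = "drop")

# The left join keeps every row of table1; a key without ratings would get NA.
result <- left_join(table1, mean_ratings, by = c("ID", "Interval"))
```

The result has the same three rows as the output shown above, in the row order of table1.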
First, summarise the second table, DT2, and then perform a right join with the first table, DT1.
library(data.table)
DT1[DT2[, .(Mean_Rating = mean(Rating)), .(ID, Interval)], on = c(ID = "ID", Interval = "Interval")]
This gives
ID Interval Mean_Rating
1: 1 10 0.40
2: 1 11 0.10
3: 2 11 0.15
Sample data:
DT1 <- structure(list(ID = c(1L, 1L, 2L), Interval = c(10L, 11L, 11L
)), .Names = c("ID", "Interval"), class = c("data.table", "data.frame"
), row.names = c(NA, -3L))
DT2 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L), Interval = c(10L,
10L, 11L, 11L, 11L), Rating = c(0.5, 0.3, 0.1, 0.1, 0.2)), .Names = c("ID",
"Interval", "Rating"), class = c("data.table", "data.frame"), row.names = c(NA,
-5L))
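Because `X[Y, on = ...]` returns one row per row of `Y`, the right join above keeps the keys of the summarised DT2. If every row of DT1 should survive even when it has no ratings, the two tables can swap positions. The sketch below assumes that is the desired behaviour; the (3, 12) row and the name `DT2_mean` are hypothetical additions for illustration.

```r
library(data.table)

# DT1 with one hypothetical extra key, (3, 12), that has no ratings in DT2.
DT1 <- data.table(ID = c(1L, 1L, 2L, 3L), Interval = c(10L, 11L, 11L, 12L))
DT2 <- data.table(
  ID       = c(1L, 1L, 1L, 2L, 2L),
  Interval = c(10L, 10L, 11L, 11L, 11L),
  Rating   = c(0.5, 0.3, 0.1, 0.1, 0.2)
)

# Summarise DT2 to one row per (ID, Interval).
DT2_mean <- DT2[, .(Mean_Rating = mean(Rating)), by = .(ID, Interval)]

# Put DT1 in the i position so the result has one row per DT1 row;
# keys with no match in DT2_mean get NA in Mean_Rating.
result <- DT2_mean[DT1, on = .(ID, Interval)]
```

On the question's own data (without the extra row) this gives the same three rows as the right join.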