Is there a way to left join two datasets into a third one at the same time?
I have a dataset that looks like this (df1):
ID      |New York Athletes
Base001  Aaron Judge
Bask001  Kevin Durant
Bask002  Julius Randle
Base002  Max Scherzer
I want to merge two other datasets into it at the same time, without adding extra columns:
ID      |TEAM
Bask001  Nets
Bask002  Knicks
ID      |TEAM
Base001  Yankees
Base002  Mets
df1<- df1 %>%
mutate(merge(df1,Base,by="ID",all.x = TRUE))%>%
mutate(merge(.,Base,by="ID",all.x = TRUE))
But when I do this, I get:
ID |New York Athletes|Teams |Teams.x|Team.y
Base001 Aaron Judge Yankees Yankees
Bask001 Kevin Durant Nets
Bask002 Julius Randle Knicks
Base002 Max Scherzer Mets Mets
I want something like this:
ID      |New York Athletes|Teams
Base001  Aaron Judge       Yankees
Bask001  Kevin Durant      Nets
Bask002  Julius Randle     Knicks
Base002  Max Scherzer      Mets
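Edited answer:
You actually need to do two separate joins for this. I am leaving my first attempt in place for future reference, in case you want to join multiple data frames at once.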
library(tidyverse)
df1 <- tribble(~ID, ~`New York Athletes`,
               "Base001", "Aaron Judge",
               "Bask001", "Kevin Durant",
               "Bask002", "Julius Randle",
               "Base002", "Max Scherzer")
df2 <- tribble(~ID, ~TEAM,
               "Bask001", "Nets",
               "Bask002", "Knicks")
df3 <- tribble(~ID, ~TEAM,
               "Base001", "Yankees",
               "Base002", "Mets")
# First join: combine the two team tables into one lookup (they share both columns)
df1_1 <- full_join(df2, df3, by = c("ID", "TEAM"))
# Second join: attach TEAM to df1 by ID
final_df <- left_join(df1, df1_1, by = "ID")
final_df
#> # A tibble: 4 × 3
#>   ID      `New York Athletes` TEAM
#>   <chr>   <chr>               <chr>
#> 1 Base001 Aaron Judge         Yankees
#> 2 Bask001 Kevin Durant        Nets
#> 3 Bask002 Julius Randle       Knicks
#> 4 Base002 Max Scherzer        Mets
Created on 2022-04-15 by the reprex package (v2.0.1)
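The two-step approach only returns one row per athlete if no ID appears more than once in the combined team table. Here is a minimal sketch of that check, assuming dplyr and tibble are installed. The names `lookup` and `result` are introduced for illustration, and `bind_rows()` is swapped in for `full_join()`; for this data the two give the same stacked table because the team tables share all columns and have no rows in common.

```r
library(dplyr)

df1 <- tibble::tibble(
  ID = c("Base001", "Bask001", "Bask002", "Base002"),
  `New York Athletes` = c("Aaron Judge", "Kevin Durant",
                          "Julius Randle", "Max Scherzer")
)
df2 <- tibble::tibble(ID = c("Bask001", "Bask002"), TEAM = c("Nets", "Knicks"))
df3 <- tibble::tibble(ID = c("Base001", "Base002"), TEAM = c("Yankees", "Mets"))

# Stack the two team tables into a single lookup table.
lookup <- bind_rows(df2, df3)

# Guard: a repeated ID in the lookup would duplicate rows of df1 in the join.
stopifnot(anyDuplicated(lookup$ID) == 0)

# One left join keeps every row of df1, in its original order.
result <- left_join(df1, lookup, by = "ID")
result
```

If the guard fails, the lookup needs deduplicating before the join, otherwise `result` ends up with more rows than `df1`.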
Yes, you can do multiple joins at once using purrr functions:
library(tidyverse)
df1 <- tibble(x = c("A1", "A2", "B1", "B2"), y = c(1, 2, 3, 4))
df2 <- tibble(x = c("A1", "A2", "B1", "B2"), z = c(4, 5, 6, 7))
df3 <- tibble(x = c("A1", "A2", "B1", "B2"), delta = c(8, 9, 10, 11))
list_of_dataframes <- list(df1, df2, df3)
purrr::reduce(list_of_dataframes, left_join, by = "x")
#> # A tibble: 4 × 4
#>   x         y     z delta
#>   <chr> <dbl> <dbl> <dbl>
#> 1 A1        1     4     8
#> 2 A2        2     5     9
#> 3 B1        3     6    10
#> 4 B2        4     7    11
Created on 2022-04-15 by the reprex package (v2.0.1)
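The reduce() pattern works cleanly in the example above because each data frame brings a differently named column. With the question's data, both team tables carry a column called TEAM, so the chained left joins produce suffixed columns instead of a single one. Here is a small sketch of that behaviour, assuming dplyr, purrr and tibble are installed; `joined` is a name introduced for illustration, and an anonymous function is used to pass `by` explicitly.

```r
library(dplyr)
library(purrr)

df1 <- tibble::tibble(
  ID = c("Base001", "Bask001", "Bask002", "Base002"),
  `New York Athletes` = c("Aaron Judge", "Kevin Durant",
                          "Julius Randle", "Max Scherzer")
)
df2 <- tibble::tibble(ID = c("Bask001", "Bask002"), TEAM = c("Nets", "Knicks"))
df3 <- tibble::tibble(ID = c("Base001", "Base002"), TEAM = c("Yankees", "Mets"))

# Chain left joins over the list: (df1 joined with df2) joined with df3.
joined <- reduce(list(df1, df2, df3),
                 function(x, y) left_join(x, y, by = "ID"))

# The second join sees TEAM on both sides and renames them with suffixes.
names(joined)

# For this data, each row has a team in exactly one of the two columns.
all(xor(is.na(joined$TEAM.x), is.na(joined$TEAM.y)))
```

That is why, for this particular question, the team columns have to be combined in some way, either before the join or after it.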
In base R:
merge(df1, rbind(df2, df3))
ID New.York.Athletes TEAM
1 Base001 Aaron Judge Yankees
2 Base002 Max Scherzer Mets
3 Bask001 Kevin Durant Nets
4 Bask002 Julius Randle Knicks
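One thing to keep in mind with this one-liner: merge() does an inner join by default and sorts the result by the key, so an athlete with no team in either table would silently drop out. Here is a sketch of the difference, using plain data frames and one extra row ("Hock001") that is purely hypothetical and added only to show the effect; `inner` and `left` are illustrative names.

```r
df1 <- data.frame(
  ID = c("Base001", "Bask001", "Bask002", "Base002", "Hock001"),
  `New York Athletes` = c("Aaron Judge", "Kevin Durant", "Julius Randle",
                          "Max Scherzer", "Hypothetical Player"),
  check.names = FALSE  # keep the spaces in the column name
)
df2 <- data.frame(ID = c("Bask001", "Bask002"), TEAM = c("Nets", "Knicks"))
df3 <- data.frame(ID = c("Base001", "Base002"), TEAM = c("Yankees", "Mets"))

# Default merge: only IDs present on both sides are kept.
inner <- merge(df1, rbind(df2, df3))

# all.x = TRUE makes it a left join: every row of df1 is kept,
# and TEAM is NA where no match exists.
left <- merge(df1, rbind(df2, df3), by = "ID", all.x = TRUE)

nrow(inner)  # 4
nrow(left)   # 5
```

For the data in the question every ID has a match, so both calls give the same four rows.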
I think @KU99's answer is probably the simplest, but here is another option using coalesce.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df1 <- tibble::tribble(
  ~ID, ~`New York Athletes`,
  "Base001", "Aaron Judge",
  "Bask001", "Kevin Durant",
  "Bask002", "Julius Randle",
  "Base002", "Max Scherzer")
df2 <- tibble::tibble(ID = c("Bask001", "Bask002"),
                      TEAM = c("Nets", "Knicks"))
df3 <- tibble::tibble(ID = c("Base001", "Base002"),
                      TEAM = c("Yankees", "Mets"))
df1 <- df1 %>%
  mutate(merge(., df2, by = "ID", all.x = TRUE)) %>%
  mutate(merge(., df3, by = "ID", all.x = TRUE)) %>%
  select(-TEAM) %>%
  mutate(TEAM = coalesce(TEAM.x, TEAM.y)) %>%
  select(-c(TEAM.x, TEAM.y))
df1
#> # A tibble: 4 × 3
#>   ID      `New York Athletes` TEAM
#>   <chr>   <chr>               <chr>
#> 1 Base001 Aaron Judge         Yankees
#> 2 Base002 Max Scherzer        Mets
#> 3 Bask001 Kevin Durant        Nets
#> 4 Bask002 Julius Randle       Knicks
Created on 2022-04-15 by the reprex package (v2.0.1)
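The same coalesce idea can also be written with dplyr joins instead of merge() inside mutate(). This is only a sketch of an alternative, not the answer's own code: it assumes dplyr and tibble are installed, uses `result` as an illustrative name, and differs in one visible way, namely that left_join() keeps df1's original row order while merge() sorts by ID.

```r
library(dplyr)

df1 <- tibble::tibble(
  ID = c("Base001", "Bask001", "Bask002", "Base002"),
  `New York Athletes` = c("Aaron Judge", "Kevin Durant",
                          "Julius Randle", "Max Scherzer")
)
df2 <- tibble::tibble(ID = c("Bask001", "Bask002"), TEAM = c("Nets", "Knicks"))
df3 <- tibble::tibble(ID = c("Base001", "Base002"), TEAM = c("Yankees", "Mets"))

result <- df1 %>%
  left_join(df2, by = "ID") %>%
  # TEAM now exists on both sides, so the join yields TEAM.x and TEAM.y
  left_join(df3, by = "ID", suffix = c(".x", ".y")) %>%
  # take the first non-missing team for each row
  mutate(TEAM = coalesce(TEAM.x, TEAM.y)) %>%
  select(-TEAM.x, -TEAM.y)

result
```

Because coalesce() takes the first non-missing value, TEAM.x wins if an ID ever appears in both team tables.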
You can use {powerjoin}:
library(powerjoin)
power_left_join(df1, list(df2, df3), by = "ID", conflict = coalesce_xy)
#> # A tibble: 4 × 3
#>   ID      `New York Athletes` TEAM
#>   <chr>   <chr>               <chr>
#> 1 Base001 Aaron Judge         Yankees
#> 2 Bask001 Kevin Durant        Nets
#> 3 Bask002 Julius Randle       Knicks
#> 4 Base002 Max Scherzer        Mets
Created on 2022-04-16 by the reprex package (v2.0.1)