How to add values from another column if a conditional join does not match?
I have two tables. This one has the old names:
Last Name|First Name|ID
Clay|Cassius|1
Alcindor|Lou|2
Artest|Ron|3
Jordan|Michael|4
Scottie|Pippen|5
Kanter|Enes|6
This one has the new names:
Last Name|First Name|ID
Ali|Muhammad|1
Abdul Jabbar|Kareem|2
World Peace|Metta|3
Jordan|Michael|4
Pippen|Scottie|5
Freedom|Enes Kanter|6
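Basically, I want to join onto the first table (old names) so that, if the name has changed, the new last name is shown in a Discrepancies column; otherwise the cell is blank: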
Last Name|First Name|ID|Discrepancies
Clay|Cassius|1|Ali
Alcindor|Lou|2|Abdul Jabbar
Artest|Ron|3|World Peace
Jordan|Michael|4|
Pippen|Scottie|5|
Kanter|Enes|6|Freedom
Note that Michael's and Scottie's names have not changed, so their Discrepancies cells are blank.
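A minimal base-R sketch of the rule described above, assuming data frames named old and new with columns Last, First and ID (hypothetical names; ID 5 uses the Pippen/Scottie order from the desired output). It looks up each old ID in the new table with match(), so row order does not matter and IDs missing from the new table simply get a blank. The helper variables idx and new_last are illustrative only.

```r
# Hypothetical data frames mirroring the tables in the question
old <- data.frame(
  Last  = c("Clay", "Alcindor", "Artest", "Jordan", "Pippen", "Kanter"),
  First = c("Cassius", "Lou", "Ron", "Michael", "Scottie", "Enes"),
  ID    = 1:6,
  stringsAsFactors = FALSE
)
new <- data.frame(
  Last  = c("Ali", "Abdul Jabbar", "World Peace", "Jordan", "Pippen", "Freedom"),
  First = c("Muhammad", "Kareem", "Metta", "Michael", "Scottie", "Enes Kanter"),
  ID    = 1:6,
  stringsAsFactors = FALSE
)

# For each old row, find the matching row of `new` by ID (NA when there is no match)
idx <- match(old$ID, new$ID)
new_last <- new$Last[idx]

# Show the new last name only when a match exists and the name differs; otherwise ""
old$Discrepancies <- ifelse(!is.na(new_last) & new_last != old$Last, new_last, "")
old
```

Because `FALSE & NA` is `FALSE` in R, an unmatched ID falls through to the blank branch instead of producing NA.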
You could use
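library(dplyr)
df1 %>%
  # keep the old column names unsuffixed; columns from df2 get ".y"
  left_join(df2, by = "ID", suffix = c("", ".y")) %>%
  # new last name if it changed, otherwise an empty string
  mutate(Discrepancies = ifelse(Last_Name.y == Last_Name, "", Last_Name.y)) %>%
  select(-ends_with(".y"))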
to get
# A tibble: 6 x 4
Last_Name First_Name ID Discrepancies
<chr> <chr> <dbl> <chr>
1 Clay Cassius 1 "Ali"
2 Alcindor Lou 2 "Abdul Jabbar"
3 Artest Ron 3 "World Peace"
4 Jordan Michael 4 ""
5 Scottie Pippen 5 "Pippen"
6 Kanter Enes 6 "Freedom"
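Notes:
- I named the columns Last_Name and First_Name.
- Your first data frame contains "Scottie Pippen" instead of "Pippen Scottie" (last and first name are swapped), which is why row 5 reports "Pippen" as a discrepancy.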
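To illustrate the second note, here is a small base-R sketch using only the row for ID 5 (values copied from the question). The helper flag_change and the variables fixed and row5 are hypothetical names used for illustration; the comparison mirrors the ifelse() rule above. Swapping the two name columns back for that row makes the discrepancy disappear.

```r
# Row 5 only, as it appears in the old and new tables
dat1 <- data.frame(Last.Name = "Scottie", First.Name = "Pippen", ID = 5L,
                   stringsAsFactors = FALSE)
dat2 <- data.frame(Last.Name = "Pippen", First.Name = "Scottie", ID = 5L,
                   stringsAsFactors = FALSE)

# Hypothetical helper: new last name if it changed, otherwise ""
flag_change <- function(old_last, new_last) ifelse(old_last == new_last, "", new_last)

flag_change(dat1$Last.Name, dat2$Last.Name)   # "Pippen", because dat1 has the names swapped

# Swap first and last name back for the affected row, then compare again
fixed <- dat1
row5 <- fixed$ID == 5
tmp <- fixed$Last.Name[row5]
fixed$Last.Name[row5] <- fixed$First.Name[row5]
fixed$First.Name[row5] <- tmp

flag_change(fixed$Last.Name, dat2$Last.Name)  # "", the name did not actually change
```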
Another possible solution:
library(tidyverse)
old <- data.frame(
  Last  = c("Clay", "Alcindor", "Artest", "Jordan", "Scottie", "Kanter"),
  First = c("Cassius", "Lou", "Ron", "Michael", "Pippen", "Enes"),
  ID    = c(1L, 2L, 3L, 4L, 5L, 6L),
  stringsAsFactors = FALSE
)
new <- data.frame(
  Last  = c("Ali", "Abdul Jabbar", "World Peace", "Jordan", "Pippen", "Freedom"),
  First = c("Muhammad", "Kareem", "Metta", "Michael", "Scottie", "Enes Kanter"),
  ID    = c(1L, 2L, 3L, 4L, 5L, 6L),
  stringsAsFactors = FALSE
)
old %>%
  bind_rows(new) %>%
  group_by(ID) %>%
  summarise(
    # old rows come first, so last(Last) is the new last name within each ID
    discrepancies = if_else(n_distinct(Last) > 1, last(Last), NA_character_),
    Last = first(Last), First = first(First),
    .groups = "drop"
  )
#> # A tibble: 6 × 4
#> ID discrepancies Last First
#> <int> <chr> <chr> <chr>
#> 1 1 Ali Clay Cassius
#> 2 2 Abdul Jabbar Alcindor Lou
#> 3 3 World Peace Artest Ron
#> 4 4 <NA> Jordan Michael
#> 5 5 Pippen Scottie Pippen
#> 6 6 Freedom Kanter Enes
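The same stack-then-group idea can be sketched without the tidyverse, assuming the same old and new data frames as in this answer. This version swaps in base R's rbind(), split() and vapply() for bind_rows(), group_by() and summarise(); the names stacked and disc are hypothetical. It relies on old rows being stacked before new rows, so the last element of each group is the new last name.

```r
old <- data.frame(
  Last  = c("Clay", "Alcindor", "Artest", "Jordan", "Scottie", "Kanter"),
  First = c("Cassius", "Lou", "Ron", "Michael", "Pippen", "Enes"),
  ID    = 1:6,
  stringsAsFactors = FALSE
)
new <- data.frame(
  Last  = c("Ali", "Abdul Jabbar", "World Peace", "Jordan", "Pippen", "Freedom"),
  First = c("Muhammad", "Kareem", "Metta", "Michael", "Scottie", "Enes Kanter"),
  ID    = 1:6,
  stringsAsFactors = FALSE
)

# Stack old rows first, then new rows
stacked <- rbind(old, new)

# Per ID: the newest last name if more than one distinct last name occurs, else NA
disc <- vapply(split(stacked$Last, stacked$ID), function(x) {
  if (length(unique(x)) > 1) x[length(x)] else NA_character_
}, character(1))

# Look the result up by ID (split() names the groups with the ID values)
old$discrepancies <- unname(disc[as.character(old$ID)])
old
```

As with the dplyr version, unchanged names come out as NA; wrap the last assignment in ifelse(is.na(...), "", ...) if a blank string is preferred.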
You can simply merge your data and then blank out the last names that did not change.
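# Join on ID; shared columns get suffix 1 (old) or 2 (new)
m <- merge(dat1, dat2, by = "ID", suffixes = c("1", "2"))
dfinal <- setNames(m[, c("Last.Name1", "First.Name1", "ID", "Last.Name2")],
                   c(colnames(dat1), "Discrepancies"))
# Blank out last names that did not change
dfinal$Discrepancies[dfinal$Last.Name == dfinal$Discrepancies] <- ""
dfinal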
Last.Name First.Name ID Discrepancies
1 Clay Cassius 1 Ali
2 Alcindor Lou 2 Abdul Jabbar
3 Artest Ron 3 World Peace
4 Jordan Michael 4
5 Scottie Pippen 5 Pippen
6 Kanter Enes 6 Freedom
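merge() performs an inner join by default, so an ID that exists only in dat1 would be dropped. A sketch of a left-join variant using merge()'s all.x = TRUE argument, assuming (for illustration only) that the new table lacks ID 6; the variable names m and unchanged are hypothetical, and treating an unmatched ID as blank rather than NA is an assumption.

```r
# Subset of the question's data; dat2 deliberately has no row for ID 6
dat1 <- data.frame(Last.Name = c("Clay", "Jordan", "Kanter"),
                   First.Name = c("Cassius", "Michael", "Enes"),
                   ID = c(1L, 4L, 6L), stringsAsFactors = FALSE)
dat2 <- data.frame(Last.Name = c("Ali", "Jordan"),
                   First.Name = c("Muhammad", "Michael"),
                   ID = c(1L, 4L), stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of dat1; unmatched rows get NA in Last.Name2
m <- merge(dat1, dat2, by = "ID", all.x = TRUE, suffixes = c("1", "2"))
dfinal <- setNames(m[, c("Last.Name1", "First.Name1", "ID", "Last.Name2")],
                   c(colnames(dat1), "Discrepancies"))

# Blank out unchanged names and IDs that had no match in dat2
unchanged <- is.na(dfinal$Discrepancies) | dfinal$Discrepancies == dfinal$Last.Name
dfinal$Discrepancies[unchanged] <- ""
dfinal
```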
Data
dat1 <- structure(list(Last.Name = c("Clay", "Alcindor", "Artest", "Jordan",
"Scottie", "Kanter"), First.Name = c("Cassius", "Lou", "Ron",
"Michael", "Pippen", "Enes"), ID = 1:6), class = "data.frame", row.names = c(NA,
-6L))
dat2 <- structure(list(Last.Name = c("Ali", "Abdul Jabbar", "World Peace",
"Jordan", "Pippen", "Freedom"), First.Name = c("Muhammad", "Kareem",
"Metta", "Michael", "Scottie", "Enes Kanter"), ID = 1:6), class = "data.frame", row.names = c(NA,
-6L))