如果条件连接不执行,如何从其他列添加值?

How to add values from other column if conditional join does not execute?

我有两个table这个是旧名字

Last Name|First Name|ID
Clay      Cassius    1
Alcindor  Lou        2
Artest    Ron        3
Jordan    Michael    4
Scottie   Pippen     5
Kanter    Enes       6

新名称

Last Name|   First Name|   ID
Ali          Muhammad       1
Abdul Jabbar Kareem         2
World Peace  Metta          3
Jordan       Michael        4
Pippen       Scottie        5
Freedom      Enes Kanter    6

基本上我想加入第一个 table(旧名字),如果名字发生变化,它将显示新的姓氏,否则为空白

Last Name|First Name|ID|Discrepancies
Clay      Cassius    1  Ali
Alcindor  Lou        2  Abdul Jabbar
Artest    Ron        3  World Peace
Jordan    Michael    4  
Pippen   Scottie     5  
Kanter    Enes       6  Freedom

请注意,Michael 和 Scottie 的名字没有改变,因此在差异中有一个空白。

你可以使用

library(dplyr)

df1 %>% 
  left_join(df2, by = "ID", suffix = c("", ".y")) %>% 
  mutate(Discrepancies = ifelse(Last_Name.y == Last_Name, "", Last_Name.y)) %>% 
  select(-ends_with(".y"))

得到

# A tibble: 6 x 4
  Last_Name First_Name    ID Discrepancies 
  <chr>     <chr>      <dbl> <chr>         
1 Clay      Cassius        1 "Ali"         
2 Alcindor  Lou            2 "Abdul Jabbar"
3 Artest    Ron            3 "World Peace" 
4 Jordan    Michael        4 ""            
5 Scottie   Pippen         5 "Pippen"      
6 Kanter    Enes           6 "Freedom" 

注:

  • 我将列命名为 Last_NameFirst_Name
  • 第一个数据框包含 Scottie Pippen 而不是 Pippen Scottie

另一个可能的解决方案:

library(tidyverse)

old <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Last = c("Clay",
           "Alcindor","Artest","Jordan","Scottie","Kanter"),
  `First` = c("Cassius","Lou",
              "Ron","Michael","Pippen","Enes"),
  `ID` = c(1L, 2L, 3L, 4L, 5L, 6L)
)

new <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  `Last` = c("Ali",
             "Abdul Jabbar","World Peace","Jordan","Pippen","Freedom"),
  `First` = c("Muhammad",
              "Kareem","Metta","Michael","Scottie","Enes Kanter"),
  ID = c(1L, 2L, 3L, 4L, 5L, 6L)
)

old %>% 
  bind_rows(new) %>% 
  group_by(ID) %>% 
  summarise(
    discrepancies = if_else(n_distinct(Last) > 1, last(Last), NA_character_), 
    Last = first(Last), First = first(First), .groups = "drop" )

#> # A tibble: 6 × 4
#>      ID discrepancies Last     First  
#>   <int> <chr>         <chr>    <chr>  
#> 1     1 Ali           Clay     Cassius
#> 2     2 Abdul Jabbar  Alcindor Lou    
#> 3     3 World Peace   Artest   Ron    
#> 4     4 <NA>          Jordan   Michael
#> 5     5 Pippen        Scottie  Pippen 
#> 6     6 Freedom       Kanter   Enes

您可以简单地 merge 您的数据,然后过滤重复项。

dfinal <- setNames( merge( dat1, dat2, "ID", suffixes=c(1,2) )[
  ,c("Last.Name1","First.Name1","ID","Last.Name2")], c(colnames(dat1),"Discrepancies")  )

dfinal$Discrepancies[ dfinal$Last.Name == dfinal$Discrepancies ] <- ""

dfinal
  Last.Name First.Name ID Discrepancies
1      Clay    Cassius  1           Ali
2  Alcindor        Lou  2  Abdul Jabbar
3    Artest        Ron  3   World Peace
4    Jordan    Michael  4
5   Scottie     Pippen  5        Pippen
6    Kanter       Enes  6       Freedom

数据

dat1 <- structure(list(Last.Name = c("Clay", "Alcindor", "Artest", "Jordan",
"Scottie", "Kanter"), First.Name = c("Cassius", "Lou", "Ron",
"Michael", "Pippen", "Enes"), ID = 1:6), class = "data.frame", row.names = c(NA,
-6L))

dat2 <- structure(list(Last.Name = c("Ali", "Abdul Jabbar", "World Peace",
"Jordan", "Pippen", "Freedom"), First.Name = c("Muhammad", "Kareem",
"Metta", "Michael", "Scottie", "Enes Kanter"), ID = 1:6), class = "data.frame", row.names = c(NA,
-6L))