How to update a dataframe column using information from another dataframe

I have 2 data frames:

df1 = data.frame(Bird_ID = c(1:6), Sex = c("Male","Female","Male","Male","Male","UNK"))
df2 = data.frame(Bird_ID = c(6), Seen_sex = c("Female"))

df1
# Bird_ID Sex
# 1 Male
# 2 Female
# 3 Male
# 4 Male
# 5 Male
# 6 UNK

df2
# Bird_ID Seen_sex
# 6 Female

How can I update bird 6 in df1 using the information from df2? The "UNK" in df1 should become "Female", and all other birds should stay the same.

I personally prefer to keep every row and coalesce the columns like this:

library(dplyr)
left_join(df1, df2, by= "Bird_ID") %>%
  mutate(
    Sex = coalesce(Seen_sex,  Sex)
  ) %>%
  select(-Seen_sex)
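
To see why this works, here is a minimal sketch of the intermediate step. It assumes character (not factor) columns, which is why `stringsAsFactors = FALSE` is set explicitly; the names `joined` and `sex_new` are only for illustration.

```r
library(dplyr)

df1 <- data.frame(Bird_ID = 1:6,
                  Sex = c("Male", "Female", "Male", "Male", "Male", "UNK"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Bird_ID = 6, Seen_sex = "Female", stringsAsFactors = FALSE)

# After the left join, Seen_sex is NA for every bird that has no row in df2
joined <- left_join(df1, df2, by = "Bird_ID")
joined$Seen_sex
# NA NA NA NA NA "Female"

# coalesce() takes, element by element, the first value that is not NA
sex_new <- coalesce(joined$Seen_sex, joined$Sex)
sex_new
# "Male" "Female" "Male" "Male" "Male" "Female"
```

Putting `Seen_sex` first means the observed value wins wherever one exists, and the original `Sex` is kept otherwise.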

But you can also update the specific record by looking up its row and overwriting it.

# match() finds the df1 row for each df2 ID; values are assigned by column position
df1[match(df2$Bird_ID, df1$Bird_ID), ] <- df2

Using dplyr:

library(dplyr)
df1 %>%
  left_join(df2, by = "Bird_ID") %>%
  mutate(Sex = ifelse(!is.na(Seen_sex), Seen_sex, Sex)) %>%
  select(-Seen_sex)

#   Bird_ID    Sex
# 1       1   Male
# 2       2 Female
# 3       3   Male
# 4       4   Male
# 5       5   Male
# 6       6 Female
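
Another dplyr option, not taken from the answers here, is `rows_update()`. This is a sketch that assumes dplyr 1.0.0 or later, character columns, and an integer `Bird_ID` in both frames (hence the `6L`, which is my assumption rather than the original data).

```r
library(dplyr)

df1 <- data.frame(Bird_ID = 1:6,
                  Sex = c("Male", "Female", "Male", "Male", "Male", "UNK"),
                  stringsAsFactors = FALSE)
# Bird_ID is written as 6L so the key has the same integer type in both frames
df2 <- data.frame(Bird_ID = 6L, Seen_sex = "Female", stringsAsFactors = FALSE)

# rows_update() overwrites columns of df1 with the same-named columns of the
# second frame, matching rows on the key, so Seen_sex is renamed to Sex first
updated <- rows_update(df1, rename(df2, Sex = Seen_sex), by = "Bird_ID")
updated$Sex
# "Male" "Female" "Male" "Male" "Male" "Female"
```

This avoids the temporary `Seen_sex` column, at the cost of requiring the update frame to use the same column names as `df1`.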

In base R:

df1 <- merge(df1, df2, by = "Bird_ID", all = TRUE)
df1$Sex[!is.na(df1$Seen_sex)] <- df1$Seen_sex[!is.na(df1$Seen_sex)]
df1$Seen_sex <- NULL

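Using dplyr >= 0.5:

library(dplyr)
# merge() has no `on` argument; without `by` it joins on all shared columns
# (Bird_ID and Sex), so all = TRUE keeps both the "UNK" and the "Female" row for bird 6.
# The result is sorted on those columns, so "Female" comes before "UNK",
# and distinct() keeps only the first row per Bird_ID.
merge(df1, setNames(df2, c("Bird_ID", "Sex")), all = TRUE) %>%
  distinct(Bird_ID, .keep_all = TRUE)

#   Bird_ID    Sex
# 1       1   Male
# 2       2 Female
# 3       3   Male
# 4       4   Male
# 5       5   Male
# 6       6 Female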

You can use match in base R:

df1$Sex[match(df2$Bird_ID, df1$Bird_ID)] <- df2$Seen_sex
df1

#  Bird_ID    Sex
#1       1   Male
#2       2 Female
#3       3   Male
#4       4   Male
#5       5   Male
#6       6 Female
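
If df2 might contain a bird that is not in df1, `match()` returns `NA` for it, and an `NA` index in a multi-value assignment is an error. A minimal sketch of guarding against that, using a hypothetical bird 9 that is not part of the original data:

```r
df1 <- data.frame(Bird_ID = 1:6,
                  Sex = c("Male", "Female", "Male", "Male", "Male", "UNK"),
                  stringsAsFactors = FALSE)
# Hypothetical df2: bird 6 plus bird 9, which does not exist in df1
df2 <- data.frame(Bird_ID = c(6, 9),
                  Seen_sex = c("Female", "Male"),
                  stringsAsFactors = FALSE)

# Position of each df2 ID in df1; NA where the ID is absent
idx <- match(df2$Bird_ID, df1$Bird_ID)
idx
# 6 NA

# Keep only the IDs that were found, then update those rows
found <- !is.na(idx)
df1$Sex[idx[found]] <- df2$Seen_sex[found]
df1$Sex
# "Male" "Female" "Male" "Male" "Male" "Female"
```

Bird 9 is silently skipped here; whether to skip, warn, or append such rows depends on what the data should look like.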