R: transfer values from one dataset to another by ID

Question

I have two datasets. The first one looks like this:

   ID     Weight     State
   1      12.34      NA
   2      11.23      IA
   2      13.12      IN
   3      12.67      MA 
   4      10.89      NA
   5      14.12      NA

The second dataset is a lookup table of State values by ID:

   ID    State
   1     WY
   2     IA
   3     MA
   4     OR
   4     CA
   5     FL

As you can see, ID 4 has two different State values, which is expected.

What I want to do is replace the NAs in the State column of dataset 1 with the State values from dataset 2. Expected dataset:

   ID     Weight     State
   1      12.34      WY
   2      11.23      IA
   2      13.12      IN
   3      12.67      MA 
   4      10.89      OR,CA
   5      14.12      FL

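Since ID 4 has two State values in dataset2, the two values are collapsed, separated by a comma, and used to replace the NA in dataset1. Any suggestions on how to achieve this are much appreciated. Thanks in advance.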

Answer 1

Collapse the df2 values to one row per ID and join the result to df1 by 'ID'. Then use coalesce to take the first non-NA value from the two State columns.

library(dplyr)

df1 %>%
  left_join(df2 %>%
              group_by(ID) %>%
              summarise(State = toString(State)), by = 'ID') %>%
  mutate(State = coalesce(State.x, State.y)) %>%
  select(-State.x, -State.y)

#  ID Weight  State
#1  1   12.3     WY
#2  2   11.2     IA
#3  2   13.1     IN
#4  3   12.7     MA
#5  4   10.9 OR, CA
#6  5   14.1     FL
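
Note that toString() joins values with a comma plus a space, which is why the output above shows "OR, CA" rather than the "OR,CA" in the question. The following is a minimal base R sketch of the difference, using a made-up vector `states` that stands in for the State values of a single ID. If the exact separator matters, paste(State, collapse = ",") could be swapped in for toString(State) in the summarise() call.

```r
# Hypothetical example vector: the State values belonging to one ID
states <- c("OR", "CA")

# toString() collapses with ", " (comma followed by a space)
with_space <- toString(states)

# paste() with collapse lets you choose the separator yourself
no_space <- paste(states, collapse = ",")

print(with_space)
print(no_space)
```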

In base R, with merge and transform:

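# all.x = TRUE keeps every df1 row; the helper columns are dropped at the end
merge(df1, aggregate(State~ID, df2, toString), by = 'ID', all.x = TRUE) |>
  transform(State = ifelse(is.na(State.x), State.y, State.x)) |>
  subset(select = -c(State.x, State.y))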

Answer 2

Tidyverse approach:

library(tidyverse)
df1 %>%
  left_join(df2 %>%
              group_by(ID) %>%
              summarise(State = toString(State)) %>%
              ungroup(), by = 'ID') %>%
  transmute(ID, Weight, State = coalesce(State.x, State.y))

Base R alternative:

na_idx <- which(is.na(df1$State))
df1$State[na_idx] <- with(
  aggregate(State ~ ID, df2, toString),
  State[match(df1$ID, ID)]
)[na_idx]
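
The lookup above relies on match(), which returns, for each ID in df1, the row position of that ID in the aggregated table. Here is a small sketch of the same idea on plain vectors. The names `ids`, `state`, `lookup_id`, and `lookup_state` are illustrative only, and the lookup values are assumed to be already collapsed to one entry per ID.

```r
# Illustrative vectors mirroring df1 (ids, state) and the collapsed df2 lookup
ids          <- c(1L, 2L, 2L, 3L, 4L, 5L)
state        <- c(NA, "IA", "IN", "MA", NA, NA)
lookup_id    <- c(1L, 2L, 3L, 4L, 5L)
lookup_state <- c("WY", "IA", "MA", "OR, CA", "FL")

# Position of each id in the lookup table
pos <- match(ids, lookup_id)

# Candidate replacement for every row, aligned with ids
candidate <- lookup_state[pos]

# Overwrite only the rows whose state is missing
na_idx <- which(is.na(state))
state[na_idx] <- candidate[na_idx]

print(state)
```

Existing non-NA values such as "IN" for the second ID 2 row are left untouched, because only the positions in `na_idx` are assigned.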

Data:

df1 <- structure(list(ID = c(1L, 2L, 2L, 3L, 4L, 5L), Weight = c(12.34, 
11.23, 13.12, 12.67, 10.89, 14.12), State = c(NA, "IA", "IN", 
"MA", NA, NA)), row.names = c(NA, -6L), class = "data.frame")

df2 <- structure(list(ID = c(1L, 2L, 3L, 4L, 4L, 5L), State = c("WY", 
"IA", "MA", "OR", "CA", "FL")), class = "data.frame", row.names = c(NA, 
-6L))
