In R, conditionally look up a variable in another data frame

Question

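Background

I have two data frames in R, one called d and the other called insurance. d contains a list of unique IDs along with two attributes, gender and zip (which are irrelevant to this question). insurance contains some of the same IDs, but there they are not unique. Its rows represent people's health insurance plans in a particular year. Some IDs in the insurance table are repeated because that person had more than one insurance plan that year (i.e. they switched insurers). Here is some code to create these tables: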

d <- data.frame(ID = c("a","b","c","d","e","f"), 
                    gender = c("f","f","m","f","m","m"), 
                    zip = c(48601,60107,29910,54220,28173,44663),stringsAsFactors=FALSE)

which looks like this:

ID	gender	zip
a	f	48601
b	f	60107
c	m	29910
d	f	54220
e	m	28173
f	m	44663

insurance <- data.frame(ID = c("a","a","c","d","f","f"), 
                     ins_type = c("public","private","private","private","private","public"), 
                     insurer = c("medicare","aetna","cigna","uhc","uhc","medicaid"),
                     stringsAsFactors = FALSE)

which looks like this:

ID	ins_type	insurer
a	public	medicare
a	private	aetna
c	private	cigna
d	private	uhc
f	private	uhc
f	public	medicaid

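Here is my goal:

I need d to reflect whether each person in d$ID has ever had any public insurance and, if so, with which insurer. Specifically, this means "looking up" the insurance table and creating 2 new variables in d: first, a 1/0 or yes/no variable for ever having had public insurance (call this variable d$public); second, which insurer it was (call it d$insurer).

The tricky part is that I need d$ID to remain unique, because it has to be the primary key for another aspect of the project that I haven't outlined here. So the idea is that if an ID has any entry at all for public insurance, then d$public should get a "1" or "yes" or whatever.

Desired result

I'd like a table that looks like this: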

ID	gender	zip	public	insurer
a	f	48601	1	medicare
b	f	60107	0	NA
c	m	29910	0	NA
d	f	54220	0	NA
e	m	28173	0	NA
f	m	44663	1	medicaid

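What I've tried

Several versions of this question have been asked before (e.g. here), but I haven't been able to get it right.

I tried using a join, like this: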

d2 <- d %>%
  left_join(insurance, by=c("ID"="ID"))

This gives me the columns I want, but it duplicates IDs like a, which I can't have.

Thanks for your help!

Answer 1


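library(dplyr)

d %>%
  # keep only the public plans, then join them onto d
  left_join(insurance %>% filter(ins_type == "public"), by = "ID") %>%
  # IDs without a public plan get NA from the join, so public is FALSE
  mutate(public = !is.na(ins_type)) %>%
  select(-ins_type)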
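
The join above returns public as TRUE/FALSE, and it would still duplicate an ID that has more than one public plan. The following is a minimal sketch of one way to handle both points, assuming that 1/0 coding is wanted and that keeping the first public insurer listed for an ID is acceptable. The names public_plans and result are introduced here for illustration only.

```r
library(dplyr)

d <- data.frame(ID = c("a","b","c","d","e","f"),
                gender = c("f","f","m","f","m","m"),
                zip = c(48601,60107,29910,54220,28173,44663),
                stringsAsFactors = FALSE)
insurance <- data.frame(ID = c("a","a","c","d","f","f"),
                        ins_type = c("public","private","private","private","private","public"),
                        insurer = c("medicare","aetna","cigna","uhc","uhc","medicaid"),
                        stringsAsFactors = FALSE)

# At most one row per ID with a public plan.
# Assumption: when an ID has several public plans, the first one listed is kept.
public_plans <- insurance %>%
  filter(ins_type == "public") %>%
  distinct(ID, .keep_all = TRUE) %>%
  select(ID, insurer)

result <- d %>%
  left_join(public_plans, by = "ID") %>%
  # insurer is NA exactly when the ID had no public plan
  mutate(public = as.integer(!is.na(insurer))) %>%
  select(ID, gender, zip, public, insurer)
```

Because public_plans has at most one row per ID, the left join cannot add rows, so result keeps one row per ID in d.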

Answer 2

Here is a solution using base R. There is no need to install any tidyverse-related packages.

# Define the public column: 1 if the ID has at least one public plan
# in insurance, 0 otherwise
d$public <- ifelse(d$ID %in% insurance$ID[insurance$ins_type == "public"], 1, 0)

# Now we'll define d$insurer as the result of applying a function
# that receives the IDs and d$public as arguments.

# If public is not 1, return NA; otherwise return the name of the first
# public insurer recorded for that ID in the insurance data frame.

d$insurer <- mapply(
    function(id, public) {
        if (public != 1)
            return(NA_character_)
        insurance$insurer[insurance$ID == id & insurance$ins_type == "public"][1]
    },
    d$ID,
    d$public,
    USE.NAMES = FALSE
)
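
The same lookup can also be written without a per-row function. What follows is a rough base R sketch using match(), which returns the position of the first match, so it again assumes that the first public insurer listed for an ID is the one to keep. The names public_rows and idx are introduced here for illustration.

```r
d <- data.frame(ID = c("a","b","c","d","e","f"),
                gender = c("f","f","m","f","m","m"),
                zip = c(48601,60107,29910,54220,28173,44663),
                stringsAsFactors = FALSE)
insurance <- data.frame(ID = c("a","a","c","d","f","f"),
                        ins_type = c("public","private","private","private","private","public"),
                        insurer = c("medicare","aetna","cigna","uhc","uhc","medicaid"),
                        stringsAsFactors = FALSE)

# Keep only the public plans
public_rows <- insurance[insurance$ins_type == "public", ]

# Position of each d$ID in the public plans; NA where the ID has none
idx <- match(d$ID, public_rows$ID)

d$public  <- as.integer(!is.na(idx))   # 1 if a public plan was found, else 0
d$insurer <- public_rows$insurer[idx]  # indexing with NA yields NA
```

Since match() returns exactly one position per element of d$ID, d keeps its original rows and d$ID stays unique.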
