How To Create New Column Based Off of Secondary ID Column Utilizing Data Connected to Primary ID Column

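This is a little complicated to explain. I have a set of speed dating data off of Kaggle with the columns Subject_ID and Partner_ID (that is how I renamed them). Columns such as race and gender are tied to Subject_ID, but every subject is also a partner somewhere in the dataset. I want to create Partner_Race and Partner_Gender columns based on the columns I renamed to Subject_Race and Subject_Gender.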

Edit: To clarify, the people in Partner_ID are the same people as in Subject_ID, with the same ID numbers. They are just placed in a different column.

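I really don't know even the logical steps I would need to take to do this. Of course, my data has far more than six observations, otherwise I would just do it by hand. I would prefer a dplyr or plyr approach, but if that is not possible, that's fine.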

My data looks like this:

Subject_ID     Partner_ID     Subject_Race      Subject_Gender
   1               6            Caucasian          Female
   2               5             Asian              Male
   3               4         African_American      Female
   4               3             Other             Female
   5               2             Latin              Male
   6               1               NA               Male

This is what I would like to create:

Subject_ID     Partner_ID     Subject_Race      Subject_Gender      Partner_Race      Partner_Gender
   1               6            Caucasian          Female                NA               Male
   2               5             Asian              Male               Latin              Male
   3               4         African_American      Female               Other            Female
   4               3             Other             Female          African_American      Female
   5               2             Latin              Male                Asian             Male
   6               1               NA               Male              Caucasian          Female

I am still at the very basic stage of data cleaning and manipulation. This is over my head.

You can join the data to itself, matching the Subject_ID column against the Partner_ID column.

df <- read.table(text = "Subject_ID     Partner_ID     Subject_Race      Subject_Gender
   1               6            Caucasian          Female
   2               5             Asian              Male
   3               4         African_American      Female
   4               3             Other             Female
   5               2             Latin              Male
   6               1               NA               Male", header = TRUE)


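library(tidyverse)

# Self-join: match each row's Subject_ID against the Partner_ID column of a
# second copy of df. Clashing column names from the second copy get the
# "_Partner" suffix.
df %>%
  dplyr::left_join(df, by = c("Subject_ID" = "Partner_ID"),
                   suffix = c("", "_Partner")) %>%
  # Drop the duplicated ID column and rename the partner's attributes.
  dplyr::select(-Subject_ID_Partner,
                Partner_Gender = Subject_Gender_Partner,
                Partner_Race = Subject_Race_Partner)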

Output:

  Subject_ID Partner_ID     Subject_Race Subject_Gender     Partner_Race Partner_Gender
1          1          6        Caucasian         Female             <NA>           Male
2          2          5            Asian           Male            Latin           Male
3          3          4 African_American         Female            Other         Female
4          4          3            Other         Female African_American         Female
5          5          2            Latin           Male            Asian           Male
6          6          1             <NA>           Male        Caucasian         Female
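One caveat: the self-join above matches Subject_ID against Partner_ID, so it relies on each person appearing in exactly one pairing, as in the six-row example. If a subject shows up in several rows with different partners (which seems likely for the full speed dating data, though that is an assumption on my part), a safer variant is to build a one-row-per-person lookup table first and join it on Partner_ID. Below is a minimal sketch of that idea; the `dates` toy data and the `partners` and `result` names are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: subjects 1 and 2 each meet partners 3 and 4,
# and every pairing is also listed from the partner's point of view.
dates <- data.frame(
  Subject_ID     = c(1, 1, 2, 2, 3, 3, 4, 4),
  Partner_ID     = c(3, 4, 3, 4, 1, 2, 1, 2),
  Subject_Race   = c("Caucasian", "Caucasian", "Asian", "Asian",
                     "Other", "Other", NA, NA),
  Subject_Gender = c("Female", "Female", "Female", "Female",
                     "Male", "Male", "Male", "Male"),
  stringsAsFactors = FALSE
)

# Lookup table: one row per person, with the columns renamed so that they
# describe that person in the partner role.
partners <- dates %>%
  distinct(Subject_ID, Subject_Race, Subject_Gender) %>%
  rename(Partner_ID     = Subject_ID,
         Partner_Race   = Subject_Race,
         Partner_Gender = Subject_Gender)

# Attach each partner's attributes by matching on Partner_ID.
result <- dates %>%
  left_join(partners, by = "Partner_ID")

print(result)
```

Because `partners` has one row per ID, the join keeps the original number of rows in `dates`. That only holds if each person's race and gender are recorded consistently across their rows; otherwise `distinct()` would return more than one row for that ID and the join would duplicate rows.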