How to Create a New Column Based on a Secondary ID Column Using Data Connected to the Primary ID Column
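This is a bit complicated to explain. I have a set of speed dating data from Kaggle with a column of Subject_IDs and a column of Partner_IDs (that is how I renamed the data). Columns such as race and gender are associated with the Subject_IDs, but every subject is also a partner somewhere in the data set. I want to create Partner_Race and Partner_Gender columns based on the columns I renamed to Subject_Gender and Subject_Race.

Edit: To clarify, the Partner_IDs are the same people as the Subject_IDs, with the same ID numbers. They are just placed in a different column.

I don't really know what logical steps I would even need to take to do this. Of course, my data has far more than six observations, otherwise I would just do it by hand. I would prefer a dplyr or plyr approach, but if that is not possible, that's fine.

My data looks like this: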
    Subject_ID  Partner_ID  Subject_Race      Subject_Gender
    1           6           Caucasian         Female
    2           5           Asian             Male
    3           4           African_American  Female
    4           3           Other             Female
    5           2           Latin             Male
    6           1           NA                Male
This is what I want to create:

    Subject_ID  Partner_ID  Subject_Race      Subject_Gender  Partner_Race      Partner_Gender
    1           6           Caucasian         Female          NA                Male
    2           5           Asian             Male            Latin             Male
    3           4           African_American  Female          Other             Female
    4           3           Other             Female          African_American  Female
    5           2           Latin             Male            Asian             Male
    6           1           NA                Male            Caucasian         Female
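I am still at the very basic stage of data cleaning and wrangling, and this is beyond my current understanding.

You can join the data to itself, using the Partner_ID and Subject_ID columns as the join keys.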
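    # Recreate the example data from the question
    df <- read.table(text = "Subject_ID Partner_ID Subject_Race Subject_Gender
    1 6 Caucasian Female
    2 5 Asian Male
    3 4 African_American Female
    4 3 Other Female
    5 2 Latin Male
    6 1 NA Male", header = TRUE)

    library(dplyr)

    df %>%
      # Self-join: match each row's Subject_ID to the row that lists that person as Partner_ID.
      # This relies on pairings being reciprocal, as they are in the example data.
      left_join(df, by = c("Subject_ID" = "Partner_ID"),
                suffix = c("", "_Partner")) %>%
      # Drop the redundant ID column and rename the joined columns
      select(-Subject_ID_Partner,
             Partner_Gender = Subject_Gender_Partner,
             Partner_Race = Subject_Race_Partner)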
Output:

      Subject_ID Partner_ID     Subject_Race Subject_Gender     Partner_Race Partner_Gender
    1          1          6        Caucasian         Female             <NA>           Male
    2          2          5            Asian           Male            Latin           Male
    3          3          4 African_American         Female            Other         Female
    4          4          3            Other         Female African_American         Female
    5          5          2            Latin           Male            Asian           Male
    6          6          1             <NA>           Male        Caucasian         Female
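The self-join above works for the example because every pairing is reciprocal (1 with 6, 2 with 5, 3 with 4). If the full data set has subjects who meet several partners, a lookup-table variant may be safer. The following is only a sketch under that assumption, not the original answer's method: `partner_lookup` and `result` are illustrative names, and it assumes each Subject_ID always appears with the same race and gender.

```r
library(dplyr)

# Example data from the question
df <- read.table(text = "Subject_ID Partner_ID Subject_Race Subject_Gender
1 6 Caucasian Female
2 5 Asian Male
3 4 African_American Female
4 3 Other Female
5 2 Latin Male
6 1 NA Male", header = TRUE)

# One row per person, with the Subject_* columns renamed to Partner_*.
# `partner_lookup` is a hypothetical helper name used for illustration.
partner_lookup <- df %>%
  distinct(Subject_ID, Subject_Race, Subject_Gender) %>%
  rename(Partner_ID = Subject_ID,
         Partner_Race = Subject_Race,
         Partner_Gender = Subject_Gender)

# Look up each row's partner attributes directly by Partner_ID
result <- df %>%
  left_join(partner_lookup, by = "Partner_ID")

print(result)
```

For the six-row example this gives the same Partner_Race and Partner_Gender values as the self-join. Because the key is Partner_ID itself, it does not depend on pairings being reciprocal. If a person were recorded with conflicting attributes, `distinct()` would keep more than one row for them and the join would duplicate rows, so that is worth checking first.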