Dplyr: Create two columns based on specific conditions
In this dataset, DF, we have 4 names and 4 professions.
library(tibble)

DF <- tribble(
  ~names,      ~princess, ~singer, ~astronaut, ~painter,
  "diana",     4,         1,       2,          3,
  "shakira",   2,         1,       3,          4,
  "armstrong", 3,         4,       1,          2,
  "picasso",   1,         3,       1,          4
)
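Suppose the cell values are some measure of how well each person fits each profession. So, for example, Diana has her highest value in the princess column (correct), but Shakira has her highest value in the painter column (wrong).
I want to create two columns named "Compatible" and "Incompatible". For Diana, the program would pick the value 4, because it belongs to her correct profession (princess), and assign it to "Compatible"; "Incompatible" would take the mean of the other 3 values. For Shakira, it would pick the value 1 from her correct profession (singer) and assign it to "Compatible"; "Incompatible" would be the mean of the other values. The same goes for the other names.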
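To make the rule concrete for a single row, here is a minimal base R sketch of the arithmetic for Diana. The names `diana`, `correct`, `compatible`, and `incompatible` are illustrative only and do not appear in the data frames of this question.

```r
# Diana's scores, copied from the first row of DF
diana <- c(princess = 4, singer = 1, astronaut = 2, painter = 3)

# Her correct profession (assumed known here; in the question it comes from DF3)
correct <- "princess"

# Compatible: the score in the correct profession's column
compatible <- diana[[correct]]

# Incompatible: the mean of the scores in all remaining columns
incompatible <- mean(diana[names(diana) != correct])
```

With these inputs, `compatible` is 4 and `incompatible` is (1 + 2 + 3) / 3 = 2, which matches the first row of the desired output.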
So the output would look like this:
DF1 <- tribble(
  ~names,      ~princess, ~singer, ~astronaut, ~painter, ~Compatible, ~Incompatible,
  "diana",     4,         1,       2,          3,        4,           2,
  "shakira",   2,         1,       3,          4,        1,           3,
  "armstrong", 3,         4,       1,          2,        1,           3,
  "picasso",   1,         3,       1,          4,        4,           1.66
)
Here is the dataset that maps each name to its correct profession:
DF3<- tribble(
~names, ~professions,
"diana", "princess",
"shakira", "singer",
"armstrong", "astronaut",
"picasso", "painter"
)
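library(dplyr)
library(tidyr)

# Reshape to long form, attach each person's correct profession, then
# average the values within the compatible / incompatible groups.
DF1[1:5] %>%
  pivot_longer(-names) %>%
  left_join(DF3, by = 'names') %>%
  group_by(names, name = if_else(name == professions, 'compatible', 'incompatible')) %>%
  summarise(profession = first(professions), value = mean(value), .groups = 'drop') %>%
  pivot_wider()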
# A tibble: 4 x 4
  names     profession compatible incompatible
  <chr>     <chr>           <dbl>        <dbl>
1 armstrong astronaut           1         3
2 diana     princess            4         2
3 picasso   painter             4         1.67
4 shakira   singer              1         3
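The result above drops the original score columns and sorts the rows by name. If the goal is the exact layout from the question (original columns kept, with the two new columns appended), one possible variation is to compute the two scores separately and join them back. This is only a sketch under that assumption; `scores` and `result` are names introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

DF <- tribble(
  ~names,      ~princess, ~singer, ~astronaut, ~painter,
  "diana",     4,         1,       2,          3,
  "shakira",   2,         1,       3,          4,
  "armstrong", 3,         4,       1,          2,
  "picasso",   1,         3,       1,          4
)

DF3 <- tribble(
  ~names,      ~professions,
  "diana",     "princess",
  "shakira",   "singer",
  "armstrong", "astronaut",
  "picasso",   "painter"
)

# One row per (name, profession) pair, with the correct profession attached
scores <- DF %>%
  pivot_longer(-names, names_to = "profession", values_to = "value") %>%
  left_join(DF3, by = "names") %>%
  group_by(names) %>%
  summarise(
    # Score in the column that matches the correct profession
    Compatible   = value[profession == professions],
    # Mean of the scores in all other columns
    Incompatible = mean(value[profession != professions]),
    .groups = "drop"
  )

# Append the two new columns to the original wide table, keeping DF's row order
result <- left_join(DF, scores, by = "names")
```

Here `result` has the same seven columns as DF1, with rows in the original order of DF. Picasso's Incompatible value is 5/3, which prints as 1.67 after rounding. This sketch assumes each name appears exactly once in DF3 and that its profession matches one of the column names in DF.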