dplyr: Create two columns based on specific conditions

In this dataset, DF, we have 4 names and 4 professions.

library(tibble)  # provides tribble()

DF <- tribble(
    ~names,      ~princess, ~singer, ~astronaut, ~painter,
    "diana",     4, 1, 2, 3,
    "shakira",   2, 1, 3, 4,
    "armstrong", 3, 4, 1, 2,
    "picasso",   1, 3, 1, 4
)

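Suppose the cell values are some measure of how well each person fits each profession. For example, Diana's highest value is in the princess column (correct), but Shakira's highest value is in the painter column (incorrect).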

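I want to create two columns named Compatible and Incompatible. For Diana, the program should pick the value 4, because it belongs to her correct profession (princess), and assign it to Compatible; Incompatible should be the mean of the other 3 values. For Shakira, it should pick the value 1 from her correct profession (singer) and assign it to Compatible; Incompatible is again the mean of the other values. The same applies to the other names.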
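To make the rule concrete, here is a minimal base R sketch for a single row (Diana). The variable names `scores` and `correct` are only illustrative and are not part of the original data.

```r
# Diana's scores, copied by hand from the DF table
scores  <- c(princess = 4, singer = 1, astronaut = 2, painter = 3)
correct <- "princess"  # Diana's correct profession

# Compatible: the score in the correct profession's column
compatible <- scores[[correct]]

# Incompatible: the mean of the remaining three scores
incompatible <- mean(scores[names(scores) != correct])
```

With these numbers, `compatible` is 4 and `incompatible` is the mean of 1, 2 and 3, which is 2.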

So the output would look like this:

DF1 <- tribble(
    ~names,      ~princess, ~singer, ~astronaut, ~painter, ~Compatible, ~Incompatible,
    "diana",     4, 1, 2, 3, 4, 2,
    "shakira",   2, 1, 3, 4, 1, 3,
    "armstrong", 3, 4, 1, 2, 1, 3,
    "picasso",   1, 3, 1, 4, 4, 1.67
)

Here is the dataset that gives the correct profession for each name:

DF3<- tribble(
    ~names, ~professions,
    "diana",  "princess",
    "shakira",  "singer",
    "armstrong",  "astronaut",
    "picasso", "painter"
)
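
One way to do this is to reshape the data to long format, join the lookup table, and summarise:

library(dplyr)
library(tidyr)

DF %>%
  pivot_longer(-names) %>%           # one row per person/profession pair
  left_join(DF3, by = 'names') %>%   # attach each person's correct profession
  group_by(names, name = if_else(name == professions, 'compatible', 'incompatible')) %>%
  summarise(profession = first(professions), value = mean(value), .groups = 'drop') %>%
  pivot_wider()                      # defaults: names_from = name, values_from = value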

# A tibble: 4 x 4
  names     profession compatible incompatible
  <chr>     <chr>           <dbl>        <dbl>
1 armstrong astronaut           1         3   
2 diana     princess            4         2   
3 picasso   painter             4         1.67
4 shakira   singer              1         3
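
The result above drops the original score columns and sorts the rows by name, whereas the desired output keeps the four profession columns in their original order. Below is a hedged sketch of one way to get that shape: compute the two summaries per name and join them back onto the original table. The intermediate names `summary_tbl` and `result` are my own, and the sketch assumes every name in DF has exactly one matching row in DF3.

```r
library(tibble)
library(dplyr)
library(tidyr)

DF <- tribble(
  ~names,      ~princess, ~singer, ~astronaut, ~painter,
  "diana",     4, 1, 2, 3,
  "shakira",   2, 1, 3, 4,
  "armstrong", 3, 4, 1, 2,
  "picasso",   1, 3, 1, 4
)

DF3 <- tribble(
  ~names,      ~professions,
  "diana",     "princess",
  "shakira",   "singer",
  "armstrong", "astronaut",
  "picasso",   "painter"
)

# Per-name summary: the score in the correct column, and the mean of the rest
summary_tbl <- DF %>%
  pivot_longer(-names, names_to = "profession", values_to = "value") %>%
  left_join(DF3, by = "names") %>%
  group_by(names) %>%
  summarise(
    Compatible   = value[profession == professions],
    Incompatible = mean(value[profession != professions]),
    .groups = "drop"
  )

# Join back onto DF so the original columns and row order are preserved
result <- DF %>%
  left_join(summary_tbl, by = "names")

result
```

Because the final `left_join()` starts from DF, the rows stay in the order diana, shakira, armstrong, picasso, matching the layout of DF1.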