How to use string values in one column to ID and select rows from another column in the same dataframe, varying per group?

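I have a data frame that contains all possible permutations of the order that can be presented to participants, plus a column holding the ID of the permutation (i.e., the order of presentation) that was actually presented to each participant.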

How can I use the permutation ID to select the values from the corresponding column for each participant?

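To illustrate with dummy data: in this example, the ID of the presentation order each participant saw is given in the column presentation_type. I want to use that ID to select the values from the matching order* column.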

library(dplyr)

df1 <- tibble::tribble(
  ~participant_id, ~presentation_type, ~trial_number, ~response, ~order1, ~order2, ~order3,
             "p1",           "order3",            1L,     "yes",     "a",     "b",     "c",
             "p1",           "order3",            2L,     "yes",     "b",     "c",     "a",
             "p1",           "order3",            3L,      "no",     "c",     "a",     "b",
             "p2",           "order1",            1L,      "no",     "a",     "b",     "c",
             "p2",           "order1",            2L,     "yes",     "b",     "c",     "a",
             "p2",           "order1",            3L,      "no",     "c",     "a",     "b",
             "p3",           "order2",            1L,      "no",     "a",     "b",     "c",
             "p3",           "order2",            2L,     "yes",     "b",     "c",     "a",
             "p3",           "order2",            3L,     "yes",     "c",     "a",     "b"
  )

In other words, the desired outcome is a data frame with a column containing the actual stimulus each participant saw (stimulus_presented), like this:

desired_outcome <- tibble::tribble(
                     ~participant_id, ~presentation_type, ~trial_number, ~response, ~stimulus_presented,
                                "p1",           "order3",            1L,     "yes",              "c",
                                "p1",           "order3",            2L,     "yes",              "a",
                                "p1",           "order3",            3L,      "no",              "b",
                                "p2",           "order1",            1L,      "no",              "a",
                                "p2",           "order1",            2L,     "yes",              "b",
                                "p2",           "order1",            3L,      "no",              "c",
                                "p3",           "order2",            1L,      "no",              "b",
                                "p3",           "order2",            2L,     "yes",              "c",
                                "p3",           "order2",            3L,     "yes",              "a"
                     )

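I thought I could get there with something like the code below, but it seems to just assign the order for participant_id "p1" to all participants. Do I need to group by/map over individual participants somehow?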

## Attempt so far - incorrect
# make single column of the stimulus presented on each row
stimuli_presented <- df1 %>% 
 select(stimulus_presented = .$presentation_type[1])

# bind our newly created column back onto the original data frame, then remove "order1", "order2", etc. 
df2 <- bind_cols(df1, stimuli_presented) %>% 
  relocate(stimulus_presented, .after = response) %>% 
  select(-c(starts_with("order")))

With dplyr, you can use rowwise() together with get():

library(dplyr)
df1 %>% 
  rowwise() %>% 
  mutate(stimulus_presented = get(presentation_type)) %>% 
  select(-starts_with("order"))

# A tibble: 9 × 5
# Rowwise: 
  participant_id presentation_type trial_number response stimulus_presented
  <chr>          <chr>                    <int> <chr>    <chr>             
1 p1             order3                       1 yes      c                 
2 p1             order3                       2 yes      a                 
3 p1             order3                       3 no       b                 
4 p2             order1                       1 no       a                 
5 p2             order1                       2 yes      b                 
6 p2             order1                       3 no       c                 
7 p3             order2                       1 no       b                 
8 p3             order2                       2 yes      c                 
9 p3             order2                       3 yes      a                 
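
For context on why the attempt in the question gives every participant p1's order: the expression passed to select() is evaluated once for the whole data frame, not once per row, so it resolves to a single column name. Here is a minimal sketch of that behaviour, using a cut-down stand-in for df1. The names df, first_id and picked are made up for illustration, and all_of() is swapped in for the bare string to make the intent explicit.

```r
library(dplyr)

# Cut-down stand-in for df1, same column names as in the question
df <- tibble::tibble(
  participant_id    = c("p1", "p2", "p3"),
  presentation_type = c("order3", "order1", "order2"),
  order1 = c("a", "a", "a"),
  order2 = c("b", "b", "b"),
  order3 = c("c", "c", "c")
)

# Evaluated once: a single string taken from the first row
first_id <- df$presentation_type[1]
first_id

# select() therefore picks that one column for every row
picked <- df %>% select(stimulus_presented = all_of(first_id))
picked$stimulus_presented
```

rowwise() changes this by making mutate() evaluate its expression one row at a time, so presentation_type is a single string on each row and get() can look up the column of that name.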

Base R:

diag(as.matrix(df1[match(df1$presentation_type, colnames(df1))]))
#or
unlist(sapply(seq(nrow(df1)), \(x) df1[x, match(df1$presentation_type[x], colnames(df1))]))

#[1] "c" "a" "b" "a" "b" "c" "b" "c" "a"
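
A possible variation, not part of the answer above: the diag() version first builds a matrix with one column per row of the data (9 x 9 here) and then keeps only its diagonal. Indexing a matrix with a two-column matrix of (row, column) pairs should give the same vector without that intermediate step. This is a sketch under the assumption that the candidate columns are exactly order1 to order3. The names df, order_cols, m and idx are illustrative.

```r
# Stand-in for the relevant columns of df1 (same values as in the question)
df <- data.frame(
  presentation_type = rep(c("order3", "order1", "order2"), each = 3),
  order1 = rep(c("a", "b", "c"), times = 3),
  order2 = rep(c("b", "c", "a"), times = 3),
  order3 = rep(c("c", "a", "b"), times = 3)
)

order_cols <- c("order1", "order2", "order3")
m <- as.matrix(df[order_cols])

# One (row, column) pair per row of df: row i, column named by presentation_type[i]
idx <- cbind(seq_len(nrow(df)), match(df$presentation_type, order_cols))

# Matrix indexing by a two-column matrix returns one element per pair
stimulus_presented <- m[idx]
stimulus_presented
```

The result can be assigned back with df$stimulus_presented <- m[idx]. A presentation_type that does not match any of the order columns would produce NA in idx, and so NA in the result.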

You can pivot the data to "long" format, then filter the rows where Order matches presentation_type:

library(tidyverse)

df1 %>% 
  pivot_longer(starts_with("order"), names_to = "Order", values_to = "stimulus_presented") %>% 
  filter(presentation_type == Order) %>% 
  select(-Order)

# A tibble: 9 x 5
  participant_id presentation_type trial_number response stimulus_presented
  <chr>          <chr>                    <int> <chr>    <chr>             
1 p1             order3                       1 yes      c                 
2 p1             order3                       2 yes      a                 
3 p1             order3                       3 no       b                 
4 p2             order1                       1 no       a                 
5 p2             order1                       2 yes      b                 
6 p2             order1                       3 no       c                 
7 p3             order2                       1 no       b                 
8 p3             order2                       2 yes      c                 
9 p3             order2                       3 yes      a