How to use string values in one column to ID and select rows from another column in the same dataframe, varying per group?

Question

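I have a data frame that contains all the possible order permutations that could be presented to participants, along with a column holding the ID of the permutation (i.e. the presentation order) that each participant was actually shown.

How can I use the permutation ID to select, for each participant, the values from the corresponding column?

To illustrate with dummy data: in this example, the ID of the presentation order each participant saw is given in the column presentation_type. I want to use that ID to select the values from the matching order* column.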

library(dplyr)

df1 <- tibble::tribble(
  ~participant_id, ~presentation_type, ~trial_number, ~response, ~order1, ~order2, ~order3,
             "p1",           "order3",            1L,     "yes",     "a",     "b",     "c",
             "p1",           "order3",            2L,     "yes",     "b",     "c",     "a",
             "p1",           "order3",            3L,      "no",     "c",     "a",     "b",
             "p2",           "order1",            1L,      "no",     "a",     "b",     "c",
             "p2",           "order1",            2L,     "yes",     "b",     "c",     "a",
             "p2",           "order1",            3L,      "no",     "c",     "a",     "b",
             "p3",           "order2",            1L,      "no",     "a",     "b",     "c",
             "p3",           "order2",            2L,     "yes",     "b",     "c",     "a",
             "p3",           "order2",            3L,     "yes",     "c",     "a",     "b"
  )

In other words, the desired outcome is a data frame with a column containing the actual stimulus each participant saw (stimulus_presented), like so:

desired_outcome <- tibble::tribble(
                     ~participant_id, ~presentation_type, ~trial_number, ~response, ~stimulus_presented,
                                "p1",           "order3",            1L,     "yes",              "c",
                                "p1",           "order3",            2L,     "yes",              "a",
                                "p1",           "order3",            3L,      "no",              "b",
                                "p2",           "order1",            1L,      "no",              "a",
                                "p2",           "order1",            2L,     "yes",              "b",
                                "p2",           "order1",            3L,      "no",              "c",
                                "p3",           "order2",            1L,      "no",              "b",
                                "p3",           "order2",            2L,     "yes",              "c",
                                "p3",           "order2",            3L,     "yes",              "a"
                     )

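I thought I could get there with something like the code below, but it seems to just assign the order for participant_id "p1" to all participants. Do I need to somehow group by / map over the individual participants?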

## Attempt so far - incorrect
# make single column of the stimulus presented on each row
stimuli_presented <- df1 %>% 
 select(stimulus_presented = .$presentation_type[1])

# bind our newly created column back onto the original data frame, then remove "order1", "order2", etc. 
df2 <- bind_cols(df1, stimuli_presented) %>% 
  relocate(stimulus_presented, .after = response) %>% 
  select(-c(starts_with("order")))

Answer 1

With dplyr, you can use rowwise and get:

library(dplyr)
df1 %>% 
  rowwise() %>% 
  mutate(stimulus_presented = get(presentation_type)) %>% 
  select(-starts_with("order"))

# A tibble: 9 × 5
# Rowwise: 
  participant_id presentation_type trial_number response stimulus_presented
  <chr>          <chr>                    <int> <chr>    <chr>             
1 p1             order3                       1 yes      c                 
2 p1             order3                       2 yes      a                 
3 p1             order3                       3 no       b                 
4 p2             order1                       1 no       a                 
5 p2             order1                       2 yes      b                 
6 p2             order1                       3 no       c                 
7 p3             order2                       1 no       b                 
8 p3             order2                       2 yes      c                 
9 p3             order2                       3 yes      a

Base R:

diag(as.matrix(df1[match(df1$presentation_type, colnames(df1))]))
#or
unlist(sapply(seq(nrow(df1)), \(x) df1[x, match(df1$presentation_type[x], colnames(df1))]))

#[1] "c" "a" "b" "a" "b" "c" "b" "c" "a"
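
As a possible variation on the base R lines above: the diag() version builds a matrix with one column per row of df1 and then keeps only its diagonal, which may get expensive for long data. A minimal sketch of the same lookup using a two-column (row, column) index matrix is shown here. It assumes the candidate columns all start with "order"; order_cols, m and idx are illustrative names, and the data is rebuilt as a plain data.frame only so the snippet runs on its own.

```r
# Rebuild the example data as a plain data.frame (same values as df1 above)
df1 <- data.frame(
  participant_id    = rep(c("p1", "p2", "p3"), each = 3),
  presentation_type = rep(c("order3", "order1", "order2"), each = 3),
  trial_number      = rep(1:3, times = 3),
  response          = c("yes", "yes", "no", "no", "yes", "no", "no", "yes", "yes"),
  order1            = rep(c("a", "b", "c"), times = 3),
  order2            = rep(c("b", "c", "a"), times = 3),
  order3            = rep(c("c", "a", "b"), times = 3)
)

# Names of the candidate columns (assumption: they all start with "order")
order_cols <- grep("^order", names(df1), value = TRUE)

# Character matrix holding only the order* columns
m <- as.matrix(df1[order_cols])

# One (row, column) pair per row of df1: row i, column named by presentation_type[i]
idx <- cbind(seq_len(nrow(df1)), match(df1$presentation_type, order_cols))

# Indexing a matrix with a two-column matrix picks one cell per pair
df1$stimulus_presented <- m[idx]

# Drop the order* columns, as in the desired outcome
df1 <- df1[setdiff(names(df1), order_cols)]
df1$stimulus_presented
#[1] "c" "a" "b" "a" "b" "c" "b" "c" "a"
```

If a presentation_type value has no matching column, match() returns NA and the corresponding stimulus_presented entry comes out as NA rather than raising an error.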

Answer 2

You can pivot the data to "long" format, then filter to the rows where Order matches presentation_type.

library(tidyverse)

df1 %>% 
  pivot_longer(starts_with("order"), names_to = "Order", values_to = "stimulus_presented") %>% 
  filter(presentation_type == Order) %>% 
  select(-Order)

# A tibble: 9 x 5
  participant_id presentation_type trial_number response stimulus_presented
  <chr>          <chr>                    <int> <chr>    <chr>             
1 p1             order3                       1 yes      c                 
2 p1             order3                       2 yes      a                 
3 p1             order3                       3 no       b                 
4 p2             order1                       1 no       a                 
5 p2             order1                       2 yes      b                 
6 p2             order1                       3 no       c                 
7 p3             order2                       1 no       b                 
8 p3             order2                       2 yes      c                 
9 p3             order2                       3 yes      a
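
One thing that may be worth checking with the pivot-and-filter approach: a row whose presentation_type does not match any order* column is dropped by filter() without a message. The sketch below illustrates this on a small made-up tibble; toy, long_result and the ID "order9" are hypothetical and not part of the original data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: "order9" has no matching order* column
toy <- tibble(
  participant_id    = c("p1", "p2"),
  presentation_type = c("order2", "order9"),
  order1            = c("a", "a"),
  order2            = c("b", "b")
)

long_result <- toy %>%
  pivot_longer(starts_with("order"), names_to = "Order", values_to = "stimulus_presented") %>%
  filter(presentation_type == Order) %>%
  select(-Order)

# Compare row counts before and after: p2 has been dropped
nrow(toy)
nrow(long_result)
```

Comparing the row counts before and after, as in the last two lines, is one simple way to notice IDs that failed to match.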

grouping

r

dataframe

dplyr

purrr