Distinct rows in R based on the order of other columns

Question

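I ran a multi-period online experiment, but the data contain some duplicate and incomplete records.

Simply put, the online experiment has 2 trials (trial 1:2), and each trial consists of 2 periods (period 1:2). Participants make a decision (1:5) to guess a nature (1:5) that stays unchanged across the 2 periods of a trial. After each trial, the nature changes randomly.

I found that participants can get stuck in a period and have to redo the experiment, which leaves duplicate and incomplete trials in my data.

For example: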

id	decision	nature	period	trial
1000	1	5	1	1
1000	1	5	2	1
1000	1	5	1	2
1000	1	5	2	2
1000	1	5	1	3
1000	2	2	1	1
1000	3	2	2	1
1000	1	2	1	2
1000	3	2	2	2
1000	5	2	1	3
1000	1	2	2	3

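As you can see, in the first attempt trial 3 was not completed because the participant got stuck and had to redo the experiment, which produced the duplicate data.

I ran the distinct function in R: distinct(id, trial, period, .keep_all = TRUE), but I got this: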

id	decision	nature	period	trial
1000	1	5	1	1
1000	1	5	2	1
1000	1	5	1	2
1000	1	5	2	2
1000	1	5	1	3
1000	1	2	2	3

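The differing nature values in trial 3 show that distinct mixed two different attempts by this participant. How can I use distinct or another function in R to get each participant's complete data from a single attempt?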
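To make the mixing concrete, here is a minimal reproduction. It assumes the example table above is loaded as a data frame called `df` (a name chosen here only for illustration). distinct() keeps the first row it encounters for each (id, trial, period) combination, so the first attempt supplies every row it has and the second attempt only fills the one combination the first attempt never reached.

```r
library(dplyr)

# Example data from the question; `df` is an illustrative name
df <- data.frame(
  id       = rep(1000L, 11),
  decision = c(1L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 3L, 5L, 1L),
  nature   = c(5L, 5L, 5L, 5L, 5L, 2L, 2L, 2L, 2L, 2L, 2L),
  period   = c(1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L),
  trial    = c(1L, 1L, 2L, 2L, 3L, 1L, 1L, 2L, 2L, 3L, 3L)
)

# Keep the first row seen for each (id, trial, period) combination
res <- df %>% distinct(id, trial, period, .keep_all = TRUE)

# Trial 3 now holds one row from each attempt, hence two nature values
res %>% filter(trial == 3)
```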

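My desired output keeps one complete set of trials (1:3) for each participant, with nature values that are consistent across the trials, and removes all duplicate and incomplete trials.

Thanks in advance!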

Answer 1

Is this what you are looking for?

## data
data <- structure(list(id = c(1000L, 1000L, 1000L, 1000L, 1000L, 1000L,
1000L, 1000L, 1000L, 1000L, 1000L), decision = c(1L, 1L, 1L,
1L, 1L, 2L, 3L, 1L, 3L, 5L, 1L), nature = c(5L, 5L, 5L, 5L, 5L,
2L, 2L, 2L, 2L, 2L, 2L), period = c(1L, 2L, 1L, 2L, 1L, 1L, 2L,
1L, 2L, 1L, 2L), trial = c(1L, 1L, 2L, 2L, 3L, 1L, 1L, 2L, 2L,
3L, 3L)), row.names = c(NA, -11L), class = "data.frame")



library(dplyr)
data %>% 
    mutate(rownum = 1:n()) %>% 
    group_by(id, trial, period) %>%
    mutate(maxrownum = max(rownum)) %>% 
    filter(rownum == maxrownum) %>% 
    select(-c(rownum, maxrownum))

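I created a row-number identifier. Assuming your data are ordered by attempt, keeping the rows whose row number equals max(row number) selects the last attempt for each (id, trial, period) triple.

Output: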

# Groups:   id, trial, period [6]
     id decision nature period trial
  <int>    <int>  <int>  <int> <int>
1  1000        2      2      1     1
2  1000        3      2      2     1
3  1000        1      2      1     2
4  1000        3      2      2     2
5  1000        5      2      1     3
6  1000        1      2      2     3
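Note that the row-number approach relies on the last attempt being the complete one. As a possible alternative, the sketch below labels each attempt explicitly and keeps only complete attempts. It rests on assumptions that are not stated in the question: rows are in chronological order, every attempt starts with trial 1, period 1, and a complete attempt covers 3 trials of 2 periods each (as in the example data). The names `df`, `attempt`, `n_trials`, `n_periods` and `kept` are made up for illustration.

```r
library(dplyr)

# Example data from the question; `df` is an illustrative name
df <- data.frame(
  id       = rep(1000L, 11),
  decision = c(1L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 3L, 5L, 1L),
  nature   = c(5L, 5L, 5L, 5L, 5L, 2L, 2L, 2L, 2L, 2L, 2L),
  period   = c(1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L),
  trial    = c(1L, 1L, 2L, 2L, 3L, 1L, 1L, 2L, 2L, 3L, 3L)
)

# Assumed design: 3 trials x 2 periods make a complete attempt
n_trials  <- 3L
n_periods <- 2L

kept <- df %>%
  group_by(id) %>%
  # Assumption: a new attempt begins each time trial 1, period 1 reappears
  mutate(attempt = cumsum(trial == 1 & period == 1)) %>%
  group_by(id, attempt) %>%
  # Keep only attempts that cover every (trial, period) combination
  filter(n_distinct(trial, period) == n_trials * n_periods) %>%
  group_by(id) %>%
  # If a participant has several complete attempts, keep the latest one
  filter(attempt == max(attempt)) %>%
  ungroup() %>%
  select(-attempt)

kept
```

On the example data this drops the unfinished first attempt and keeps the six rows of the second attempt, so nature is 2 throughout. If the first complete attempt should be kept instead, min(attempt) can replace max(attempt).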
