Sequence change coding in R

Question

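I have asked a few very similar questions before, but I now understand my problem better. I will try to ask it as clearly as I can.

I have a sample dataset that looks like this: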

id       <- c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8, 9,9, 10,10)
item.id  <- c(1,1,2, 1,1,1, 1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2, 1,1, 1,1)
sequence <- c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1, 1,2, 1,2)
score    <- c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2, 1,2, 2,1)

data <- data.frame("id"=id, "item.id"=item.id, "sequence"=sequence, "score"=score)
> data
   id item.id sequence score
1   1       1        1     0
2   1       1        2     0
3   1       2        1     0
4   2       1        1     0
5   2       1        2     0
6   2       1        3     1
7   3       1        1     2
8   3       1        2     0
9   4       1        1     1
10  4       2        1     1
11  5       1        1     1
12  5       2        1     0
13  5       2        2     1
14  5       2        3     1
15  6       1        1     0
16  6       1        2     0
17  6       1        3     0
18  7       1        1     1
19  8       1        1     0
20  8       2        1     2
21  9       1        1     1
22  9       1        2     2
23 10       1        1     2
24 10       1        2     1

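id identifies each student, item.id identifies the item the student answered, sequence is the attempt number for each item.id, and score is the score for that attempt, which is 0, 1, or 2. Students can change their answers.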

For each item.id within an id, I want to create a variable (status) by looking at the last two sequences (that is, the last change):

a) assign "WW" for those who changed from wrong to wrong (0 to 0),
b) assign "WR" for those who changed to increasing score (0 to 1, 0 to 2, or 1 to 2),
c) assign "RW" for those who changed to decreasing score (2 to 1, 2 to 0, or 1 to 0 ), and
d) assign "RR" for those who changed from right to right (1 to 1, 2 to 2).

A change in score from 0 to 1, 0 to 2, or 1 to 2 counts as a correct (increasing) change, while a change from 1 to 0, 2 to 0, or 2 to 1 counts as an incorrect (decreasing) change.
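The four rules for a pair of attempts can be sketched as a small base R function. This is only an illustration of the rules as stated; the name classify_pair and its arguments are hypothetical and do not come from any package.

```r
# Hypothetical helper: label the change between the previous and the last score.
classify_pair <- function(prev, last) {
  if (last > prev) {
    "WR"   # score increased (0 to 1, 0 to 2, 1 to 2)
  } else if (last < prev) {
    "RW"   # score decreased (2 to 1, 2 to 0, 1 to 0)
  } else if (last == 0) {
    "WW"   # stayed at 0
  } else {
    "RR"   # stayed at 1 or at 2
  }
}

classify_pair(0, 0)  # "WW"
classify_pair(1, 2)  # "WR"
classify_pair(2, 1)  # "RW"
classify_pair(2, 2)  # "RR"
```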

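If an item.id has only one attempt (as for id = 7), then status should be "one.right" when the score is 1 or 2, and "one.wrong" when the score is 0. In other words, a score of 1 or 2 counts as right and a score of 0 counts as wrong.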


The desired output would contain the following cases:

 > desired
     id item.id    status
  1   1       1        WW
  2   1       2 one.wrong
  3   2       1        WR
  4   3       1        RW
  5   4       1 one.right
  6   4       2 one.right
  7   5       1 one.right
  8   5       2        RR
  9   6       1        WW
  10  7       1 one.right
  11  8       1 one.wrong
  12  8       2 one.right
  13  9       1        WR
  14  10      1        RW

The main difference from the previous version of my question is that I had not accounted for changes

a) from 1 to 2 as WR, instead, they were coded as RR,
b) from 2 to 1 as RW, instead, they were coded as WW.

Logically, an increase in score should be WR and a decrease in score should be RW.

The best answer I received was

library(dplyr)
library(purrr)
library(forcats)

data %>% 
  mutate(status = ifelse(score > 0, "R", "W")) %>% 
  group_by(id, item.id) %>% 
  filter(sequence == n() - 1 | sequence == n()) %>%  
  summarise(status = paste(status, collapse = "")) %>% 
  ungroup() %>% 
  mutate(status = fct_recode(status, "one.wrong" = "W", "one.right" = "R"))

But I need to handle the decreasing/increasing score patterns.
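To illustrate the limitation, here is a small sketch of what happens when scores are first collapsed to "R"/"W" as in the code above. The vector scores is a made-up example for a single item, not taken from the dataset.

```r
# Made-up attempts for one item: the score rises from 1 to 2.
scores <- c(1, 2)

# Collapsing to right/wrong loses the direction of the change.
labels <- ifelse(scores > 0, "R", "W")
paste(labels, collapse = "")   # "RR"

# The sign of the difference keeps it: 1 means the score increased.
sign(diff(scores))             # 1
```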

Any suggestions? Thanks!

Answer 1

Here is a classification for every row:

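library(dplyr)
data = data %>%
  group_by(id, item.id) %>%
  # diff is the change from the previous attempt; the first attempt gets 0
  mutate(diff = c(0, diff(score)),
         status = case_when(
           n() == 1 & score == 0 ~ "one.wrong",  # single attempt, wrong
           n() == 1 & score > 0 ~ "one.right",   # single attempt, right
           diff == 0 & score == 0 ~ "WW",        # unchanged at 0
           diff == 0 & score > 0 ~ "RR",         # unchanged at 1 or 2
           diff > 0 ~ "WR",                      # score increased
           diff < 0 ~ "RW",                      # score decreased
           TRUE ~ "oops"                         # should never be reached
         ))
print.data.frame(data)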
#    id item.id sequence score diff    status
# 1   1       1        1     0    0        WW
# 2   1       1        2     0    0        WW
# 3   1       2        1     0    0 one.wrong
# 4   2       1        1     0    0        WW
# 5   2       1        2     0    0        WW
# 6   2       1        3     1    1        WR
# 7   3       1        1     2    0        RR
# 8   3       1        2     0   -2        RW
# 9   4       1        1     1    0 one.right
# 10  4       2        1     1    0 one.right
# 11  5       1        1     1    0 one.right
# 12  5       2        1     0    0        WW
# 13  5       2        2     1    1        WR
# 14  5       2        3     1    0        RR
# 15  6       1        1     0    0        WW
# 16  6       1        2     0    0        WW
# 17  6       1        3     0    0        WW
# 18  7       1        1     1    0 one.right
# 19  8       1        1     0    0 one.wrong
# 20  8       2        1     2    0 one.right
# 21  9       1        1     1    0        RR
# 22  9       1        2     2    1        WR
# 23 10       1        1     2    0        RR
# 24 10       1        2     1   -1        RW
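The leading 0 in c(0, diff(score)) is there because diff() returns one element fewer than its input, while mutate() needs one value per row in the group. A minimal sketch with a made-up score vector:

```r
# Made-up scores for one item with three attempts.
score <- c(0, 0, 1)

diff(score)         # 0 1    (length 2: one value per consecutive pair)
c(0, diff(score))   # 0 0 1  (length 3: padded so the first attempt gets 0)
```

This padding is also why the first row of a multi-attempt group is labelled "WW" or "RR": its diff is 0 by construction, so only the last row of each group carries the label of interest.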

Then we can summarize it, taking the last status value in each group:

summarize(data, status = last(status))
# # A tibble: 14 x 3
# # Groups:   id [10]
#       id item.id status   
#    <dbl>   <dbl> <chr>    
#  1     1       1 WW       
#  2     1       2 one.wrong
#  3     2       1 WR       
#  4     3       1 RW       
#  5     4       1 one.right
#  6     4       2 one.right
#  7     5       1 one.right
#  8     5       2 RR       
#  9     6       1 WW       
# 10     7       1 one.right
# 11     8       1 one.wrong
# 12     8       2 one.right       
# 13     9       1 WR       
# 14    10       1 RW

This seems to match your desired output.
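As a cross-check that does not depend on dplyr, the same logic can be sketched in base R and compared against the desired statuses from the question. The helper name last_change is hypothetical, and this is one possible way to write it rather than the approach used above.

```r
# Rebuild the sample data from the question.
id       <- c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8, 9,9, 10,10)
item.id  <- c(1,1,2, 1,1,1, 1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2, 1,1, 1,1)
sequence <- c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1, 1,2, 1,2)
score    <- c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2, 1,2, 2,1)
data <- data.frame(id, item.id, sequence, score)

# Hypothetical helper: label one item from its scores, ordered by attempt.
last_change <- function(s) {
  n <- length(s)
  if (n == 1) return(if (s > 0) "one.right" else "one.wrong")
  d <- s[n] - s[n - 1]   # change between the last two attempts
  if (d > 0) "WR" else if (d < 0) "RW" else if (s[n] == 0) "WW" else "RR"
}

# Make sure attempts are in order, then apply the helper per id and item.id.
data <- data[order(data$id, data$item.id, data$sequence), ]
result <- aggregate(score ~ id + item.id, data = data, FUN = last_change)
names(result)[3] <- "status"
result <- result[order(result$id, result$item.id), ]

# Desired statuses, copied from the question in the same row order.
desired <- c("WW", "one.wrong", "WR", "RW", "one.right", "one.right",
             "one.right", "RR", "WW", "one.right", "one.wrong",
             "one.right", "WR", "RW")
identical(result$status, desired)
```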
