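Sequence change coding in R
I have asked some very similar questions before, but I now understand my problem better, so I will try to state it clearly.
I have a sample dataset that looks like this: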
id <- c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8, 9,9, 10,10)
item.id <- c(1,1,2, 1,1,1 ,1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2, 1,1, 1,1)
sequence <- c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1, 1,2, 1,2)
score <- c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2, 1,2, 2,1)
data <- data.frame("id"=id, "item.id"=item.id, "sequence"=sequence, "score"=score)
> data
   id item.id sequence score
1   1       1        1     0
2   1       1        2     0
3   1       2        1     0
4   2       1        1     0
5   2       1        2     0
6   2       1        3     1
7   3       1        1     2
8   3       1        2     0
9   4       1        1     1
10  4       2        1     1
11  5       1        1     1
12  5       2        1     0
13  5       2        2     1
14  5       2        3     1
15  6       1        1     0
16  6       1        2     0
17  6       1        3     0
18  7       1        1     1
19  8       1        1     0
20  8       2        1     2
21  9       1        1     1
22  9       1        2     2
23 10       1        1     2
24 10       1        2     1
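id identifies each student, item.id identifies the item the student answered, sequence is the attempt number for each item.id, and score is the score of each attempt, taking the value 0, 1, or 2. Students can change their answers.
For each item.id within each id, I want to create a variable (status) by looking at the last two sequences (the last change):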
a) assign "WW" for those who changed from wrong to wrong (0 to 0),
b) assign "WR" for those who changed to an increasing score (0 to 1, 0 to 2, or 1 to 2),
c) assign "RW" for those who changed to a decreasing score (2 to 1, 2 to 0, or 1 to 0), and
d) assign "RR" for those who changed from right to right with the same score (1 to 1, 2 to 2).
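A score change from 0 to 1, 0 to 2, or 1 to 2 is considered a correct (right) change, while a score change from 1 to 0, 2 to 0, or 2 to 1 is considered an incorrect (wrong) change.
If an item.id has only one attempt, as for id = 7, then status should be "one.right" when the score is 1 or 2, and "one.wrong" when the score is 0. In other words, a score of 1 or 2 counts as right and a score of 0 counts as wrong.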
The desired output would contain these cases:
> desired
   id item.id    status
1   1       1        WW
2   1       2 one.wrong
3   2       1        WR
4   3       1        RW
5   4       1 one.right
6   4       2 one.right
7   5       1 one.right
8   5       2        RR
9   6       1        WW
10  7       1 one.right
11  8       1 one.wrong
12  8       2 one.right
13  9       1        WR
14 10       1        RW
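The main difference from the previous version of my question is that I had not considered the changes
a) from 1 to 2 as WR; instead, they were coded as RR,
b) from 2 to 1 as RW; instead, they were coded as WW.
Logically, an increase in score should be WR and a decrease in score should be RW.
The best answer I received was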
library(dplyr)
library(purrr)
library(forcats)
data %>%
  mutate(status = ifelse(score > 0, "R", "W")) %>%
  group_by(id, item.id) %>%
  filter(sequence == n() - 1 | sequence == n()) %>%
  summarise(status = paste(status, collapse = "")) %>%
  ungroup() %>%
  mutate(status = fct_recode(status, "one.wrong" = "W", "one.right" = "R"))
But I need to handle decreasing/increasing score patterns.
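To illustrate the limitation: the recoding above maps both 1 and 2 to "R", so an increase from 1 to 2 comes out as "RR" rather than "WR". A minimal sketch with a made-up two-attempt vector (scores_1_to_2 is a hypothetical name, not part of the data):

```r
# Hypothetical two-attempt item where the score rises from 1 to 2.
scores_1_to_2 <- c(1, 2)

# Same recoding as in the answer above: any positive score becomes "R".
labels <- ifelse(scores_1_to_2 > 0, "R", "W")
status <- paste(labels, collapse = "")

print(status)  # "RR", although the score increased
```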
Any suggestions?
Thanks!
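Here is a classification of every row:
library(dplyr)

# Assumes rows are ordered by sequence within each id / item.id group.
data = data %>%
  group_by(id, item.id) %>%
  mutate(
    # Change from the previous attempt; 0 for the first attempt in a group.
    diff = c(0, diff(score)),
    status = case_when(
      n() == 1 & score == 0 ~ "one.wrong",
      n() == 1 & score > 0 ~ "one.right",
      diff == 0 & score == 0 ~ "WW",
      diff == 0 & score > 0 ~ "RR",
      diff > 0 ~ "WR",
      diff < 0 ~ "RW",
      TRUE ~ "oops"  # fallback that should never be reached
    )
  )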
print.data.frame(data)
#    id item.id sequence score diff    status
# 1   1       1        1     0    0        WW
# 2   1       1        2     0    0        WW
# 3   1       2        1     0    0 one.wrong
# 4   2       1        1     0    0        WW
# 5   2       1        2     0    0        WW
# 6   2       1        3     1    1        WR
# 7   3       1        1     2    0        RR
# 8   3       1        2     0   -2        RW
# 9   4       1        1     1    0 one.right
# 10  4       2        1     1    0 one.right
# 11  5       1        1     1    0 one.right
# 12  5       2        1     0    0        WW
# 13  5       2        2     1    1        WR
# 14  5       2        3     1    0        RR
# 15  6       1        1     0    0        WW
# 16  6       1        2     0    0        WW
# 17  6       1        3     0    0        WW
# 18  7       1        1     1    0 one.right
# 19  8       1        1     0    0 one.wrong
# 20  8       2        1     2    0 one.right
# 21  9       1        1     1    0        RR
# 22  9       1        2     2    1        WR
# 23 10       1        1     2    0        RR
# 24 10       1        2     1   -1        RW
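To see where the diff values come from, it may help to run the same expression on a single group. This is only an illustrative sketch; scores_one_group is a made-up name holding the three attempts of id 5 on item.id 2 from the sample data.

```r
# Scores for id 5, item.id 2 (attempts 1 to 3), copied from the sample data.
scores_one_group <- c(0, 1, 1)

# diff() returns the change between consecutive attempts (length n - 1);
# the leading 0 pads the first attempt, which has no previous attempt.
change <- c(0, diff(scores_one_group))

print(change)  # 0 1 0
```

Because of that padding, the label on the first row of a multi-attempt group (for example "RR" for the first attempt of id 3) is only a placeholder; the label that matters is the one on the last row of each group.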
We can then summarize it, taking the last status value in each group:
summarize(data, status = last(status))
# # A tibble: 14 x 3
# # Groups:   id [10]
#       id item.id status
#    <dbl>   <dbl> <chr>
#  1     1       1 WW
#  2     1       2 one.wrong
#  3     2       1 WR
#  4     3       1 RW
#  5     4       1 one.right
#  6     4       2 one.right
#  7     5       1 one.right
#  8     5       2 RR
#  9     6       1 WW
# 10     7       1 one.right
# 11     8       1 one.wrong
# 12     8       2 one.right
# 13     9       1 WR
# 14    10       1 RW
This appears to match your desired output.
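If only the final label per id and item.id is needed, the per-row classification can be skipped by comparing the last two scores directly. The following is a minimal base R sketch, not part of the answer above; classify_last_change and attempts are hypothetical names, and it assumes there are no missing scores.

```r
# Hypothetical helper: label one id / item.id group from its scores,
# which must already be ordered by attempt.
classify_last_change <- function(scores) {
  n <- length(scores)
  last_score <- scores[n]
  if (n == 1) {
    return(if (last_score > 0) "one.right" else "one.wrong")
  }
  change <- last_score - scores[n - 1]
  if (change > 0) {
    "WR"
  } else if (change < 0) {
    "RW"
  } else if (last_score > 0) {
    "RR"
  } else {
    "WW"
  }
}

# Sample data from the question, stored under a separate name.
attempts <- data.frame(
  id       = c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8, 9,9, 10,10),
  item.id  = c(1,1,2, 1,1,1, 1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2, 1,1, 1,1),
  sequence = c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1, 1,2, 1,2),
  score    = c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2, 1,2, 2,1)
)

# Order by attempt, then apply the helper to each id / item.id group.
attempts <- attempts[order(attempts$id, attempts$item.id, attempts$sequence), ]
result <- aggregate(score ~ id + item.id, data = attempts,
                    FUN = classify_last_change)
names(result)[3] <- "status"
result <- result[order(result$id, result$item.id), ]

print(result, row.names = FALSE)
```

On the sample data this should give the same 14 labels as the desired output. The explicit ordering step is there because the helper relies on the last two elements being the last two attempts.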