Is there a function within dcast that allows me to include additional conditions?

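I am trying to create a wide-format dataset that contains only part of my long-format data. The data come from learners in an online learning module who sometimes get "stuck" on a screen, so multiple attempts are recorded for that screen.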

lesson_long <- data.frame (id  = c(4256279, 4256279, 4256279, 4256279, 4256279, 4256279, 4256308, 4256308, 4256308, 4256308),
                           screen = c("survey1", "survey1", "survey1", "survey1", "survey2", "survey2", "survey1", "survey1", "survey2", "survey2"),
                           question_attempt = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
                           variable = c("age", "country", "age", "country", "education", "course", "age", "country", "education", "course"),
                           response = c(0, 5, 20, 5, 3, 2, 18, 5, 4, 1 ))


id       screen     question_attempt variable response
4256279  survey1            1           age       0
4256279  survey1            1         country     5
4256279  survey1            2           age       20
4256279  survey1            2         country     5
4256279  survey2            1        education    3
4256279  survey2            1         course      2
4256308  survey1            1           age       18
4256308  survey1            1         country     5
4256308  survey2            1        education    4
4256308  survey2            1         course      1

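For my analysis, I only need the responses from their last attempt on each screen (that is, the rows with their maximum question_attempt; sometimes they have as many as 8 or 9 attempts on a screen). All earlier attempts should be discarded, and I don't need the screen name in the final dataset. The final wide format would look like this: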

id        age  country education course
4256279   20     5         3         2
4256308   18     5         4         1

I have been trying to do this with dcast alone (without success):

lesson_wide <- dcast(lesson_long, `id` ~ variable, value.var = "response", fun.aggregate = max("question_attempt"), fill=0)

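fun.aggregate clearly doesn't work the way I made it up... but is there a way around this? Or do I need an extra step to select the data before using dcast? If that is the solution, how would I do it?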

I would love to hear your answers. Thanks in advance!

You can order the data by id, screen and question_attempt, and then select the last value for each id and variable.

library(data.table)

setDT(lesson_long)

dcast(lesson_long[order(id, screen, question_attempt)], 
      id~variable, value.var = 'response', fun.aggregate = last, fill = NA)

#        id age country course education
#1: 4256279  20       5      2         3
#2: 4256308  18       5      1         4

Similarly, using dplyr and tidyr:

library(dplyr)

lesson_long %>%
  arrange(id, screen, question_attempt) %>%
  tidyr::pivot_wider(names_from = variable, values_from = response, 
                     id_cols = id, values_fn = last)
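
If you would rather do the selection as an explicit step before reshaping, as the question suggests, one option is to filter down to the highest question_attempt per id and screen first and then call dcast without any aggregate function. This is only a minimal sketch: it assumes each variable appears on just one screen (as in the sample data), and last_attempts is a name made up here for illustration.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
lesson_long <- data.table(
  id = c(4256279, 4256279, 4256279, 4256279, 4256279, 4256279,
         4256308, 4256308, 4256308, 4256308),
  screen = c("survey1", "survey1", "survey1", "survey1", "survey2", "survey2",
             "survey1", "survey1", "survey2", "survey2"),
  question_attempt = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
  variable = c("age", "country", "age", "country", "education", "course",
               "age", "country", "education", "course"),
  response = c(0, 5, 20, 5, 3, 2, 18, 5, 4, 1)
)

# Step 1: within each id/screen pair, keep only the rows that belong to
# the highest question_attempt (hypothetical helper name: last_attempts)
last_attempts <- lesson_long[
  , .SD[question_attempt == max(question_attempt)],
  by = .(id, screen)
]

# Step 2: every id/variable cell now holds exactly one value,
# so dcast needs no fun.aggregate
lesson_wide <- dcast(last_attempts, id ~ variable, value.var = "response")

lesson_wide
```

For the sample data this should give the same two rows as the approaches above. The filter makes the "last attempt only" rule visible in the code, and it does not depend on row order.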