Is there a function within dcast that allows me to include additional conditions?
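I am trying to create a wide-format dataset from only part of some long-format data. The data come from learners in an online learning module who sometimes get "stuck" on a screen, so multiple attempts are recorded for that screen.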
lesson_long <- data.frame (id = c(4256279, 4256279, 4256279, 4256279, 4256279, 4256279, 4256308, 4256308, 4256308, 4256308),
screen = c("survey1", "survey1", "survey1", "survey1", "survey2", "survey2", "survey1", "survey1", "survey2", "survey2"),
question_attempt = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
variable = c("age", "country", "age", "country", "education", "course", "age", "country", "education", "course"),
response = c(0, 5, 20, 5, 3, 2, 18, 5, 4, 1 ))
id screen question_attempt variable response
4256279 survey1 1 age 0
4256279 survey1 1 country 5
4256279 survey1 2 age 20
4256279 survey1 2 country 5
4256279 survey2 1 education 3
4256279 survey2 1 course 2
4256308 survey1 1 age 18
4256308 survey1 1 country 5
4256308 survey2 1 education 4
4256308 survey2 1 course 1
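For my analysis, I only need the responses from each learner's last attempt on each screen (that is, the rows with the maximum question_attempt; sometimes there are as many as 8 or 9 attempts per screen). All earlier attempts should be discarded, and I do not need the screen name in the final dataset. The final wide format would look like this: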
id age country education course
4256279 20 5 3 2
4256308 18 5 4 1
I have been trying to do this with dcast alone (unsuccessfully):
lesson_wide <- dcast(lesson_long, `id` ~ variable, value.var = "response", fun.aggregate = max("question_attempt"), fill=0)
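fun.aggregate obviously does not work the way I made it up... but is there a workaround? Or do I need an extra step to select the data before using dcast? If that is the solution, how would I do it?
Curious to hear your answers. Thanks in advance!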
You can order the data by id, screen and question_attempt, and then select the last value for each variable within each id.
library(data.table)
setDT(lesson_long)
dcast(lesson_long[order(id, screen, question_attempt)],
id~variable, value.var = 'response', fun.aggregate = last, fill = NA)
# id age country course education
#1: 4256279 20 5 2 3
#2: 4256308 18 5 1 4
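The question also asks whether an extra selection step before dcast would work. The following is a minimal sketch of that alternative, not part of the original answer: it keeps only the rows whose question_attempt equals the maximum for each id and screen, then casts without any aggregation function. The name last_attempts is a hypothetical helper introduced here for illustration.

```r
library(data.table)

# Sample data copied from the question
lesson_long <- data.table(
  id = c(4256279, 4256279, 4256279, 4256279, 4256279, 4256279,
         4256308, 4256308, 4256308, 4256308),
  screen = c("survey1", "survey1", "survey1", "survey1", "survey2", "survey2",
             "survey1", "survey1", "survey2", "survey2"),
  question_attempt = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
  variable = c("age", "country", "age", "country", "education", "course",
               "age", "country", "education", "course"),
  response = c(0, 5, 20, 5, 3, 2, 18, 5, 4, 1)
)

# Keep only the rows from the highest attempt per learner and screen
# (last_attempts is an illustrative name, not from the original post)
last_attempts <- lesson_long[
  , .SD[question_attempt == max(question_attempt)],
  by = .(id, screen)
]

# Each id/variable pair now occurs once, so no fun.aggregate is needed
lesson_wide <- dcast(last_attempts, id ~ variable, value.var = "response")
print(lesson_wide)
```

Filtering first makes the "last attempt" rule explicit and does not rely on row order, which may be easier to audit if a variable could ever appear on more than one screen.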
Similarly, using dplyr and tidyr:
library(dplyr)
lesson_long %>%
arrange(id, screen, question_attempt) %>%
tidyr::pivot_wider(names_from = variable, values_from = response,
id_cols = id, values_fn = last)
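For completeness, the same filter-first idea can be sketched with dplyr and tidyr. This is an assumption-labeled variant rather than the answer's own code: it groups by id and screen, keeps the rows with the maximum question_attempt, and then pivots without values_fn.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
lesson_long <- data.frame(
  id = c(4256279, 4256279, 4256279, 4256279, 4256279, 4256279,
         4256308, 4256308, 4256308, 4256308),
  screen = c("survey1", "survey1", "survey1", "survey1", "survey2", "survey2",
             "survey1", "survey1", "survey2", "survey2"),
  question_attempt = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
  variable = c("age", "country", "age", "country", "education", "course",
               "age", "country", "education", "course"),
  response = c(0, 5, 20, 5, 3, 2, 18, 5, 4, 1)
)

lesson_wide <- lesson_long %>%
  group_by(id, screen) %>%
  # Keep only the final attempt on each screen for each learner
  filter(question_attempt == max(question_attempt)) %>%
  ungroup() %>%
  # One row per id; screen and question_attempt are dropped via id_cols
  pivot_wider(id_cols = id, names_from = variable, values_from = response)

print(lesson_wide)
```

Because pivot_wider creates columns in order of first appearance, the columns here come out as id, age, country, education, course, which matches the layout requested in the question.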