R Wide to long format for multiple variables with patterns

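I have a dataset that contains an identifier followed by a block of five columns repeated 18 times. I want to reshape the data into long format, keeping the first five column headers as the column headers. Here is an example with only two repetitions: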

structure(list(Response.ID = 1:2, Task = structure(c(1L, 1L), .Label = "task1", class = "factor"), 
Freq = structure(c(1L, 1L), .Label = "Daily", class = "factor"), 
Hours = c(3L, 2L), Value = c(10L, 8L), Mood = structure(1:2, .Label = c("Engaged", 
"Neutral"), class = "factor"), Task.1 = structure(c(1L, 1L
), .Label = "task2", class = "factor"), Freq.1 = structure(c(1L, 
1L), .Label = "Weekly", class = "factor"), Hours.1 = c(4L, 
4L), Value.1 = c(10L, 6L), Mood.1 = structure(c(2L, 1L), .Label = c("Neutral", 
"Optimistic"), class = "factor")), .Names = c("Response.ID", "Task", "Freq", "Hours", "Value", "Mood", "Task.1", "Freq.1", "Hours.1", "Value.1", "Mood.1"), class = "data.frame", row.names = c(NA, -2L))

I tried using melt together with patterns, which gets close to what I want but does not give the column headers I need:

df = melt(df1, id.vars = c("Response.ID"), measure.vars = patterns("^Task", "^Freq","^Hours","^Mood"))

The result looks like this:

structure(list(Response.ID = c(1L, 2L, 1L, 2L), variable = structure(c(1L, 1L, 2L, 2L), class = "factor", .Label = c("1", "2")), value1 = c("task1", "task1", "task2", "task2"), value2 = c("Daily", "Daily", "Weekly", "Weekly"), value3 = c(3L, 2L, 4L, 4L), value4 = c("Engaged", "Neutral", "Optimistic", "Neutral")), .Names = c("Response.ID", "variable", "value1", "value2", "value3", "value4"), row.names = c(NA, -4L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000000330788>)

When I try to set the names with the value.name argument, as below, I get an error message:

df = melt(df1, id.vars = c("Response.ID"),measure.vars = patterns("^Task", "^Freq","^Hours","^Mood"), value.name=c("Task", "Freq", "Hours", "Value","Mood"))

The result I want looks like this:

structure(list(Response.ID = c(1L, 2L, 1L, 2L), Task = structure(c(1L, 1L, 2L, 2L), .Label = c("task1", "task2"), class = "factor"), 
Freq = structure(c(1L, 1L, 2L, 2L), .Label = c("Daily", "Weekly"
), class = "factor"), Hours = c(3L, 2L, 4L, 4L), Value = c(10L, 
8L, 10L, 6L), Mood = structure(c(1L, 2L, 3L, 2L), .Label = c("Engaged", 
"Neutral", "Optimistic"), class = "factor")), .Names = c("Response.ID", "Task", "Freq", "Hours", "Value", "Mood"), class = "data.frame", row.names = c(NA, -4L))

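In my opinion, you have started down a hard road with melt: the function is well named, because trying to use it may melt your brain. Joking aside, melt does a lot of work under the hood, and it can be inefficient on a large dataset.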
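
If you do want to stay with melt, note that the call in the question passes four patterns (there is no "^Value") but five names in value.name, which is most likely what triggers the error. Below is a minimal sketch with the fifth pattern added. It assumes df1 is first converted to a data.table, and the example data is retyped here with character columns instead of factors to keep the block short; the names dt and long are only illustrative.

```r
library(data.table)

# Example data retyped from the question (characters instead of factors: an assumption)
df1 <- data.frame(
  Response.ID = 1:2,
  Task    = c("task1", "task1"),
  Freq    = c("Daily", "Daily"),
  Hours   = c(3L, 2L),
  Value   = c(10L, 8L),
  Mood    = c("Engaged", "Neutral"),
  Task.1  = c("task2", "task2"),
  Freq.1  = c("Weekly", "Weekly"),
  Hours.1 = c(4L, 4L),
  Value.1 = c(10L, 6L),
  Mood.1  = c("Optimistic", "Neutral"),
  stringsAsFactors = FALSE
)

# patterns() is handled by the data.table method of melt, so convert first
dt <- as.data.table(df1)

# One pattern per output column, in the same order as value.name
long <- melt(
  dt,
  id.vars      = "Response.ID",
  measure.vars = patterns("^Task", "^Freq", "^Hours", "^Value", "^Mood"),
  value.name   = c("Task", "Freq", "Hours", "Value", "Mood")
)

# melt adds a 'variable' column indexing the repetition; drop it if not needed
long[, variable := NULL]
print(long)
```

The number of patterns and the length of value.name have to match, since each pattern defines one output column.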

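I would solve the problem by hand with rbindlist (from the excellent data.table package, which also ships an optimized version of melt if you really want to use it), concatenating the groups of columns manually. This also preserves the column names: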

> rbindlist(lapply(1:2, function(i) df1[, c(1, ((i-1)*5+2):((i-1)*5+6))]), use.names = FALSE)
   Response.ID  Task   Freq Hours Value       Mood
1:           1 task1  Daily     3    10    Engaged
2:           2 task1  Daily     2     8    Neutral
3:           1 task2 Weekly     4    10 Optimistic
4:           2 task2 Weekly     4     6    Neutral

This works for your example. To make it work on the real dataset, replace the index 1:2 with the number of repetitions (that is, lapply(1:18, ...)).
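
The same idea can be written so that the number of repetitions is derived from the column count instead of being typed in. This is only a sketch under a few assumptions: df1 is a plain data.frame (the column indexing below relies on data.frame semantics), the identifier is the first column, and every repetition has exactly five columns in the same order. The names n_vars, n_reps, and blocks are illustrative, and the example data is retyped with character columns instead of factors.

```r
library(data.table)

# Example data retyped from the question (characters instead of factors: an assumption)
df1 <- data.frame(
  Response.ID = 1:2,
  Task    = c("task1", "task1"),
  Freq    = c("Daily", "Daily"),
  Hours   = c(3L, 2L),
  Value   = c(10L, 8L),
  Mood    = c("Engaged", "Neutral"),
  Task.1  = c("task2", "task2"),
  Freq.1  = c("Weekly", "Weekly"),
  Hours.1 = c(4L, 4L),
  Value.1 = c(10L, 6L),
  Mood.1  = c("Optimistic", "Neutral"),
  stringsAsFactors = FALSE
)

n_vars <- 5L                             # columns per repetition
n_reps <- (ncol(df1) - 1L) %/% n_vars    # 2 here; 18 on the real dataset

# Build one data.frame per repetition: the id column plus that repetition's block
blocks <- lapply(seq_len(n_reps), function(i) {
  first <- (i - 1L) * n_vars + 2L
  df1[, c(1L, first:(first + n_vars - 1L))]
})

# Bind by position; the column names of the first block are kept
long <- rbindlist(blocks, use.names = FALSE)
print(long)
```

Binding by position is deliberate here: the later blocks carry suffixed names such as Task.1, so matching by name would not line the columns up.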