具有模式的多个变量的 R 从宽到长格式
R Wide to long format for multiple variables with patterns
我有一个数据集,其中包含一个标识符和重复 18 次的五列。我想将数据重组为长格式,将前五个列标题保留为列标题。下面是一个只有两次重复的示例:
structure(list(Response.ID = 1:2, Task = structure(c(1L, 1L), .Label = "task1", class = "factor"),
Freq = structure(c(1L, 1L), .Label = "Daily", class = "factor"),
Hours = c(3L, 2L), Value = c(10L, 8L), Mood = structure(1:2, .Label = c("Engaged",
"Neutral"), class = "factor"), Task.1 = structure(c(1L, 1L
), .Label = "task2", class = "factor"), Freq.1 = structure(c(1L,
1L), .Label = "Weekly", class = "factor"), Hours.1 = c(4L,
4L), Value.1 = c(10L, 6L), Mood.1 = structure(c(2L, 1L), .Label = c("Neutral",
"Optimistic"), class = "factor")), .Names = c("Response.ID", "Task", "Freq", "Hours", "Value", "Mood", "Task.1", "Freq.1", "Hours.1", "Value.1", "Mood.1"), class = "data.frame", row.names = c(NA, -2L))
我尝试使用 melt 和 patterns 函数,这似乎接近我想要的结果,但没有所需的列标题:
df = melt(df1, id.vars = c("Response.ID"), measure.vars = patterns("^Task", "^Freq","^Hours","^Mood"))
结果如下:
structure(list(Response.ID = c(1L, 2L, 1L, 2L), variable = structure(c(1L, 1L, 2L, 2L), class = "factor", .Label = c("1", "2")), value1 = c("task1", "task1", "task2", "task2"), value2 = c("Daily", "Daily", "Weekly", "Weekly"), value3 = c(3L, 2L, 4L, 4L), value4 = c("Engaged", "Neutral", "Optimistic", "Neutral")), .Names = c("Response.ID", "variable", "value1", "value2", "value3", "value4"), row.names = c(NA, -4L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000000330788>)
当我尝试使用下面的 value.name() 指定名称时,我收到错误消息:
df = melt(df1, id.vars = c("Response.ID"),measure.vars = patterns("^Task", "^Freq","^Hours","^Mood"), value.name=c("Task", "Freq", "Hours", "Value","Mood"))
我想要的结果是这样的:
structure(list(Response.ID = c(1L, 2L, 1L, 2L), Task = structure(c(1L, 1L, 2L, 2L), .Label = c("task1", "task2"), class = "factor"),
Freq = structure(c(1L, 1L, 2L, 2L), .Label = c("Daily", "Weekly"
), class = "factor"), Hours = c(3L, 2L, 4L, 4L), Value = c(10L,
8L, 10L, 6L), Mood = structure(c(1L, 2L, 3L, 2L), .Label = c("Engaged",
"Neutral", "Optimistic"), class = "factor")), .Names = c("Response.ID", "Task", "Freq", "Hours", "Value", "Mood"), class = "data.frame", row.names = c(NA, -4L))
在我看来,您使用 melt
开始了一段艰难的旅程:这个函数的名字很好,因为尝试使用它可能会融化您的大脑。开个玩笑,函数 melt
有很多基础计算,如果你有一个大数据集,它的使用效率可能很低。
我会用 rbindlist
手动解决问题(来自优秀的包 data.table
,如果你真的想使用它,它还附带 melt
的优化版本) , 手动连接多组列。这也保留了列名:
> rbindlist(lapply(1:2, function(i) df1[,c(1,((i-1)*5+2):((i-1)*5+6))]))
Response.ID Task Freq Hours Value Mood
1: 1 task1 Daily 3 10 Engaged
2: 2 task1 Daily 2 8 Neutral
3: 1 task2 Weekly 4 10 Optimistic
4: 2 task2 Weekly 4 6 Neutral
这适用于您的示例:用重复次数替换索引 1:2
以使其适用于真实数据集(因此,lapply(1:18)
)。
我有一个数据集,其中包含一个标识符和重复 18 次的五列。我想将数据重组为长格式,将前五个列标题保留为列标题。下面是一个只有两次重复的示例:
structure(list(Response.ID = 1:2, Task = structure(c(1L, 1L), .Label = "task1", class = "factor"),
Freq = structure(c(1L, 1L), .Label = "Daily", class = "factor"),
Hours = c(3L, 2L), Value = c(10L, 8L), Mood = structure(1:2, .Label = c("Engaged",
"Neutral"), class = "factor"), Task.1 = structure(c(1L, 1L
), .Label = "task2", class = "factor"), Freq.1 = structure(c(1L,
1L), .Label = "Weekly", class = "factor"), Hours.1 = c(4L,
4L), Value.1 = c(10L, 6L), Mood.1 = structure(c(2L, 1L), .Label = c("Neutral",
"Optimistic"), class = "factor")), .Names = c("Response.ID", "Task", "Freq", "Hours", "Value", "Mood", "Task.1", "Freq.1", "Hours.1", "Value.1", "Mood.1"), class = "data.frame", row.names = c(NA, -2L))
我尝试使用 melt 和 patterns 函数,这似乎接近我想要的结果,但没有所需的列标题:
df = melt(df1, id.vars = c("Response.ID"), measure.vars = patterns("^Task", "^Freq","^Hours","^Mood"))
结果如下:
structure(list(Response.ID = c(1L, 2L, 1L, 2L), variable = structure(c(1L, 1L, 2L, 2L), class = "factor", .Label = c("1", "2")), value1 = c("task1", "task1", "task2", "task2"), value2 = c("Daily", "Daily", "Weekly", "Weekly"), value3 = c(3L, 2L, 4L, 4L), value4 = c("Engaged", "Neutral", "Optimistic", "Neutral")), .Names = c("Response.ID", "variable", "value1", "value2", "value3", "value4"), row.names = c(NA, -4L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000000330788>)
当我尝试使用下面的 value.name() 指定名称时,我收到错误消息:
df = melt(df1, id.vars = c("Response.ID"),measure.vars = patterns("^Task", "^Freq","^Hours","^Mood"), value.name=c("Task", "Freq", "Hours", "Value","Mood"))
我想要的结果是这样的:
structure(list(Response.ID = c(1L, 2L, 1L, 2L), Task = structure(c(1L, 1L, 2L, 2L), .Label = c("task1", "task2"), class = "factor"),
Freq = structure(c(1L, 1L, 2L, 2L), .Label = c("Daily", "Weekly"
), class = "factor"), Hours = c(3L, 2L, 4L, 4L), Value = c(10L,
8L, 10L, 6L), Mood = structure(c(1L, 2L, 3L, 2L), .Label = c("Engaged",
"Neutral", "Optimistic"), class = "factor")), .Names = c("Response.ID", "Task", "Freq", "Hours", "Value", "Mood"), class = "data.frame", row.names = c(NA, -4L))
在我看来,您使用 melt
开始了一段艰难的旅程:这个函数的名字很好,因为尝试使用它可能会融化您的大脑。开个玩笑,函数 melt
有很多基础计算,如果你有一个大数据集,它的使用效率可能很低。
我会用 rbindlist
手动解决问题(来自优秀的包 data.table
,如果你真的想使用它,它还附带 melt
的优化版本) , 手动连接多组列。这也保留了列名:
> rbindlist(lapply(1:2, function(i) df1[,c(1,((i-1)*5+2):((i-1)*5+6))]))
Response.ID Task Freq Hours Value Mood
1: 1 task1 Daily 3 10 Engaged
2: 2 task1 Daily 2 8 Neutral
3: 1 task2 Weekly 4 10 Optimistic
4: 2 task2 Weekly 4 6 Neutral
这适用于您的示例:用重复次数替换索引 1:2
以使其适用于真实数据集(因此,lapply(1:18)
)。