reshape - prevent categorical variable from repeating over columns
I want to reshape a table from long to wide. It has multiple value columns and one categorical variable.
The table looks like:
df <- data.frame(name = c("john", "mary", "john", "mary"),
                 variable = c("math", "math", "science", "science"),
                 grade = c("sixth", "sixth", "sixth", "sixth"),
                 val1 = c(78, 88, 97, 100), val2 = c(92, 83, 69, 94))
What I want is:
want <- data.frame(name = c("john", "mary"), grade = c("sixth", "sixth"),
                   math.val1 = c(78, 88), math.val2 = c(92, 83),
                   science.val1 = c(97, 100), science.val2 = c(69, 94))
Without the grade column, I can do this easily:
reshape(df, idvar='name', timevar='variable', direction='wide')
With the "grade" column included, I get:
  name grade.math val1.math val2.math grade.science val1.science val2.science
1 john      sixth        78        92         sixth           97           69
2 mary      sixth        88        83         sixth          100           94
How can I correct this?
Thanks.
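You need to include 'grade' in idvar. Any column that is not listed in idvar or timevar is treated as a value column and gets repeated once per level of timevar.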
reshape(df, idvar=c('name', 'grade'), timevar='variable', direction='wide')
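Note that reshape() names the new columns value.time (val1.math), while the desired output uses time.value (math.val1). Below is a minimal sketch of one way to swap the two parts afterwards. The object name wide and the regular expression are illustrative assumptions; the pattern assumes the value columns are all named val followed by digits, as in the example data.

```r
df <- data.frame(name = c("john", "mary", "john", "mary"),
                 variable = c("math", "math", "science", "science"),
                 grade = c("sixth", "sixth", "sixth", "sixth"),
                 val1 = c(78, 88, 97, 100), val2 = c(92, 83, 69, 94))

# 'grade' is constant within each 'name', so it can serve as an id variable
wide <- reshape(df, idvar = c("name", "grade"), timevar = "variable",
                direction = "wide")

# Turn "val1.math" into "math.val1"; 'name' and 'grade' do not match
# the pattern and are left unchanged
names(wide) <- sub("^(val[0-9]+)\\.(.+)$", "\\2.\\1", names(wide))

names(wide)
```

If the value columns follow a different naming scheme, the pattern would need to be adjusted accordingly.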
Using the devel version of data.table, i.e. v1.9.5, dcast can reshape multiple value columns. It can be installed from here.
library(data.table)
dcast(setDT(df), name+grade~variable, value.var=c('val1', 'val2'))
#    name grade math_val1 science_val1 math_val2 science_val2
#1:  john sixth        78           97        92           69
#2:  mary sixth        88          100        83           94