How can I reshape my data frame into a new form more efficiently (R)?
I have a dataset like this (df1):
ID 2 4 6 8 10 12 14 16 18 20 22 24 Day
1 0 0 0 0 2 0 0 0 1 0 1 0 Sunday
1 0 0 0 0 0 4 0 0 0 0 0 0 Monday
1 0 0 0 0 0 0 0 0 2 0 0 0 Tuesday
1 0 0 0 0 0 0 2 0 0 0 0 0 Wednesday
1 0 0 0 0 0 0 0 2 0 0 0 0 Thursday
1 0 0 0 0 0 0 0 0 2 0 0 0 Friday
1 0 0 0 0 0 0 0 0 0 2 0 0 Saturday
2 0 0 0 0 0 0 0 0 0 0 0 0 Sunday
2 0 0 0 0 0 1 0 0 0 0 0 0 Monday
2 0 0 0 0 0 0 1 0 0 0 1 0 Tuesday
2 0 0 0 0 0 0 0 1 0 0 0 0 Wednesday
2 0 0 0 0 0 0 0 0 1 0 0 0 Thursday
2 0 0 0 0 0 2 0 0 0 1 0 0 Friday
2 0 0 0 0 0 0 0 0 0 0 0 0 Saturday
3 0 0 0 0 0 0 0 0 0 0 0 0 Sunday
3 0 0 0 0 0 0 2 0 0 0 0 0 Monday
3 0 0 0 0 0 1 0 0 2 0 0 0 Tuesday
3 0 0 0 0 0 0 0 0 0 0 0 0 Wednesday
3 0 0 0 0 0 0 0 2 0 0 0 0 Thursday
3 0 0 0 0 0 0 0 0 0 0 0 0 Friday
3 0 0 0 0 0 0 2 0 0 0 0 0 Saturday
3 0 0 0 0 0 0 0 2 0 0 0 0 Sunday
I also have a checklist of IDs like this:
ID
1
2
3
I want to convert df1 into this output:
ID Var1 Var2 Var3 Var4 Var5 ...... Var82 Var83 Var84
1 0 0 0 0 2 2 0 0
2
3
where Var1 corresponds to 'Sunday 2' in the first data frame and Var84 corresponds to 'Saturday 24'. I want to export the result as a .csv file.
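To make the Var numbering concrete: the 12 two-hour columns are nested inside the 7 days, so a (day, hour) pair lands at position (day position - 1) * 12 + hour / 2. The sketch below assumes the day order Sunday to Saturday from the question; `days`, `hours`, `var_index` and `labels` are illustrative names, not part of the question's code.

```r
# Column order assumed from the question: 7 days, each with 12 hourly columns
days = c("Sunday", "Monday", "Tuesday", "Wednesday",
         "Thursday", "Friday", "Saturday")
hours = seq(2, 24, by = 2)  # the column headers 2, 4, ..., 24

# Hypothetical helper: Var number for a given (day, hour) pair
var_index = function(day, hour) {
  (match(day, days) - 1) * length(hours) + match(hour, hours)
}

# The (day, hour) label behind every Var column, in output order
labels = paste(rep(days, each = length(hours)), hours)

var_index("Sunday", 2)     # 1
var_index("Saturday", 24)  # 84
labels[c(1, 84)]           # "Sunday 2" "Saturday 24"
```

This matches the expected row for ID 1: Var5 is 'Sunday 10' and Var82 is 'Saturday 20', the two cells holding 2.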
Because there are so many IDs, I do this with a for loop (shown below). The problem is that this code runs very slowly. Is there a faster way to get the same result?
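library(dplyr)
library(reshape2)
for (i in ID_checklist$ID) {
  # all rows for the current ID (filter() takes the data frame as its first argument)
  x = filter(df1, ID %in% i)
  x$Day = NULL
  # transpose so the values are read day by day: Sunday 2..24, Monday 2..24, ...
  df.melted = melt(t(x[, -1]))
  # one output row: the ID followed by its values
  myNewDF = data.frame(i, t(df.melted[, 3]))
  # the file is reopened and appended to once per ID
  write.table(myNewDF, file = "my12x7.csv", append = TRUE, sep = ",",
              col.names = FALSE, row.names = FALSE)
}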
I think this is what you want:
library(reshape2)
# this may be unnecessary depending on your data
# it will make sure the weekday columns come in the same order
# as the weekdays appear in your original data
df1$Day = factor(df1$Day, levels = unique(df1$Day))
# convert to a fully long format
df_long = melt(df1, id.vars = c("ID", "Day"))
# convert to the wide format you want
result = dcast(data = df_long, ID ~ Day + variable, fun.aggregate = sum)
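For a reproducible check, here is a self-contained run of the same melt()/dcast() steps on the question's data. Reading df1 from inline text with read.table() is only a stand-in for however df1 is actually loaded; check.names turns the numeric headers into X2, X4, and so on, which is where the Sunday_X2 style names come from. Note that ID 3 has two Sunday rows, and fun.aggregate = sum adds them together.

```r
library(reshape2)

# df1 from the question, read from inline text (headers become X2, X4, ...)
df1 = read.table(header = TRUE, text = "
ID 2 4 6 8 10 12 14 16 18 20 22 24 Day
1 0 0 0 0 2 0 0 0 1 0 1 0 Sunday
1 0 0 0 0 0 4 0 0 0 0 0 0 Monday
1 0 0 0 0 0 0 0 0 2 0 0 0 Tuesday
1 0 0 0 0 0 0 2 0 0 0 0 0 Wednesday
1 0 0 0 0 0 0 0 2 0 0 0 0 Thursday
1 0 0 0 0 0 0 0 0 2 0 0 0 Friday
1 0 0 0 0 0 0 0 0 0 2 0 0 Saturday
2 0 0 0 0 0 0 0 0 0 0 0 0 Sunday
2 0 0 0 0 0 1 0 0 0 0 0 0 Monday
2 0 0 0 0 0 0 1 0 0 0 1 0 Tuesday
2 0 0 0 0 0 0 0 1 0 0 0 0 Wednesday
2 0 0 0 0 0 0 0 0 1 0 0 0 Thursday
2 0 0 0 0 0 2 0 0 0 1 0 0 Friday
2 0 0 0 0 0 0 0 0 0 0 0 0 Saturday
3 0 0 0 0 0 0 0 0 0 0 0 0 Sunday
3 0 0 0 0 0 0 2 0 0 0 0 0 Monday
3 0 0 0 0 0 1 0 0 2 0 0 0 Tuesday
3 0 0 0 0 0 0 0 0 0 0 0 0 Wednesday
3 0 0 0 0 0 0 0 2 0 0 0 0 Thursday
3 0 0 0 0 0 0 0 0 0 0 0 0 Friday
3 0 0 0 0 0 0 2 0 0 0 0 0 Saturday
3 0 0 0 0 0 0 0 2 0 0 0 0 Sunday
")

# keep the weekdays in the order they first appear
df1$Day = factor(df1$Day, levels = unique(df1$Day))
# fully long: one row per (ID, Day, hour column)
df_long = melt(df1, id.vars = c("ID", "Day"))
# wide: one row per ID, one column per Day/hour pair (duplicates summed)
result = dcast(data = df_long, ID ~ Day + variable, fun.aggregate = sum)

dim(result)  # 3 rows: the ID column plus 7 * 12 = 84 value columns
```

No loop over IDs is needed: dcast() builds every ID's row in a single call.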
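This prefixes the current variable names with the day names (Sunday_X2, Sunday_X4, ...). If you would rather have them named Var1, Var2, Var3, ..., use paste() to build the new names and rename the columns.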
We can look at the first few columns to verify:
result[, 1:6]
#   ID Sunday_X2 Sunday_X4 Sunday_X6 Sunday_X8 Sunday_X10
# 1  1         0         0         0         0          2
# 2  2         0         0         0         0          0
# 3  3         0         0         0         0          0
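A sketch of the paste() renaming plus the single CSV export the question asks for. To run without reshape2 it uses a small stand-in data frame copied from the result[, 1:6] output above; the temporary file path is chosen only for illustration. On the full 85-column result the same renaming line yields Var1 through Var84.

```r
# Stand-in for the first six columns of `result`, copied from the output above
result = data.frame(ID = 1:3,
                    Sunday_X2 = c(0, 0, 0), Sunday_X4 = c(0, 0, 0),
                    Sunday_X6 = c(0, 0, 0), Sunday_X8 = c(0, 0, 0),
                    Sunday_X10 = c(2, 0, 0))

# Keep the ID column, rename the rest Var1, Var2, ... with paste()
names(result)[-1] = paste("Var", seq_len(ncol(result) - 1), sep = "")

# Write the whole table in one call instead of appending once per ID
out_file = file.path(tempdir(), "my12x7.csv")
write.csv(result, out_file, row.names = FALSE)
```

write.csv() writes a header row; if you want the headerless layout of the original loop, write.table(result, out_file, sep = ",", col.names = FALSE, row.names = FALSE) does that instead.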