Reshaping a data frame by reverse melt
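I have a dataset containing two sets of observations, one for hour 830 and one for hour 930. My objective is to reshape my data frame so that it has an originId column, an 830 column, and a 930 column, with the walking values placed under their respective hour columns. It is basically a reverse melt. Is there a quick way to do this in R, and which package is most suitable?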
> df
originId walking hour
1 359727104 3.440248 830
2 359931904 8.065233 830
3 229873828 3.519326 830
4 359931908 20.758961 830
5 359931909 15.050358 830
6 359727113 3.178191 830
1063 359727104 3.029167 930
1064 359931904 8.093116 930
1065 229873828 3.523732 930
1066 359931908 21.234964 930
1067 359931909 15.701993 930
1068 359727113 2.768297 930
I tried this formula from reshape2, but it did not produce the right result.
> dcast(df, formula = originId + walking ~ hour)
Using hour as value column: use value.var to override.
originId walking 830 930
1 229873828 3.519326 830 NA
2 229873828 3.523732 NA 930
3 359727104 3.029167 NA 930
4 359727104 3.440248 830 NA
5 359727113 2.768297 NA 930
6 359727113 3.178191 830 NA
7 359931904 8.065233 830 NA
8 359931904 8.093116 NA 930
9 359931908 20.758961 830 NA
10 359931908 21.234964 NA 930
11 359931909 15.050358 830 NA
12 359931909 15.701993 NA 930
Here is a sample of the data:
> dput(df)
structure(list(originId = c(359727104, 359931904, 229873828,
359931908, 359931909, 359727113, 359727104, 359931904, 229873828,
359931908, 359931909, 359727113), walking = c(3.44024822695035,
8.06523297491039, 3.51932624113475, 20.7589605734767, 15.0503584229391,
3.1781914893617, 3.02916666666667, 8.09311594202899, 3.52373188405797,
21.2349637681159, 15.7019927536232, 2.76829710144928), hour = c(830L,
830L, 830L, 830L, 830L, 830L, 930L, 930L, 930L, 930L, 930L, 930L
)), .Names = c("originId", "walking", "hour"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 1063L, 1064L, 1065L, 1066L, 1067L, 1068L), class = "data.frame")
Try tidyr:
library(tidyr)
df %>% spread(hour, walking)
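I suggest using dplyr to relabel the hour values like this, so that you don't have to deal with column names that start with a digit:
library(dplyr)
library(tidyr)
df %>%
  mutate(hour = paste0('hour_', hour)) %>%
  spread(hour, walking)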
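In more recent versions of tidyr, spread() is superseded by pivot_wider(). Here is a minimal sketch of the same reshape, assuming tidyr 1.0.0 or later is installed. The data frame is a rounded copy of the one in the question, rebuilt so the snippet runs on its own, and the 'hour_' prefix is only an illustrative choice:

```r
library(tidyr)

# Rounded copy of the question's data, rebuilt so the snippet is self-contained
df <- data.frame(
  originId = rep(c(359727104, 359931904, 229873828,
                   359931908, 359931909, 359727113), times = 2),
  walking  = c(3.440248, 8.065233, 3.519326, 20.758961, 15.050358, 3.178191,
               3.029167, 8.093116, 3.523732, 21.234964, 15.701993, 2.768297),
  hour     = rep(c(830L, 930L), each = 6)
)

# One row per originId and one column per hour; the prefix avoids
# column names that start with a digit
wide <- pivot_wider(df, names_from = hour, values_from = walking,
                    names_prefix = "hour_")
```

Here names_from picks the column whose values become the new column names, and values_from picks the column that fills the cells, which is the same role value.var plays in dcast.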
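You need to use the walking variable as your value.var. Without it, dcast guesses the last column (hour) as the value column, and because walking is on the left-hand side of your formula, every distinct walking value gets its own row: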
dcast(df, originId ~ hour, value.var = 'walking')
which gives:
originId 830 930
1 229873828 3.519326 3.523732
2 359727104 3.440248 3.029167
3 359727113 3.178191 2.768297
4 359931904 8.065233 8.093116
5 359931908 20.758961 21.234964
6 359931909 15.050358 15.701993
Or perhaps even better:
dcast(df, originId ~ paste0('hr_',hour), value.var = 'walking')
which gives:
originId hr_830 hr_930
1 229873828 3.519326 3.523732
2 359727104 3.440248 3.029167
3 359727113 3.178191 2.768297
4 359931904 8.065233 8.093116
5 359931908 20.758961 21.234964
6 359931909 15.050358 15.701993
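If you would rather not load any package, base R's reshape() can do the same long-to-wide conversion. This is only a sketch under the assumption that each originId appears at most once per hour, as in the sample data. The data frame below is a rounded copy of the one in the question, rebuilt so the snippet runs on its own:

```r
# Rounded copy of the question's data, rebuilt so the snippet is self-contained
df <- data.frame(
  originId = rep(c(359727104, 359931904, 229873828,
                   359931908, 359931909, 359727113), times = 2),
  walking  = c(3.440248, 8.065233, 3.519326, 20.758961, 15.050358, 3.178191,
               3.029167, 8.093116, 3.523732, 21.234964, 15.701993, 2.768297),
  hour     = rep(c(830L, 930L), each = 6)
)

# idvar identifies a row of the wide result, timevar supplies the new
# column suffixes, and v.names is the column whose values are spread out
wide <- reshape(df, idvar = "originId", timevar = "hour",
                v.names = "walking", direction = "wide", sep = "_")

names(wide)  # originId, walking_830, walking_930
```

The new columns are named by joining v.names and the hour value with sep, so they never start with a digit.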