Converting a nonbalanced 2d dataframe to 3d array
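I have a data frame with the columns id and date plus 5 other variables. I want to convert the data frame to a 3d array of size (#ids, #dates, 5). I know I could use the dim function and so on if every id had the same number of rows in the data frame. However, that is not the case. How can I convert a nonbalanced (not sure this is the right term) data frame to a 3d array in which each 2d matrix corresponds to one id and has dimensions (#dates, 5)? The important point is that the number of rows of each 2d matrix varies with the id.
I am not very good with matrices. Apologies for that.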
id date x1 x2 x3 x4 x5
1: 1 2009-01-01 5 4 2 5.5 7
2: 1 2009-01-02 5.4 4.1 2.2 5.3 7.1
3: 1 2009-01-03 4.4 2.1 4.2 6.3 10.1
4: 2 2009-01-01 12.4 2.7 4.9 3.3 2.1
5: 3 2010-01-01 3.4 1.7 4.6 4.3 6.1
6: 4 2009-01-01 2.4 3.7 5.6 2.3 9.1
7: 4 2009-01-02 3.4 5.7 7.6 3.3 5.1
I want to create a 2d matrix for each id and combine them into a 3d array. I need this format to pass the data to the keras R library. Thank you.
Regards,
Here is a tidyverse option:
library(tidyverse)

df <- data.frame(id = c(1L, 1L, 1L, 2L, 3L, 4L, 4L),
                 date = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03", "2009-01-01", "2010-01-01", "2009-01-01", "2009-01-02")),
                 x1 = c(5, 5.4, 4.4, 12.4, 3.4, 2.4, 3.4),
                 x2 = c(4, 4.1, 2.1, 2.7, 1.7, 3.7, 5.7),
                 x3 = c(2, 2.2, 4.2, 4.9, 4.6, 5.6, 7.6),
                 x4 = c(5.5, 5.3, 6.3, 3.3, 4.3, 2.3, 3.3),
                 x5 = c(7, 7.1, 10.1, 2.1, 6.1, 9.1, 5.1))

a <- df %>%
    complete(id, date, fill = map(df[3:7], ~0)) %>%    # insert missing rows; fill with 0s
    nest(-id) %>%    # collapse other columns to list column of data frames
    mutate(data = map(data, ~as.matrix(.x[-1]))) %>%    # drop dates from nested data frames and coerce each to matrix
    pull(data) %>%    # extract matrix list
    invoke(abind::abind, ., along = 3) %>%    # abind in 3rd dimension
    `dimnames<-`(list(as.character(unique(df$date)), names(df[3:7]), unique(df$id)))    # set dimnames properly

a
a
#> , , 1
#>
#> x1 x2 x3 x4 x5
#> 2009-01-01 5.0 4.0 2.0 5.5 7.0
#> 2009-01-02 5.4 4.1 2.2 5.3 7.1
#> 2009-01-03 4.4 2.1 4.2 6.3 10.1
#> 2010-01-01 0.0 0.0 0.0 0.0 0.0
#>
#> , , 2
#>
#> x1 x2 x3 x4 x5
#> 2009-01-01 12.4 2.7 4.9 3.3 2.1
#> 2009-01-02 0.0 0.0 0.0 0.0 0.0
#> 2009-01-03 0.0 0.0 0.0 0.0 0.0
#> 2010-01-01 0.0 0.0 0.0 0.0 0.0
#>
#> , , 3
#>
#> x1 x2 x3 x4 x5
#> 2009-01-01 0.0 0.0 0.0 0.0 0.0
#> 2009-01-02 0.0 0.0 0.0 0.0 0.0
#> 2009-01-03 0.0 0.0 0.0 0.0 0.0
#> 2010-01-01 3.4 1.7 4.6 4.3 6.1
#>
#> , , 4
#>
#> x1 x2 x3 x4 x5
#> 2009-01-01 2.4 3.7 5.6 2.3 9.1
#> 2009-01-02 3.4 5.7 7.6 3.3 5.1
#> 2009-01-03 0.0 0.0 0.0 0.0 0.0
#> 2010-01-01 0.0 0.0 0.0 0.0 0.0
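The array printed above is laid out as (dates, variables, ids), whereas the question asks for (#ids, #dates, 5). The following is a minimal base R sketch of the same zero-filling idea that builds the array directly in that order. The names `ids`, `dates`, `vars` and `arr` are illustrative only, and filling the missing id/date combinations with 0 is an assumption carried over from the answer, not something the question specifies.

```r
# Sample data from the question
df <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 3L, 4L, 4L),
  date = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03", "2009-01-01",
                   "2010-01-01", "2009-01-01", "2009-01-02")),
  x1 = c(5, 5.4, 4.4, 12.4, 3.4, 2.4, 3.4),
  x2 = c(4, 4.1, 2.1, 2.7, 1.7, 3.7, 5.7),
  x3 = c(2, 2.2, 4.2, 4.9, 4.6, 5.6, 7.6),
  x4 = c(5.5, 5.3, 6.3, 3.3, 4.3, 2.3, 3.3),
  x5 = c(7, 7.1, 10.1, 2.1, 6.1, 9.1, 5.1)
)

ids   <- sort(unique(df$id))                    # one slice per id
dates <- sort(unique(df$date))                  # every date seen for any id
vars  <- setdiff(names(df), c("id", "date"))    # the 5 measurement columns

# Start from an all-zero (ids, dates, variables) array
arr <- array(0,
             dim = c(length(ids), length(dates), length(vars)),
             dimnames = list(as.character(ids), as.character(dates), vars))

# Position of each observed row along the id and date dimensions
i <- match(df$id, ids)
j <- match(df$date, dates)

# Copy the observed values in; unobserved id/date cells stay 0
for (k in seq_along(vars)) {
  arr[cbind(i, j, k)] <- df[[vars[k]]]
}

dim(arr)
```

If the array `a` from the answer is already available, `aperm(a, c(3, 1, 2))` should give the same (ids, dates, variables) ordering.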
I'm not sure I understand your expected output, but I would suggest either splitting your data.frame into a list of data.frames, or nesting your data per id.
Option 1: splitting
split(df, df$id)
#$`1`
# id date x1 x2 x3 x4 x5
#1 1 2009-01-01 5.0 4.0 2.0 5.5 7.0
#2 1 2009-01-02 5.4 4.1 2.2 5.3 7.1
#3 1 2009-01-03 4.4 2.1 4.2 6.3 10.1
#
#$`2`
# id date x1 x2 x3 x4 x5
#4 2 2009-01-01 12.4 2.7 4.9 3.3 2.1
#
#$`3`
# id date x1 x2 x3 x4 x5
#5 3 2010-01-01 3.4 1.7 4.6 4.3 6.1
#
#$`4`
# id date x1 x2 x3 x4 x5
#6 4 2009-01-01 2.4 3.7 5.6 2.3 9.1
#7 4 2009-01-02 3.4 5.7 7.6 3.3 5.1
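Since the question asks for one 2d matrix per id, the split result can be taken one step further. This is a small base R sketch, assuming only the five measurement columns should end up in each matrix; `vars` and `mats` are illustrative names. The matrices keep their different row counts, so this is a ragged list rather than a 3d array.

```r
# Sample data from the question
df <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 3L, 4L, 4L),
  date = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03", "2009-01-01",
                   "2010-01-01", "2009-01-01", "2009-01-02")),
  x1 = c(5, 5.4, 4.4, 12.4, 3.4, 2.4, 3.4),
  x2 = c(4, 4.1, 2.1, 2.7, 1.7, 3.7, 5.7),
  x3 = c(2, 2.2, 4.2, 4.9, 4.6, 5.6, 7.6),
  x4 = c(5.5, 5.3, 6.3, 3.3, 4.3, 2.3, 3.3),
  x5 = c(7, 7.1, 10.1, 2.1, 6.1, 9.1, 5.1)
)

vars <- setdiff(names(df), c("id", "date"))    # drop id and date

# Split the measurement columns by id, then coerce each piece to a matrix
mats <- lapply(split(df[vars], df$id), as.matrix)

# Number of rows per id
vapply(mats, nrow, integer(1))
```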
Option 2: nesting
library(tidyverse)
df %>%
group_by(id) %>%
nest()
## A tibble: 4 x 2
# id data
# <int> <list>
#1 1 <tibble [3 × 6]>
#2 2 <tibble [1 × 6]>
#3 3 <tibble [1 × 6]>
#4 4 <tibble [2 × 6]>
Sample data
df <- read.table(text =
" id date x1 x2 x3 x4 x5
1 2009-01-01 5 4 2 5.5 7
1 2009-01-02 5.4 4.1 2.2 5.3 7.1
1 2009-01-03 4.4 2.1 4.2 6.3 10.1
2 2009-01-01 12.4 2.7 4.9 3.3 2.1
3 2010-01-01 3.4 1.7 4.6 4.3 6.1
4 2009-01-01 2.4 3.7 5.6 2.3 9.1
4 2009-01-02 3.4 5.7 7.6 3.3 5.1", header = T)
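Because the per-id pieces have different numbers of rows, they cannot be stacked into a 3d array as they are. One possible way to get there, sketched below in base R, is to pad each id's matrix with zero rows up to the longest sequence. This aligns rows by position within each id rather than by calendar date, which is an assumption about what the model needs. `pad_to` is a hypothetical helper written for this example, not a function from any package, and 0 as the padding value is likewise an assumption.

```r
# Sample data from the question
df <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 3L, 4L, 4L),
  date = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03", "2009-01-01",
                   "2010-01-01", "2009-01-01", "2009-01-02")),
  x1 = c(5, 5.4, 4.4, 12.4, 3.4, 2.4, 3.4),
  x2 = c(4, 4.1, 2.1, 2.7, 1.7, 3.7, 5.7),
  x3 = c(2, 2.2, 4.2, 4.9, 4.6, 5.6, 7.6),
  x4 = c(5.5, 5.3, 6.3, 3.3, 4.3, 2.3, 3.3),
  x5 = c(7, 7.1, 10.1, 2.1, 6.1, 9.1, 5.1)
)

vars <- setdiff(names(df), c("id", "date"))

# One matrix per id, with differing row counts
mats <- lapply(split(df[vars], df$id), as.matrix)
max_len <- max(vapply(mats, nrow, integer(1)))

# Hypothetical helper: append zero rows until the matrix has n rows
pad_to <- function(m, n) {
  rbind(m, matrix(0, nrow = n - nrow(m), ncol = ncol(m)))
}
padded <- lapply(mats, pad_to, n = max_len)

# Stack into an (ids, time steps, variables) array
arr <- array(0,
             dim = c(length(padded), max_len, length(vars)),
             dimnames = list(names(padded), NULL, vars))
for (i in seq_along(padded)) {
  arr[i, , ] <- padded[[i]]
}

dim(arr)
```

Compared with completing every id over all dates, this keeps the second dimension as short as the longest id (3 here instead of 4), at the cost of losing the date alignment across ids.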