Converting a nonbalanced 2d dataframe to 3d array

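I have a data frame with the columns id, date and 5 other variables. I want to convert it into a 3d matrix of size (#ids, #dates, 5). I know I could use the dim function etc. if every id had the same number of rows in the data frame. However, that is not the case. How can I convert a nonbalanced (not sure this is the right term) data frame into a 3d array in which each 2d matrix corresponds to one id and has dimensions (#dates, 5)? Note that the number of rows in each 2d matrix varies by id.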
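For comparison, here is a minimal base-R sketch of the balanced case mentioned above, using hypothetical toy data (`bal`) in which every id has the same three dates. Once the rows are sorted by id and then date, the values can be poured into an array with `dim` (R fills column-major, so date varies fastest, then id, then variable) and reordered with `aperm()`. This only works because every id contributes exactly the same number of rows.

```r
# Hypothetical balanced panel: 2 ids, each observed on the same 3 dates
bal <- data.frame(id   = rep(1:2, each = 3),
                  date = rep(as.Date("2009-01-01") + 0:2, times = 2),
                  x1 = 1:6, x2 = 11:16, x3 = 21:26, x4 = 31:36, x5 = 41:46)
bal  <- bal[order(bal$id, bal$date), ]      # rows must be sorted by id, then date
vals <- as.matrix(bal[paste0("x", 1:5)])     # (#ids * #dates) x 5 matrix

n_ids   <- length(unique(bal$id))
n_dates <- length(unique(bal$date))

# Column-major fill: date varies fastest, then id, then variable
arr <- array(vals, dim = c(n_dates, n_ids, 5))
arr <- aperm(arr, c(2, 1, 3))                # reorder to (#ids, #dates, 5)
```

Here `arr[2, 3, ]` holds the third date of id 2. With a nonbalanced panel the row count is no longer `n_ids * n_dates`, which is why this shortcut breaks.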

I'm not very good at working with matrices, sorry about that.

   id       date   x1  x2  x3  x4   x5
1:  1 2009-01-01    5   4   2 5.5    7
2:  1 2009-01-02  5.4 4.1 2.2 5.3  7.1
3:  1 2009-01-03  4.4 2.1 4.2 6.3 10.1
4:  2 2009-01-01 12.4 2.7 4.9 3.3  2.1
5:  3 2010-01-01  3.4 1.7 4.6 4.3  6.1
6:  4 2009-01-01  2.4 3.7 5.6 2.3  9.1
7:  4 2009-01-02  3.4 5.7 7.6 3.3  5.1

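For each id I want to create a 2d matrix, and then stack these matrices into a 3d array. I need this format to pass the data to the keras R library. Thank you.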

Regards,

Here's a tidyverse option:

library(tidyverse)

df <- data.frame(id = c(1L, 1L, 1L, 2L, 3L, 4L, 4L), 
                 date = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03", "2009-01-01", "2010-01-01", "2009-01-01", "2009-01-02")), 
                 x1 = c(5, 5.4, 4.4, 12.4, 3.4, 2.4, 3.4), 
                 x2 = c(4, 4.1, 2.1, 2.7, 1.7, 3.7, 5.7), 
                 x3 = c(2, 2.2, 4.2, 4.9, 4.6, 5.6, 7.6), 
                 x4 = c(5.5, 5.3, 6.3, 3.3, 4.3, 2.3, 3.3), 
                 x5 = c(7, 7.1, 10.1, 2.1, 6.1, 9.1, 5.1))

a <- df %>% 
    complete(id, date, fill = map(df[3:7], ~0)) %>%    # insert missing id/date rows; fill with 0s
    nest(data = -id) %>%    # collapse other columns to a list column of data frames, one per id
    mutate(data = map(data, ~as.matrix(.x[-1]))) %>%    # drop dates from nested data frames and coerce each to matrix
    pull(data) %>%    # extract matrix list
    abind::abind(along = 3) %>%    # bind the list of matrices along the 3rd dimension
    `dimnames<-`(list(as.character(sort(unique(df$date))), names(df[3:7]), sort(unique(df$id))))    # dimnames in the sorted order complete() produces

a
#> , , 1
#> 
#>             x1  x2  x3  x4   x5
#> 2009-01-01 5.0 4.0 2.0 5.5  7.0
#> 2009-01-02 5.4 4.1 2.2 5.3  7.1
#> 2009-01-03 4.4 2.1 4.2 6.3 10.1
#> 2010-01-01 0.0 0.0 0.0 0.0  0.0
#> 
#> , , 2
#> 
#>              x1  x2  x3  x4  x5
#> 2009-01-01 12.4 2.7 4.9 3.3 2.1
#> 2009-01-02  0.0 0.0 0.0 0.0 0.0
#> 2009-01-03  0.0 0.0 0.0 0.0 0.0
#> 2010-01-01  0.0 0.0 0.0 0.0 0.0
#> 
#> , , 3
#> 
#>             x1  x2  x3  x4  x5
#> 2009-01-01 0.0 0.0 0.0 0.0 0.0
#> 2009-01-02 0.0 0.0 0.0 0.0 0.0
#> 2009-01-03 0.0 0.0 0.0 0.0 0.0
#> 2010-01-01 3.4 1.7 4.6 4.3 6.1
#> 
#> , , 4
#> 
#>             x1  x2  x3  x4  x5
#> 2009-01-01 2.4 3.7 5.6 2.3 9.1
#> 2009-01-02 3.4 5.7 7.6 3.3 5.1
#> 2009-01-03 0.0 0.0 0.0 0.0 0.0
#> 2010-01-01 0.0 0.0 0.0 0.0 0.0
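Note that `a` is laid out as (#dates, #vars, #ids), whereas the question asks for (#ids, #dates, 5), which is also the (samples, timesteps, features) order keras recurrent layers expect; `aperm(a, c(3, 1, 2))` reorders it. As a minimal base-R sketch (no tidyverse or abind), the same zero-filled array can also be built directly in that order with matrix indexing. The names `arr`, `vars`, `ids` and `dates` are illustrative, and, as in the answer above, id/date combinations that were never observed are assumed to be filled with 0.

```r
# Sample data from the question
df <- data.frame(id = c(1L, 1L, 1L, 2L, 3L, 4L, 4L),
                 date = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03", "2009-01-01",
                                  "2010-01-01", "2009-01-01", "2009-01-02")),
                 x1 = c(5, 5.4, 4.4, 12.4, 3.4, 2.4, 3.4),
                 x2 = c(4, 4.1, 2.1, 2.7, 1.7, 3.7, 5.7),
                 x3 = c(2, 2.2, 4.2, 4.9, 4.6, 5.6, 7.6),
                 x4 = c(5.5, 5.3, 6.3, 3.3, 4.3, 2.3, 3.3),
                 x5 = c(7, 7.1, 10.1, 2.1, 6.1, 9.1, 5.1))

vars  <- paste0("x", 1:5)
ids   <- sort(unique(df$id))
dates <- sort(unique(df$date))

# Zero-filled array laid out as (#ids, #dates, #vars)
arr <- array(0, dim = c(length(ids), length(dates), length(vars)),
             dimnames = list(ids, as.character(dates), vars))

# Position of every observed row along the id and date dimensions
i <- match(df$id, ids)
j <- match(df$date, dates)

# Fill one variable at a time using three-column matrix indexing
for (k in seq_along(vars)) {
  arr[cbind(i, j, k)] <- df[[vars[k]]]
}
```

For the sample data this gives a 4 × 4 × 5 array: `arr["1", "2009-01-03", ]` holds id 1's last row, and every date an id was never observed on stays at 0.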

Not sure I understand your expected output, but I'd suggest either splitting your data.frame into a list of data.frames, or nesting your data by id.

Option 1: splitting

split(df, df$id)
#$`1`
#  id       date  x1  x2  x3  x4   x5
#1  1 2009-01-01 5.0 4.0 2.0 5.5  7.0
#2  1 2009-01-02 5.4 4.1 2.2 5.3  7.1
#3  1 2009-01-03 4.4 2.1 4.2 6.3 10.1
#
#$`2`
#  id       date   x1  x2  x3  x4  x5
#4  2 2009-01-01 12.4 2.7 4.9 3.3 2.1
#
#$`3`
#  id       date  x1  x2  x3  x4  x5
#5  3 2010-01-01 3.4 1.7 4.6 4.3 6.1
#
#$`4`
#  id       date  x1  x2  x3  x4  x5
#6  4 2009-01-01 2.4 3.7 5.6 2.3 9.1
#7  4 2009-01-02 3.4 5.7 7.6 3.3 5.1
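If the list from `split()` still has to end up as a single 3d array, one alternative to aligning ids on calendar dates is to pad each id's rows at the end up to the longest sequence. This is a different layout from the date-aligned array in the tidyverse answer: position t means "the id's t-th observation", not a specific date. A minimal base-R sketch, assuming zero padding at the end is acceptable; `pieces`, `max_len` and `arr` are illustrative names.

```r
# Sample data from the question
df <- data.frame(id = c(1L, 1L, 1L, 2L, 3L, 4L, 4L),
                 date = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03", "2009-01-01",
                                  "2010-01-01", "2009-01-01", "2009-01-02")),
                 x1 = c(5, 5.4, 4.4, 12.4, 3.4, 2.4, 3.4),
                 x2 = c(4, 4.1, 2.1, 2.7, 1.7, 3.7, 5.7),
                 x3 = c(2, 2.2, 4.2, 4.9, 4.6, 5.6, 7.6),
                 x4 = c(5.5, 5.3, 6.3, 3.3, 4.3, 2.3, 3.3),
                 x5 = c(7, 7.1, 10.1, 2.1, 6.1, 9.1, 5.1))

vars <- paste0("x", 1:5)
df   <- df[order(df$id, df$date), ]     # keep each id's rows in date order

# One data frame per id (date dropped), each with its own number of rows
pieces  <- split(df[vars], df$id)
max_len <- max(vapply(pieces, nrow, integer(1)))

# Zero-filled (#ids, max_len, #vars) array
arr <- array(0, dim = c(length(pieces), max_len, length(vars)),
             dimnames = list(names(pieces), NULL, vars))

# Copy each id's rows into the start of its slice; the remaining rows stay 0
for (n in seq_along(pieces)) {
  m <- as.matrix(pieces[[n]])
  arr[n, seq_len(nrow(m)), ] <- m
}
```

For the sample data this gives a 4 × 3 × 5 array: ids 2 and 3 get two rows of zero padding and id 4 gets one. Padding at the start instead of the end would only change the `seq_len(nrow(m))` index.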

Option 2: nesting

library(tidyverse)
df %>%
    group_by(id) %>%
    nest()
## A tibble: 4 x 2
#     id data
#  <int> <list>
#1     1 <tibble [3 × 6]>
#2     2 <tibble [1 × 6]>
#3     3 <tibble [1 × 6]>
#4     4 <tibble [2 × 6]>

Sample data

df <- read.table(text =
    "   id       date   x1  x2  x3  x4   x5
1 2009-01-01    5   4   2 5.5    7
1 2009-01-02  5.4 4.1 2.2 5.3  7.1
1 2009-01-03  4.4 2.1 4.2 6.3 10.1
2 2009-01-01 12.4 2.7 4.9 3.3  2.1
3 2010-01-01  3.4 1.7 4.6 4.3  6.1
4 2009-01-01  2.4 3.7 5.6 2.3  9.1
4 2009-01-02  3.4 5.7 7.6 3.3  5.1", header = TRUE)