Create panel data in R (the cross-section dimension contains repeated entities)

My data looks like this:

>loan data
           ID      loan_start_date    loan_maturity_date  feb13  march13 april13........
            1      2016-01-03         2017-01-03          46       45     44
            1      2011-01-08         2013-01-08          NA       NA     NA   
            1      2013-02-13         2015-02-13          23       22     21
            2      2012-02-03         2016-05-03          38       37     36
            2      2013-05-08         2014-01-09          10       09     08   
            2      2011-03-13         2013-02-18          0        NA     NA
            3      2015-07-03         2016-01-08          34       33     32
            3      2013-01-09         2015-07-08          28       27     26   

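Can I create panel data from this? If so, how can I do it in R? In the panel, the cross-section dimension is ID and the time dimension is feb13, march13, april13, ... (each value is the time from that month to the loan's maturity date), continuing for 48 months. I have seen other examples of creating panel data, but in those each ID occupies a single row, whereas here each ID appears on multiple rows, so I am confused about how to build a panel from this. Any help is much appreciated.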

Edit: If I am correct, the expected result should look like this:

>loan data
           ID      months    time to maturity  
            1      feb13         46          
            1      march13       45            
            1      april13       44          
                      .
                      .
                      .
            1      jan17          0          
            1      feb13         NA          
            1      march13       NA            
            1      april13       NA          
                      .
                      .
                      .
            1      jan17         NA          
            1      feb13         23          
            1      march13       22             
            1      april13       21          
                      .
                      .
                      .
            1      jan17         NA        
            2      feb13         38          
            2      march13       37            
            2      april13       36         
                     .
                     .
                     .
            2      jan17         NA 
            2      feb13         10          
            2      march13       09            
            2      april13       08         
                     .
                     .
                     .
            2      jan17         NA 
            2      feb13         0          
            2      march13       NA            
            2      april13       NA         
                     .
                     .
                     .
            2      jan17         NA 
            3      feb13         34          
            3      march13       33            
            3      april13       32         
                     .
                     .
                     .
            3      jan17         NA 
            3      feb13         28          
            3      march13       27            
            3      april13       26         
                     .
                     .
                     .
            3      jan17         NA 

As the comments point out, it looks like everything you need can be found here. Applying that approach yields:

dt <- reshape2::melt(df, id.vars = 'ID')
head(dt)
  ID variable value
1  2    feb17    40
2  4    feb17    33
3  3    feb17    35
4  5    feb17    34
5  5    feb17    NA
6  1    feb17    38

Here is the data used in this example:

set.seed(123)
df <- data.frame(ID = sample(1:5, 10, replace = TRUE), 
                 feb17 = sample(c(NA,30:40), 10),
                 mar17 = sample(c(NA,30:40), 10),
                 apr17 = sample(c(NA,30:40), 10),
                 feb18 = sample(c(NA,30:40), 10),
                 mar18 = sample(c(NA,30:40), 10),
                 apr18 = sample(c(NA,30:40), 10)
                )
> head(df)
  ID feb17 mar17 apr17 feb18 mar18 apr18
1  2    40    39    40    30    NA    36
2  4    33    36    38    33    33    30
3  3    35    35    35    39    36    32
4  5    34    37    36    32    30    31
5  5    NA    34    NA    40    39    35
6  1    38    33    32    NA    37    38
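In the question's data an ID repeats across rows (one row per loan), so ID alone does not identify a series once the table is long. The following is a minimal sketch of one way to handle that with base R's reshape() and no extra packages. The rows are a small subset copied from the question, and loan_id is a hypothetical helper column introduced here for illustration; it is not part of the original data.

```r
# Illustrative rows copied from the question; loan_id below is a
# hypothetical helper column, not something the original data contains.
loans <- data.frame(
  ID              = c(1, 1, 2),
  loan_start_date = c("2016-01-03", "2013-02-13", "2012-02-03"),
  feb13           = c(46, 23, 38),
  march13         = c(45, 22, 37),
  april13         = c(44, 21, 36)
)

# ID repeats, so give every loan (row) its own identifier.
loans$loan_id <- seq_len(nrow(loans))

month_cols <- c("feb13", "march13", "april13")

# Wide to long: one row per loan per month.
long <- reshape(
  loans,
  direction = "long",
  varying   = month_cols,
  v.names   = "months_to_maturity",
  timevar   = "month",
  times     = month_cols,
  idvar     = "loan_id"
)

# Keep calendar order for the months, then list each loan's months together.
long$month <- factor(long$month, levels = month_cols)
long <- long[order(long$loan_id, long$month), ]
rownames(long) <- NULL

long
```

Sorting by loan_id and then month gives the loan-by-loan layout sketched in the question's expected result, while the ID column is carried along so loans can still be grouped by borrower.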

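Another option is the gather() function from the tidyr package.

This function collapses multiple columns into key-value pairs. You specify the data frame, the name of the new "key" column, the name of the new "value" column, and then which columns to gather. If (as in this case) there are more columns to include than to exclude, you can simply specify the columns to exclude: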

library(tidyr)

gather(df, key = "month_year", value = "months_to_maturity", -(ID:loan_maturity_date))

#>    ID start_date loan_maturity_date month_year months_to_maturity
#> 1   1 2016-01-03         2017-01-03      feb13                 46
#> 2   1 2011-01-08         2013-01-08      feb13                 NA
#> 3   1 2013-02-13         2015-02-13      feb13                 23
#> 4   2 2012-02-03         2016-05-03      feb13                 38
#> 5   2 2013-05-08         2014-01-09      feb13                 10
#> 6   2 2011-03-13         2013-02-18      feb13                  0
#> 7   3 2015-07-03         2016-01-08      feb13                 34
#> 8   3 2013-01-09         2015-07-08      feb13                 28
#> 9   1 2016-01-03         2017-01-03    march13                 45
#> 10  1 2011-01-08         2013-01-08    march13                 NA
#> 11  1 2013-02-13         2015-02-13    march13                 22
#> 12  2 2012-02-03         2016-05-03    march13                 37
#> 13  2 2013-05-08         2014-01-09    march13                  9
#> 14  2 2011-03-13         2013-02-18    march13                 NA
#> 15  3 2015-07-03         2016-01-08    march13                 33
#> 16  3 2013-01-09         2015-07-08    march13                 27
#> 17  1 2016-01-03         2017-01-03    april13                 44
#> 18  1 2011-01-08         2013-01-08    april13                 NA
#> 19  1 2013-02-13         2015-02-13    april13                 21
#> 20  2 2012-02-03         2016-05-03    april13                 36
#> 21  2 2013-05-08         2014-01-09    april13                  8
#> 22  2 2011-03-13         2013-02-18    april13                 NA
#> 23  3 2015-07-03         2016-01-08    april13                 32
#> 24  3 2013-01-09         2015-07-08    april13                 26

And the data used for this:

df <- 
  data.frame(ID                 = c(1,1,1,2,2,2,3,3),
             start_date         = c("2016-01-03",
                                    "2011-01-08",
                                    "2013-02-13",
                                    "2012-02-03",
                                    "2013-05-08",
                                    "2011-03-13",
                                    "2015-07-03",
                                    "2013-01-09"),
             loan_maturity_date = c("2017-01-03",
                                    "2013-01-08",
                                    "2015-02-13",
                                    "2016-05-03",
                                    "2014-01-09",
                                    "2013-02-18",
                                    "2016-01-08",
                                    "2015-07-08"),
             feb13              = c(46,
                                    NA,
                                    23,
                                    38,
                                    10,
                                    0 ,
                                    34,
                                    28),
             march13            = c(45,
                                    NA,
                                    22,
                                    37,
                                    09,
                                    NA,
                                    33,
                                    27),
             april13            = c(44,
                                    NA,
                                    21,
                                    36,
                                    08,
                                    NA,
                                    32,
                                    26))
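In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The sketch below shows the same reshaping with that function, assuming a recent tidyr is installed. It uses a small subset of the rows above, and loan_id is again a hypothetical column added only so that each loan stays distinguishable when an ID repeats.

```r
library(tidyr)

# Subset of the rows above, for brevity.
loans <- data.frame(
  ID                 = c(1, 1, 2),
  start_date         = c("2016-01-03", "2013-02-13", "2012-02-03"),
  loan_maturity_date = c("2017-01-03", "2015-02-13", "2016-05-03"),
  feb13              = c(46, 23, 38),
  march13            = c(45, 22, 37),
  april13            = c(44, 21, 36)
)

# Hypothetical helper: one identifier per loan, since ID repeats.
loans$loan_id <- seq_len(nrow(loans))

# Stack the month columns into name/value pairs; all other columns are kept.
panel <- pivot_longer(
  loans,
  cols      = c(feb13, march13, april13),
  names_to  = "month_year",
  values_to = "months_to_maturity"
)

panel
```

Unlike gather(), which stacks the result month by month, pivot_longer() keeps the rows that came from the same input row together, so each loan's months appear consecutively, as in the layout the question asks for.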