Create panel data in R (cross-section dimension holds repeated entities)
My data looks like this:
>loan data
ID loan_start_date loan_maturity_date feb13 march13 april13........
1 2016-01-03 2017-01-03 46 45 44
1 2011-01-08 2013-01-08 NA NA NA
1 2013-02-13 2015-02-13 23 22 21
2 2012-02-03 2016-05-03 38 37 36
2 2013-05-08 2014-01-09 10 09 08
2 2011-03-13 2013-02-18 0 NA NA
3 2015-07-03 2016-01-08 34 33 32
3 2013-01-09 2015-07-08 28 27 26
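Can I create panel data from this? If so, how can I do it in R? In the panel, the cross-section dimension is ID and the time dimension is feb13, march13, april13, ... (each column holds the time from that month to the loan's maturity date), running for 48 months. I have seen other examples of creating panel data, but in those each ID occupies a single row, whereas here each ID spans several rows. So I am confused about how to build a panel from this. Any help is much appreciated.
Edit:
If I am right, the expected result should look like this: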
>loan data
ID months time to maturity
1 feb13 46
1 march13 45
1 april13 44
.
.
.
1 jan17 0
1 feb13 NA
1 march13 NA
1 april13 NA
.
.
.
1 jan17 NA
1 feb13 23
1 march13 22
1 april13 21
.
.
.
1 jan17 NA
2 feb13 38
2 march13 37
2 april13 36
.
.
.
2 jan17 NA
2 feb13 10
2 march13 09
2 april13 08
.
.
.
2 jan17 NA
2 feb13 0
2 march13 NA
2 april13 NA
.
.
.
2 jan17 NA
2 feb13 0
2 march13 NA
2 april13 NA
.
.
.
2 jan17 NA
3 feb13 34
3 march13 33
3 april13 32
.
.
.
3 jan17 NA
3 feb13 28
3 march13 27
3 april13 26
.
.
.
3 jan17 NA
As the comments say, what you want looks like a standard wide-to-long reshape. Applying melt() from reshape2 yields:
dt <- reshape2::melt(df, id.vars = 'ID')
head(dt)
ID variable value
1 2 feb17 40
2 4 feb17 33
3 3 feb17 35
4 5 feb17 34
5 5 feb17 NA
6 1 feb17 38
Here is the data used in this example:
set.seed(123)
df <- data.frame(ID = sample(1:5, 10, replace = TRUE),
feb17 = sample(c(NA,30:40), 10),
mar17 = sample(c(NA,30:40), 10),
apr17 = sample(c(NA,30:40), 10),
feb18 = sample(c(NA,30:40), 10),
mar18 = sample(c(NA,30:40), 10),
apr18 = sample(c(NA,30:40), 10)
)
> head(df)
ID feb17 mar17 apr17 feb18 mar18 apr18
1 2 40 39 40 30 NA 36
2 4 33 36 38 33 33 30
3 3 35 35 35 39 36 32
4 5 34 37 36 32 30 31
5 5 NA 34 NA 40 39 35
6 1 38 33 32 NA 37 38
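Because each ID appears on several rows (one per loan), ID and month alone do not uniquely identify an observation after reshaping. A minimal sketch of one way around this, using base R only and assuming a hypothetical loan_id counter within each ID (that column is not part of the original data):

```r
# Toy data mirroring the question (three monthly columns only).
loans <- data.frame(
  ID      = c(1, 1, 1, 2, 2, 2, 3, 3),
  feb13   = c(46, NA, 23, 38, 10, 0, 34, 28),
  march13 = c(45, NA, 22, 37, 9, NA, 33, 27),
  april13 = c(44, NA, 21, 36, 8, NA, 32, 26)
)

# Hypothetical helper column: numbers the loans within each ID (1, 2, 3, ...).
loans$loan_id <- ave(seq_along(loans$ID), loans$ID, FUN = seq_along)

month_cols <- c("feb13", "march13", "april13")

# Stack the monthly columns into long format: one row per ID, loan and month.
long <- data.frame(
  ID      = rep(loans$ID, times = length(month_cols)),
  loan_id = rep(loans$loan_id, times = length(month_cols)),
  month   = factor(rep(month_cols, each = nrow(loans)), levels = month_cols),
  months_to_maturity = unlist(loans[month_cols], use.names = FALSE)
)

# Sort so that each loan's time series is contiguous.
long <- long[order(long$ID, long$loan_id, long$month), ]
rownames(long) <- NULL
head(long)
```

With this layout the triple (ID, loan_id, month) identifies each row, so the result can be treated as a panel of loans nested within IDs.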
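Another option is the gather() function from the tidyr package.
This function collapses multiple columns into key-value pairs. You specify the data frame, the name of the new "key" column, the name of the new "value" column, and then which columns to gather. If (as in this case) there are more columns to include than to exclude, you can simply specify the columns to exclude: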
library(tidyr)
gather(df, key = "month_year", value = "months_to_maturity", -(ID:loan_maturity_date))
#> ID start_date loan_maturity_date month_year months_to_maturity
#> 1 1 2016-01-03 2017-01-03 feb13 46
#> 2 1 2011-01-08 2013-01-08 feb13 NA
#> 3 1 2013-02-13 2015-02-13 feb13 23
#> 4 2 2012-02-03 2016-05-03 feb13 38
#> 5 2 2013-05-08 2014-01-09 feb13 10
#> 6 2 2011-03-13 2013-02-18 feb13 0
#> 7 3 2015-07-03 2016-01-08 feb13 34
#> 8 3 2013-01-09 2015-07-08 feb13 28
#> 9 1 2016-01-03 2017-01-03 march13 45
#> 10 1 2011-01-08 2013-01-08 march13 NA
#> 11 1 2013-02-13 2015-02-13 march13 22
#> 12 2 2012-02-03 2016-05-03 march13 37
#> 13 2 2013-05-08 2014-01-09 march13 9
#> 14 2 2011-03-13 2013-02-18 march13 NA
#> 15 3 2015-07-03 2016-01-08 march13 33
#> 16 3 2013-01-09 2015-07-08 march13 27
#> 17 1 2016-01-03 2017-01-03 april13 44
#> 18 1 2011-01-08 2013-01-08 april13 NA
#> 19 1 2013-02-13 2015-02-13 april13 21
#> 20 2 2012-02-03 2016-05-03 april13 36
#> 21 2 2013-05-08 2014-01-09 april13 8
#> 22 2 2011-03-13 2013-02-18 april13 NA
#> 23 3 2015-07-03 2016-01-08 april13 32
#> 24 3 2013-01-09 2015-07-08 april13 26
And the data used for this:
df <-
data.frame(ID = c(1,1,1,2,2,2,3,3),
start_date = c("2016-01-03",
"2011-01-08",
"2013-02-13",
"2012-02-03",
"2013-05-08",
"2011-03-13",
"2015-07-03",
"2013-01-09"),
loan_maturity_date = c("2017-01-03",
"2013-01-08",
"2015-02-13",
"2016-05-03",
"2014-01-09",
"2013-02-18",
"2016-01-08",
"2015-07-08"),
feb13 = c(46,
NA,
23,
38,
10,
0 ,
34,
28),
march13 = c(45,
NA,
22,
37,
09,
NA,
33,
27),
april13 = c(44,
NA,
21,
36,
08,
NA,
32,
26))
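In current tidyr releases gather() is marked as superseded in favour of pivot_longer(). A minimal sketch of the equivalent call, assuming tidyr 1.0.0 or later is installed; the shortened loans data frame below is only an illustration, not the full data set:

```r
library(tidyr)

# Shortened illustration of the loan data (three loans, three months).
loans <- data.frame(
  ID = c(1, 1, 2),
  start_date = c("2016-01-03", "2011-01-08", "2012-02-03"),
  loan_maturity_date = c("2017-01-03", "2013-01-08", "2016-05-03"),
  feb13 = c(46, NA, 38),
  march13 = c(45, NA, 37),
  april13 = c(44, NA, 36)
)

# Keep ID and the two date columns as identifiers; stack every other column.
long <- pivot_longer(
  loans,
  cols = -(ID:loan_maturity_date),
  names_to = "month_year",
  values_to = "months_to_maturity"
)
long
```

Unlike gather(), pivot_longer() orders the result by original row, so each loan's months come out next to each other.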