How to transform a list of data frames into one single zoo object in R
I have a list of data frames that I would like to transform into one zoo object.
Example of the list:
> example
$A.N
# A tibble: 374 x 21
TIMESTAMP OPEN HIGH LOW CLOSE daily_return intraday_return RIC
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 2004-04-27 00:00:00 19.6 19.9 19.3 19.4 0 -0.00997 A.N
2 2004-04-28 00:00:00 19.3 19.3 19.0 19.1 0 -0.0105 A.N
3 2004-04-29 00:00:00 19.0 19.1 18.4 18.7 0 -0.0124 A.N
4 2004-04-30 00:00:00 18.8 18.9 18.1 18.2 0 -0.0302 A.N
5 2004-05-03 00:00:00 18.2 18.6 18.1 18.4 0 0.00776 A.N
6 2004-05-04 00:00:00 18.5 18.5 17.5 18.0 0 -0.0262 A.N
7 2004-05-05 00:00:00 18.0 18.3 17.9 18.1 0 0.00337 A.N
8 2004-05-06 00:00:00 17.9 18.0 17.7 17.7 0 -0.00977 A.N
9 2004-05-07 00:00:00 17.7 18.0 17.6 17.7 0 0.00420 A.N
10 2004-05-10 00:00:00 17.4 17.5 16.9 17.1 0 -0.0170 A.N
# ... with 364 more rows, and 13 more variables: Acquirer Ultimate Parent (At Deal) <lgl>,
# Acquirer Ultimate Parent Country <lgl>, Acquirer Ultimate Parent Stock Exchange <lgl>,
# Acquirer Ultimate Parent Ticker <lgl>, Acquirer FactSet ID <chr>, Acquirer <chr>,
# Acquirer Ownership Type <chr>, Acquirer Country <chr>, Acquirer Stock Exchange <chr>,
# Acquirer Ticker <chr>, Announcement Date <date>, Start_Event_Study <date>,
# End_Event_Study <date>
$ABI.BR
# A tibble: 375 x 21
TIMESTAMP OPEN HIGH LOW CLOSE daily_return intraday_return RIC
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 2002-11-04 00:00:00 14.0 14.3 13.2 13.3 0 -0.0473 ABI.BR
2 2002-11-05 00:00:00 13.4 13.4 12.9 13.2 0 -0.0158 ABI.BR
3 2002-11-06 00:00:00 13.7 14.0 13.5 14.0 0 0.0256 ABI.BR
4 2002-11-07 00:00:00 14.0 14.4 13.7 13.7 0 -0.0192 ABI.BR
5 2002-11-08 00:00:00 13.9 13.9 13.3 13.4 0 -0.0311 ABI.BR
6 2002-11-11 00:00:00 13.4 14.0 13.4 13.9 0 0.0393 ABI.BR
7 2002-11-12 00:00:00 13.8 14.3 13.7 14.1 0 0.0181 ABI.BR
8 2002-11-13 00:00:00 13.8 13.9 13.5 13.7 0 -0.00950 ABI.BR
9 2002-11-14 00:00:00 13.7 13.9 13.3 13.4 0 -0.0228 ABI.BR
10 2002-11-15 00:00:00 13.6 13.7 13.4 13.6 0 -0.000459 ABI.BR
# ... with 365 more rows, and 13 more variables: Acquirer Ultimate Parent (At Deal) <lgl>,
# Acquirer Ultimate Parent Country <lgl>, Acquirer Ultimate Parent Stock Exchange <lgl>,
# Acquirer Ultimate Parent Ticker <lgl>, Acquirer FactSet ID <chr>, Acquirer <chr>,
# Acquirer Ownership Type <chr>, Acquirer Country <chr>, Acquirer Stock Exchange <chr>,
# Acquirer Ticker <chr>, Announcement Date <date>, Start_Event_Study <date>,
# End_Event_Study <date>
So all I need to extract is TIMESTAMP and intraday_return, which I could do with a loop. For further calculations I need one large zoo object, which should look like this:
head(StockPriceReturns,3) # Time series of dates and returns.
Bajaj.Auto BHEL Bharti.Airtel Cipla Coal.India Dr.Reddy
2010-07-01 0.5277396 -1.236944 0.51151007 -0.7578608 NA -0.8436534
2010-07-02 -1.7309383 -1.669938 0.09443763 0.4910359 NA -0.3687345
2010-07-05 -0.2530097 -1.282136 0.80850304 0.1335015 NA 1.7035363
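(This example is from the eventstudies package.)
The timestamps, the number of rows, etc. differ between the data frames in my list.
Any suggestions on how to do this?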
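Assuming the input list L shown reproducibly in the Note at the end, rbind the components together into one long data frame, extract the required columns and convert to zoo using read.zoo.
The aggregate= argument of read.zoo is given a function used to aggregate values that share the same date/time within each code (RIC), so that there is only one value per date/time within a code. Common choices are aggregate = mean and aggregate = function(x) tail(x, 1). We show the first below. For the data in the Note the date/times are already unique within each code, so the aggregate argument could be omitted, but it does no harm to leave it in.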
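library(zoo)
# Stack the list into one long data frame and keep only the columns needed.
DF <- do.call("rbind", L)[c("TIMESTAMP", "RIC", "intraday_return")]
# One column per RIC; values sharing a date/time within a RIC are averaged.
z <- read.zoo(DF, split = "RIC", aggregate = mean); z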
giving:
A.N ABI.BR
2002-11-04 NA -0.047300
2002-11-05 NA -0.015800
2002-11-06 NA 0.025600
2002-11-07 NA -0.019200
2002-11-08 NA -0.031100
2002-11-11 NA 0.039300
2002-11-12 NA 0.018100
2002-11-13 NA -0.009500
2002-11-14 NA -0.022800
2002-11-15 NA -0.000459
2004-04-27 -0.00997 NA
2004-04-28 -0.01050 NA
2004-04-29 -0.01240 NA
2004-04-30 -0.03020 NA
2004-05-03 0.00776 NA
2004-05-04 -0.02620 NA
2004-05-05 0.00337 NA
2004-05-06 -0.00977 NA
2004-05-07 0.00420 NA
2004-05-10 -0.01700 NA
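To see what the aggregate= argument changes, here is a minimal sketch on made-up data in which one code has two rows for the same date. The data frame DF2, the codes "X" and "Y" and all values are hypothetical and are not taken from the question; the sketch uses a Date index for brevity instead of the POSIXct index above.

```r
library(zoo)

# Hypothetical toy data: code "X" has two rows dated 2020-01-01.
DF2 <- data.frame(
  TIMESTAMP = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-01")),
  RIC = c("X", "X", "X", "Y"),
  intraday_return = c(0.01, 0.03, 0.04, 0.05)
)

# aggregate = mean: the two "X" values on 2020-01-01 are averaged.
z_mean <- read.zoo(DF2, split = "RIC", aggregate = mean)

# aggregate = function(x) tail(x, 1): the last "X" row for that date is kept.
z_last <- read.zoo(DF2, split = "RIC", aggregate = function(x) tail(x, 1))

z_mean
z_last
```

Which of the two is appropriate depends on what a repeated timestamp means in the data: averaging suits repeated measurements, while keeping the last row suits later corrections of an earlier value.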
Note
We assume the input list is the following, shown in reproducible form.
L <- list(A.N = structure(list(TIMESTAMP = structure(c(1083038400,
1083124800, 1083211200, 1083297600, 1083556800, 1083643200, 1083729600,
1083816000, 1083902400, 1084161600), class = c("POSIXct", "POSIXt"
), tzone = ""), OPEN = c(19.6, 19.3, 19, 18.8, 18.2, 18.5, 18,
17.9, 17.7, 17.4), HIGH = c(19.9, 19.3, 19.1, 18.9, 18.6, 18.5,
18.3, 18, 18, 17.5), LOW = c(19.3, 19, 18.4, 18.1, 18.1, 17.5,
17.9, 17.7, 17.6, 16.9), CLOSE = c(19.4, 19.1, 18.7, 18.2, 18.4,
18, 18.1, 17.7, 17.7, 17.1), daily_return = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), intraday_return = c(-0.00997, -0.0105,
-0.0124, -0.0302, 0.00776, -0.0262, 0.00337, -0.00977, 0.0042,
-0.017), RIC = c("A.N", "A.N", "A.N", "A.N", "A.N", "A.N", "A.N",
"A.N", "A.N", "A.N")), row.names = c(NA, -10L), class = "data.frame"),
ABI.BR = structure(list(TIMESTAMP = structure(c(1036386000,
1036472400, 1036558800, 1036645200, 1036731600, 1036990800,
1037077200, 1037163600, 1037250000, 1037336400), class = c("POSIXct",
"POSIXt"), tzone = ""), OPEN = c(14, 13.4, 13.7, 14, 13.9,
13.4, 13.8, 13.8, 13.7, 13.6), HIGH = c(14.3, 13.4, 14, 14.4,
13.9, 14, 14.3, 13.9, 13.9, 13.7), LOW = c(13.2, 12.9, 13.5,
13.7, 13.3, 13.4, 13.7, 13.5, 13.3, 13.4), CLOSE = c(13.3,
13.2, 14, 13.7, 13.4, 13.9, 14.1, 13.7, 13.4, 13.6), daily_return = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), intraday_return = c(-0.0473,
-0.0158, 0.0256, -0.0192, -0.0311, 0.0393, 0.0181, -0.0095,
-0.0228, -0.000459), RIC = c("ABI.BR", "ABI.BR", "ABI.BR",
"ABI.BR", "ABI.BR", "ABI.BR", "ABI.BR", "ABI.BR", "ABI.BR",
"ABI.BR")), row.names = c(NA, -10L), class = "data.frame"))
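As an alternative to the read.zoo approach, and not the method used in the answer above, each list element can be turned into its own zoo series and the series merged. The sketch below is a minimal illustration under stated assumptions: the list L2 and its values are hypothetical, the timestamps are unique within each element (merge needs unique index values per series), and the timestamps are collapsed to calendar dates in UTC.

```r
library(zoo)

# Hypothetical miniature of the input list, with only the columns used here.
L2 <- list(
  A = data.frame(TIMESTAMP = as.POSIXct(c("2020-01-01", "2020-01-02"), tz = "UTC"),
                 intraday_return = c(0.01, 0.02)),
  B = data.frame(TIMESTAMP = as.POSIXct(c("2020-01-02", "2020-01-03"), tz = "UTC"),
                 intraday_return = c(0.03, 0.04))
)

# One univariate zoo series per list element, indexed by calendar date.
zs <- lapply(L2, function(d) {
  zoo(d$intraday_return, as.Date(d$TIMESTAMP, tz = "UTC"))
})

# merge() aligns the series on the union of their dates and fills gaps with NA;
# the column names are taken from the names of the list.
z2 <- do.call(merge, zs)
z2
```

Here the columns are named after the list elements rather than after the RIC column, so this variant fits when the list names are the identifiers you want to keep.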