如何按行名将列表中的数据重新组织到新的数据框中? R
How to reorganize data from a list by rownames into new dataframes? R
我有多个 xts 对象存储在一个列表中,每个对象都有 1000 多行。它们代表库存滚动 window 回归数据。每个元素都有其唯一的 Ticker 名称。出于测试目的,此处将它们称为 Stock1、2 ...等。行按日期命名,xts 格式也是如此。每个元素的尺寸都相等。每个看起来像这样:
> tail(testlist$Stock1, n = 3)
(Intercept) rmrf smb hml rmw cma
2014-12-29 0.0003223177 1.010215 -0.02164844 -0.3322500 0.07819563 1.106934
2014-12-30 0.0002631315 1.002356 -0.02351438 -0.3465390 0.05954400 1.118506
2014-12-31 0.0002837304 1.000084 -0.01619536 -0.3494401 0.06121434 1.124845
> tail(testlist$Stock2, n = 3)
(Intercept) rmrf smb hml rmw cma
2014-12-29 0.0003308951 0.7503819 -0.1967255 -0.10242616 -0.2264914 0.8329570
2014-12-30 0.0003051495 0.7409709 -0.1899856 -0.07461764 -0.2240448 0.7921883
2014-12-31 0.0002614874 0.7478099 -0.1833077 -0.06197362 -0.2056615 0.7550211
> tail(testlist$Stock3, n = 3)
(Intercept) rmrf smb hml rmw cma
2014-12-29 -0.0003803988 0.8363603 -0.4153470 0.7459769 -0.7981382 -0.2839360
2014-12-30 -0.0004121386 0.8352243 -0.4224404 0.7405976 -0.8114066 -0.2790438
2014-12-31 -0.0004660716 0.8355641 -0.4343012 0.7571033 -0.8057412 -0.3026019
> tail(testlist$Stock4, n = 3)
(Intercept) rmrf smb hml rmw cma
2014-12-29 -0.0008295692 0.9296299 -0.07776571 0.007084297 -0.1377356 0.8038542
2014-12-30 -0.0007734696 0.9383387 -0.08941983 0.011685507 -0.1092656 0.7863335
2014-12-31 -0.0007591168 0.9391670 -0.08782070 0.015619229 -0.1083707 0.7924232
我需要做什么:
按名称合并行并聚合列表中的所有数据以获得一组新数据。每个应该如下所示:
Name Date (Intercept) rmrf smb hml rmw cma
Stock1 2014-12-29 0.0003223177 1.010215 -0.02164844 -0.3322500 0.07819563 1.106934
Stock2 2014-12-29 0.0003308951 0.7503819 -0.1967255 -0.10242616 -0.2264914 0.8329570
Stock3 2014-12-29 -0.0003803988 0.8363603 -0.4153470 0.7459769 -0.7981382 -0.2839360
Stock4 2014-12-29 -0.0008295692 0.9296299 -0.07776571 0.007084297 -0.1377356 0.8038542
每个这样的元素都不应再是时间序列。但是一个静态的,每只股票代表它在时间 "t" 的系数值。就大小而言,每个元素的行数应等于原始列表中的股票数。
编辑 如 josilber
所问
> dput(list(Stock1=tail(testlist$Stock1, n = 3), Stock2=tail(testlist$Stock2, n = 3)))
structure(list(Stock1 = structure(c(0.000322317700198485, 0.000263131488679374,
0.000283730373928844, 1.01021497011709, 1.00235580055438, 1.00008407331697,
-0.0216484434660844, -0.023514378867335, -0.0161953614672028,
-0.332250031553704, -0.346538978804535, -0.349440052163927, 0.078195628743663,
0.0595439997647003, 0.0612143446991752, 1.1069343396633, 1.11850626745067,
1.12484530131584), class = c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index = structure(c(1419811200,
1419897600, 1419984000), tzone = "UTC", tclass = "Date"), .Dim = c(3L,
6L), .Dimnames = list(NULL, c("(Intercept)", "rmrf", "smb", "hml",
"rmw", "cma"))), Stock2 = structure(c(0.000330895099805035, 0.000305149500450527,
0.000261487411574969, 0.750381906747217, 0.740970893865186, 0.747809929767095,
-0.1967254672836, -0.189985607343021, -0.183307667378927, -0.10242615734439,
-0.0746176364711423, -0.0619736225998069, -0.226491384004977,
-0.224044849587752, -0.205661480898329, 0.832956994676299, 0.792188348360969,
0.755021100668421), class = c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index = structure(c(1419811200,
1419897600, 1419984000), tzone = "UTC", tclass = "Date"), .Dim = c(3L,
6L), .Dimnames = list(NULL, c("(Intercept)", "rmrf", "smb", "hml",
"rmw", "cma")))), .Names = c("Stock1", "Stock2"))
我完全蒙在鼓里。我查看了一些可能会用到的函数:lapply
/ merge
函数似乎也适用,但它只适用于 2 个元素。
我将在搜索答案时继续更新此 post。如果有人有任何线索或以前做过这个并且可以指出正确的方向,谢谢!
编辑
#Flatten data add one more name and put into one data frame
all_coef_data<- do.call(rbind,Map(cbind,
Name=names(testlist),
Date=lapply(testlist,function(x) as.Date(as.POSIXct(c(attr(x,'index')),origin='1970-01-01'))),
lapply(testlist, as.data.frame)
))
我需要输出的每一行中的一个共同点是日期。我现在使用它按日期拆分数据框。输出是一个列表。
out <- split(all_coef_data , f = all_coef_data$Date )
输出:
> head(out$'2011-05-23', n=3)
Name Date (Intercept) rmrf smb hml rmw cma
Stock1.2011-05-23 Stock1 2011-05-23 -4.376389e-04 1.103582 -0.21747611 -0.1879211 -0.05849794 -0.1949192
Stock2.2011-05-23 Stock2 2011-05-23 1.115140e-04 1.198622 0.05422819 0.9998529 0.92141407 -0.8565260
Stock3.2011-05-23 Stock3 2011-05-23 5.457214e-05 1.303025 0.04705294 0.6897673 -0.19708983 -0.8247877
> tail(out$'2011-05-23', n=3)
Name Date (Intercept) rmrf smb hml rmw cma
Stock48.2011-05-23 Stock48 2011-05-23 0.0007354997 0.505054 0.1774544 -0.38934089 0.71775909 0.5189329
Stock49.2011-05-23 Stock49 2011-05-23 0.0004224351 1.304719 0.4511903 -0.64937062 -0.08872941 0.1545058
Stock50.2011-05-23 Stock50 2011-05-23 0.0003851261 1.020434 -0.1107910 -0.03964192 0.09526658 -0.4961902
听起来你想要
- 使用
as.data.frame()
、 将 xts 对象展平为 data.frames
cbind()
新列 Name
(来自列表组件名称)和 Date
(来自 xts 行名称,实际上来自 index
属性),最后
rbind()
所有内容合为一个 data.frame.
do.call(rbind,Map(cbind,
Name=names(testlist),
Date=lapply(testlist,function(x) as.Date(as.POSIXct(c(attr(x,'index')),origin='1970-01-01'))),
lapply(testlist,as.data.frame)
));
## Name Date (Intercept) rmrf smb hml rmw cma
## Stock1.2014-12-29 Stock1 2014-12-29 0.0003223177 1.0102150 -0.02164844 -0.33225003 0.07819563 1.1069343
## Stock1.2014-12-30 Stock1 2014-12-30 0.0002631315 1.0023558 -0.02351438 -0.34653898 0.05954400 1.1185063
## Stock1.2014-12-31 Stock1 2014-12-31 0.0002837304 1.0000841 -0.01619536 -0.34944005 0.06121434 1.1248453
## Stock2.2014-12-29 Stock2 2014-12-29 0.0003308951 0.7503819 -0.19672547 -0.10242616 -0.22649138 0.8329570
## Stock2.2014-12-30 Stock2 2014-12-30 0.0003051495 0.7409709 -0.18998561 -0.07461764 -0.22404485 0.7921883
## Stock2.2014-12-31 Stock2 2014-12-31 0.0002614874 0.7478099 -0.18330767 -0.06197362 -0.20566148 0.7550211
如果您不喜欢新的行名,可以将行换行在 `rownames<-`(...,NULL)
.
这是另一个解决方案。
library(xts)
library(magrittr)
library(dplyr)
stock1 =
data.frame(a = c(1, 2), b = c(1, 2) ) %>%
xts(order.by =
c("2014-12-29", "2014-12-30") %>%
as.Date)
stock2 =
data.frame(a = c(1, 2), b = c(1, 2) ) %>%
xts(order.by =
c("2014-12-29", "2014-12-30") %>%
as.Date)
test = list(stock1 = stock1, stock2 = stock2)
result =
1:length(test) %>%
lapply(function(i)
test[[i]] %>%
as.data.frame %>%
mutate(name = names(test)[i],
date = rownames(.) %>% as.Date ) ) %>%
bind_rows
我有多个 xts 对象存储在一个列表中,每个对象都有 1000 多行。它们代表库存滚动 window 回归数据。每个元素都有其唯一的 Ticker 名称。出于测试目的,此处将它们称为 Stock1、2 ...等。行按日期命名,xts 格式也是如此。每个元素的尺寸都相等。每个看起来像这样:
> tail(testlist$Stock1, n = 3)
(Intercept) rmrf smb hml rmw cma
2014-12-29 0.0003223177 1.010215 -0.02164844 -0.3322500 0.07819563 1.106934
2014-12-30 0.0002631315 1.002356 -0.02351438 -0.3465390 0.05954400 1.118506
2014-12-31 0.0002837304 1.000084 -0.01619536 -0.3494401 0.06121434 1.124845
> tail(testlist$Stock2, n = 3)
(Intercept) rmrf smb hml rmw cma
2014-12-29 0.0003308951 0.7503819 -0.1967255 -0.10242616 -0.2264914 0.8329570
2014-12-30 0.0003051495 0.7409709 -0.1899856 -0.07461764 -0.2240448 0.7921883
2014-12-31 0.0002614874 0.7478099 -0.1833077 -0.06197362 -0.2056615 0.7550211
> tail(testlist$Stock3, n = 3)
(Intercept) rmrf smb hml rmw cma
2014-12-29 -0.0003803988 0.8363603 -0.4153470 0.7459769 -0.7981382 -0.2839360
2014-12-30 -0.0004121386 0.8352243 -0.4224404 0.7405976 -0.8114066 -0.2790438
2014-12-31 -0.0004660716 0.8355641 -0.4343012 0.7571033 -0.8057412 -0.3026019
> tail(testlist$Stock4, n = 3)
(Intercept) rmrf smb hml rmw cma
2014-12-29 -0.0008295692 0.9296299 -0.07776571 0.007084297 -0.1377356 0.8038542
2014-12-30 -0.0007734696 0.9383387 -0.08941983 0.011685507 -0.1092656 0.7863335
2014-12-31 -0.0007591168 0.9391670 -0.08782070 0.015619229 -0.1083707 0.7924232
我需要做什么: 按名称合并行并聚合列表中的所有数据以获得一组新数据。每个应该如下所示:
Name Date (Intercept) rmrf smb hml rmw cma
Stock1 2014-12-29 0.0003223177 1.010215 -0.02164844 -0.3322500 0.07819563 1.106934
Stock2 2014-12-29 0.0003308951 0.7503819 -0.1967255 -0.10242616 -0.2264914 0.8329570
Stock3 2014-12-29 -0.0003803988 0.8363603 -0.4153470 0.7459769 -0.7981382 -0.2839360
Stock4 2014-12-29 -0.0008295692 0.9296299 -0.07776571 0.007084297 -0.1377356 0.8038542
每个这样的元素都不应再是时间序列。但是一个静态的,每只股票代表它在时间 "t" 的系数值。就大小而言,每个元素的行数应等于原始列表中的股票数。
编辑 如 josilber
所问 > dput(list(Stock1=tail(testlist$Stock1, n = 3), Stock2=tail(testlist$Stock2, n = 3)))
structure(list(Stock1 = structure(c(0.000322317700198485, 0.000263131488679374,
0.000283730373928844, 1.01021497011709, 1.00235580055438, 1.00008407331697,
-0.0216484434660844, -0.023514378867335, -0.0161953614672028,
-0.332250031553704, -0.346538978804535, -0.349440052163927, 0.078195628743663,
0.0595439997647003, 0.0612143446991752, 1.1069343396633, 1.11850626745067,
1.12484530131584), class = c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index = structure(c(1419811200,
1419897600, 1419984000), tzone = "UTC", tclass = "Date"), .Dim = c(3L,
6L), .Dimnames = list(NULL, c("(Intercept)", "rmrf", "smb", "hml",
"rmw", "cma"))), Stock2 = structure(c(0.000330895099805035, 0.000305149500450527,
0.000261487411574969, 0.750381906747217, 0.740970893865186, 0.747809929767095,
-0.1967254672836, -0.189985607343021, -0.183307667378927, -0.10242615734439,
-0.0746176364711423, -0.0619736225998069, -0.226491384004977,
-0.224044849587752, -0.205661480898329, 0.832956994676299, 0.792188348360969,
0.755021100668421), class = c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index = structure(c(1419811200,
1419897600, 1419984000), tzone = "UTC", tclass = "Date"), .Dim = c(3L,
6L), .Dimnames = list(NULL, c("(Intercept)", "rmrf", "smb", "hml",
"rmw", "cma")))), .Names = c("Stock1", "Stock2"))
我完全蒙在鼓里。我查看了一些可能会用到的函数:lapply
/ merge
函数似乎也适用,但它只适用于 2 个元素。
我将在搜索答案时继续更新此 post。如果有人有任何线索或以前做过这个并且可以指出正确的方向,谢谢!
编辑
#Flatten data add one more name and put into one data frame
all_coef_data<- do.call(rbind,Map(cbind,
Name=names(testlist),
Date=lapply(testlist,function(x) as.Date(as.POSIXct(c(attr(x,'index')),origin='1970-01-01'))),
lapply(testlist, as.data.frame)
))
我需要输出的每一行中的一个共同点是日期。我现在使用它按日期拆分数据框。输出是一个列表。
out <- split(all_coef_data , f = all_coef_data$Date )
输出:
> head(out$'2011-05-23', n=3)
Name Date (Intercept) rmrf smb hml rmw cma
Stock1.2011-05-23 Stock1 2011-05-23 -4.376389e-04 1.103582 -0.21747611 -0.1879211 -0.05849794 -0.1949192
Stock2.2011-05-23 Stock2 2011-05-23 1.115140e-04 1.198622 0.05422819 0.9998529 0.92141407 -0.8565260
Stock3.2011-05-23 Stock3 2011-05-23 5.457214e-05 1.303025 0.04705294 0.6897673 -0.19708983 -0.8247877
> tail(out$'2011-05-23', n=3)
Name Date (Intercept) rmrf smb hml rmw cma
Stock48.2011-05-23 Stock48 2011-05-23 0.0007354997 0.505054 0.1774544 -0.38934089 0.71775909 0.5189329
Stock49.2011-05-23 Stock49 2011-05-23 0.0004224351 1.304719 0.4511903 -0.64937062 -0.08872941 0.1545058
Stock50.2011-05-23 Stock50 2011-05-23 0.0003851261 1.020434 -0.1107910 -0.03964192 0.09526658 -0.4961902
听起来你想要
- 使用
as.data.frame()
、 将 xts 对象展平为 data.frames
cbind()
新列Name
(来自列表组件名称)和Date
(来自 xts 行名称,实际上来自index
属性),最后rbind()
所有内容合为一个 data.frame.
do.call(rbind,Map(cbind,
Name=names(testlist),
Date=lapply(testlist,function(x) as.Date(as.POSIXct(c(attr(x,'index')),origin='1970-01-01'))),
lapply(testlist,as.data.frame)
));
## Name Date (Intercept) rmrf smb hml rmw cma
## Stock1.2014-12-29 Stock1 2014-12-29 0.0003223177 1.0102150 -0.02164844 -0.33225003 0.07819563 1.1069343
## Stock1.2014-12-30 Stock1 2014-12-30 0.0002631315 1.0023558 -0.02351438 -0.34653898 0.05954400 1.1185063
## Stock1.2014-12-31 Stock1 2014-12-31 0.0002837304 1.0000841 -0.01619536 -0.34944005 0.06121434 1.1248453
## Stock2.2014-12-29 Stock2 2014-12-29 0.0003308951 0.7503819 -0.19672547 -0.10242616 -0.22649138 0.8329570
## Stock2.2014-12-30 Stock2 2014-12-30 0.0003051495 0.7409709 -0.18998561 -0.07461764 -0.22404485 0.7921883
## Stock2.2014-12-31 Stock2 2014-12-31 0.0002614874 0.7478099 -0.18330767 -0.06197362 -0.20566148 0.7550211
如果您不喜欢新的行名,可以将行换行在 `rownames<-`(...,NULL)
.
这是另一个解决方案。
library(xts)
library(magrittr)
library(dplyr)
stock1 =
data.frame(a = c(1, 2), b = c(1, 2) ) %>%
xts(order.by =
c("2014-12-29", "2014-12-30") %>%
as.Date)
stock2 =
data.frame(a = c(1, 2), b = c(1, 2) ) %>%
xts(order.by =
c("2014-12-29", "2014-12-30") %>%
as.Date)
test = list(stock1 = stock1, stock2 = stock2)
result =
1:length(test) %>%
lapply(function(i)
test[[i]] %>%
as.data.frame %>%
mutate(name = names(test)[i],
date = rownames(.) %>% as.Date ) ) %>%
bind_rows