Compute returns using lagged values by date when there is more than one row for the same date in R

Question

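I have a dataset with 6k assets and their market prices.

I want to compute daily returns, so I need to apply the formula (marketprice[i] - marketprice[i-1]) / marketprice[i-1]. The problem is that I have several observations for the same datetime. For example, for asset x I have 3 observations on day t, because it was traded by investors 1, 2 and 3, and the same happens for every asset in the dataset. So my dataset looks like: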

investor    asset    datetime      marketprice
1            x          t              10
2            x          t              10
3            x          t              10

My idea was to use something like:

res <- res %>% 
  arrange(datetime) %>% 
  group_by(asset) %>% 
  mutate(ret = (marketprice - dplyr::lag(marketprice))/dplyr::lag(marketprice, default = NA)) %>% 
  ungroup()

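But this does not work because, in the example above, row 2 would use marketprice[i-1], which is the market price from the same day, whereas I want the price from the previous day [t-1] (not included in the example dataset).

In addition, R should check that the [i-1] market price does not belong to a date more than 4 days away. So if the date of row i is the 10th of July, the calculation should be applied only when the date of [i-1] is the 6th of July or closer.

Any ideas?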

Answer 1

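This is based on the following assumptions, as I understand the problem:

When the same asset is repeated on one day, the market price is the same regardless of the investor.
You do not mind which investor a row belongs to (so we can drop rows).
It is fine to output a missing value (NA) when day (t) is 5 or more days after the previous observation (t-1).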
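
The first assumption can be checked before any rows are dropped. The following is a minimal sketch, assuming a data frame with the question's columns asset, datetime and marketprice; the toy data `prices` and the name `conflicts` are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: asset x has two different prices on the same day
prices <- data.frame(
  investor    = c(1, 2, 3, 1),
  asset       = c("x", "x", "x", "y"),
  datetime    = as.Date(c("2022-07-06", "2022-07-06", "2022-07-06", "2022-07-06")),
  marketprice = c(10, 10, 11, 5)
)

# Count distinct prices per asset and date; more than one breaks the assumption
conflicts <- prices %>%
  group_by(asset, datetime) %>%
  summarise(n_prices = n_distinct(marketprice), .groups = "drop") %>%
  filter(n_prices > 1)

print(conflicts)
```

If `conflicts` has zero rows, every asset has a single price per date and keeping only the first row per date loses no price information.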

Libraries and some example data:

library(lubridate)
library(tidyverse)

# Data example

set.seed(132) # reproducibility

example = data.frame(
  investor = c(rep(1,3),2,3,rep(2,2),1,
               rep(2,4),rep(3,4)),
  asset = c(rep('A',8),
            rep('B',8)),
  datetime = c(today()+c(1,2,3,3,3,4,5,6),
               today()+c(1,seq(6,9),seq(16,18))),
  marketprice = c(10,20,30,30,30,sample(c(10,20,30),11,replace = TRUE))
)

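The example dataset has 2 assets. The first (A) shows how the code handles multiple rows for the same day. The second (B) shows how the code handles gaps of more than 4 days between dates.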

> example
   investor asset   datetime marketprice
1         1     A 2022-05-26          10
2         1     A 2022-05-27          20
3         1     A 2022-05-28          30
4         2     A 2022-05-28          30
5         3     A 2022-05-28          30
6         2     A 2022-05-29          30
7         2     A 2022-05-30          30
8         1     A 2022-05-31          30
9         2     B 2022-05-26          20
10        2     B 2022-05-31          10
11        2     B 2022-06-01          20
12        2     B 2022-06-02          10
13        3     B 2022-06-03          10
14        3     B 2022-06-10          30
15        3     B 2022-06-11          20
16        3     B 2022-06-12          10

dplyr code:

# Formula: [price(t) - price(t-1)] / price(t-1), i.e. diff(price) / lag(price)
ret = example %>% 
  group_by(asset,datetime) %>% 
  slice(1) %>%  # keep one row per asset and date (drops repeated dates)
  group_by(asset) %>% 
  arrange(datetime) %>% 
  mutate(ret = ifelse(datetime-lag(datetime) > 4,
                NA,  # gap of more than 4 days: no return
                (marketprice-lag(marketprice))/lag(marketprice))
         ) %>% 
  arrange(asset,datetime) # show by asset and date

Output:

# A tibble: 14 x 5
# Groups:   asset [2]
   investor asset datetime   marketprice    ret
      <dbl> <chr> <date>           <dbl>  <dbl>
 1        1 A     2022-05-26          10 NA    
 2        1 A     2022-05-27          20  1    
 3        1 A     2022-05-28          30  0.5  
 4        2 A     2022-05-29          30  0    
 5        2 A     2022-05-30          30  0    
 6        1 A     2022-05-31          30  0    
 7        2 B     2022-05-26          20 NA    
 8        2 B     2022-05-31          10 NA    
 9        2 B     2022-06-01          20  1    
10        2 B     2022-06-02          10 -0.5  
11        3 B     2022-06-03          10  0    
12        3 B     2022-06-10          30 NA    
13        3 B     2022-06-11          20 -0.333
14        3 B     2022-06-12          10 -0.5

Two rows are dropped because one day (asset A on 2022-05-28) had 3 data entries.
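
If the per-investor rows need to be kept, one possible variation is to compute the return once per asset and date and then join it back onto every original row. This is only a sketch under the same assumptions as above (one price per asset and date); the toy data `trades` and the names `daily_ret`, `gap_days` and `trades_ret` are hypothetical. If an asset had two different prices on the same date, the join would duplicate rows.

```r
library(dplyr)

# Hypothetical toy data in the question's layout
trades <- data.frame(
  investor    = c(1, 1, 2, 3, 2),
  asset       = "A",
  datetime    = as.Date("2022-05-26") + c(0, 1, 1, 1, 7),
  marketprice = c(10, 20, 20, 20, 30)
)

# One price per asset and date, then the return within each asset
daily_ret <- trades %>%
  distinct(asset, datetime, marketprice) %>%
  arrange(asset, datetime) %>%
  group_by(asset) %>%
  mutate(
    # Days since the previous observed date of the same asset
    gap_days = as.numeric(difftime(datetime, lag(datetime), units = "days")),
    # No return when the previous price is more than 4 days old
    ret = if_else(gap_days > 4, NA_real_,
                  (marketprice - lag(marketprice)) / lag(marketprice))
  ) %>%
  ungroup() %>%
  select(asset, datetime, ret)

# Attach the daily return to every original row, whichever investor traded
trades_ret <- left_join(trades, daily_ret, by = c("asset", "datetime"))
print(trades_ret)
```

Here the three rows for the second day all receive the same return, the first row has no previous price, and the last row is NA because its previous observation is 6 days earlier.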

