Calculate returns in time series data in R
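R does not recognize my data table as a panel. I have several decades of closing prices and total-return prices, but sometimes a few months in between are missing, so a simple return calculation with lagged values does not work, for two reasons: you don't want a return computed against a lagged value that is more than one month apart, and the returns have to be computed per company rather than as one time series across all observations. My solution is: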
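library(dplyr)      # group_by(), mutate(), lag()
library(lubridate)  # month(), which returns 1-12 and ignores the year

# With `<-` inside mutate(), the new column is named after the whole
# expression rather than totret/closeret, hence the names() calls.
df1 <- df %>%
  group_by(seriesid) %>%
  mutate(totret <- ifelse(month(date)-month(lag(date))>1,NA,totalreturn/lag(totalreturn)-1))
names(df1) <- c("date","company","totalreturn","close", "seriesid", "ticker","totret")
df1 <- df1 %>%
  group_by(seriesid) %>%
  mutate(closeret <- ifelse(month(date)-month(lag(date))>1,NA,close/lag(close)-1))
names(df1) <- c("date","company","totalreturn","close", "seriesid", "ticker","totret", "closeret")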
This isn't fancy, but R wouldn't let me do anything fancier because it didn't recognize the new columns.
My data looks like this:
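A possible tidier version (a sketch, not the asker's code; it assumes dplyr and lubridate are installed). Inside mutate(), `name = value` creates a column with that name, whereas `totret <- ...` is passed as an unnamed expression and the column gets a name derived from the expression, which is why the names() step was needed. The sample rows shown below are typed in by hand; ticker is left out because the sample does not include it, and the return column is called totalreturn to match the code.

```r
library(dplyr)
library(lubridate)

# Sample rows from the question (ticker omitted; column name taken from the code)
df <- data.frame(
  date        = as.Date(c("1888-01-31", "1888-02-04", "1888-04-20",
                          "1895-01-30", "1895-02-26")),
  company     = c("x", "x", "x", "y", "y"),
  totalreturn = c(2.50, 2.75, 3.35, 7.50, 7.80),
  close       = c(2.50, 2.75, 3.35, 4.35, 4.65),
  seriesid    = c("0005", "0005", "0005", "0001", "0001")
)

# `=` names the new columns directly, so no names() step is needed.
# Same month() rule as above, so the year-boundary issue is still present.
df1 <- df %>%
  group_by(seriesid) %>%
  mutate(gap      = month(date) - month(lag(date)) > 1,
         totret   = ifelse(gap, NA, totalreturn / lag(totalreturn) - 1),
         closeret = ifelse(gap, NA, close / lag(close) - 1)) %>%
  select(-gap) %>%
  ungroup()
```

Columns created earlier in the same mutate() call (here gap) can be used by later ones, so both returns share one gap test.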
date company returnprice close seriesid
1 1888-01-31 x 2.500 2.500 0005
2 1888-02-04 x 2.750 2.750 0005
3 1888-04-20 x 3.350 3.350 0005
4 1895-01-30 y 7.500 4.350 0001
5 1895-02-26 y 7.800 4.650 0001
Running this, I now get:
date company totalreturn close seriesid totret closeret
1 1888-01-31 x 2.500 2.500 0005 NA NA
2 1888-02-04 x 2.750 2.750 0005 0.1 0.1
3 1888-04-20 x 3.350 3.350 0005 NA NA
4 1895-01-30 y 7.500 4.350 0001 NA NA
5 1895-02-26 y 7.800 4.650 0001 0.04 0.06897
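Following your example, I added more dates just to see what happens when more than three rows should be NA, and your code works. However, you will run into a problem at the start of a new year: month() returns only the month number (1 to 12) and ignores the year, so for a gap from December to March, month(date) - month(lag(date)) is 3 - 12 = -9, which is not > 1, and the return is not set to NA (see row 8 below).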
library(dplyr); library(lubridate)  # lag(), month()
data2 <- data %>% mutate(totret = ifelse(month(date)-month(lag(date))>1,NA,totalreturn/lag(totalreturn)-1),
                         closeret = ifelse(month(date)-month(lag(date))>1,NA,close/lag(close)-1))
date totalreturn close totret closeret
1 1888-01-28 2.5 2.5 NA NA
2 1888-02-28 2.7 2.7 0.0800000 0.08000000
3 1888-03-28 3.0 3.3 0.1111111 0.22222222
4 1888-05-28 3.5 3.5 NA NA
5 1888-08-28 2.8 4.0 NA NA
6 1888-10-28 3.0 4.3 NA NA
7 1888-12-28 3.2 4.5 NA NA
8 1889-03-28 3.6 4.6 0.1250000 0.02222222
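One way to keep a calendar-month rule while respecting the year (an alternative to the difftime() approach below, not part of the original answer) is to count months continuously as 12 * year + month, so December 1888 to March 1889 becomes a gap of 3. A base-R sketch on the dates from the table above; the helper name month_index is hypothetical:

```r
# Hypothetical helper: months counted continuously across years
month_index <- function(d) {
  lt <- as.POSIXlt(d)
  (lt$year + 1900) * 12 + lt$mon   # $year is years since 1900, $mon is 0-11
}

dates <- as.Date(c("1888-01-28", "1888-02-28", "1888-03-28", "1888-05-28",
                   "1888-08-28", "1888-10-28", "1888-12-28", "1889-03-28"))
totalreturn <- c(2.5, 2.7, 3.0, 3.5, 2.8, 3.0, 3.2, 3.6)

gap    <- c(NA, diff(month_index(dates)))   # months since the previous row
prev   <- c(NA, head(totalreturn, -1))      # lagged total-return price
totret <- ifelse(gap > 1, NA, totalreturn / prev - 1)
```

Unlike a day-count threshold, this treats any two dates in adjacent calendar months as consecutive (for example 1888-01-01 and 1888-02-28, 58 days apart). For the panel, the same index would be computed per company inside group_by(seriesid).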
I suggest using difftime() instead and setting the return to NA when the gap between consecutive dates is more than 31 days.
data3 <- data %>% mutate(totret = ifelse(difftime(date, lag(date), units = 'days')>31, NA, totalreturn/lag(totalreturn)-1),
closeret = ifelse(difftime(date, lag(date), units = 'days')>31, NA, close/lag(close)-1))
date totalreturn close totret closeret
1 1888-01-28 2.5 2.5 NA NA
2 1888-02-28 2.7 2.7 0.0800000 0.0800000
3 1888-03-28 3.0 3.3 0.1111111 0.2222222
4 1888-05-28 3.5 3.5 NA NA
5 1888-08-28 2.8 4.0 NA NA
6 1888-10-28 3.0 4.3 NA NA
7 1888-12-28 3.2 4.5 NA NA
8 1889-03-28 3.6 4.6 NA NA
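The example above is ungrouped. For the panel in the question, the same 31-day rule could be applied per company by grouping on seriesid and sorting by date first, so that lag() only compares rows of the same series. A sketch assuming dplyr, using the question's sample rows and column names:

```r
library(dplyr)

df <- data.frame(
  date        = as.Date(c("1888-01-31", "1888-02-04", "1888-04-20",
                          "1895-01-30", "1895-02-26")),
  totalreturn = c(2.50, 2.75, 3.35, 7.50, 7.80),
  close       = c(2.50, 2.75, 3.35, 4.35, 4.65),
  seriesid    = c("0005", "0005", "0005", "0001", "0001")
)

df1 <- df %>%
  group_by(seriesid) %>%
  arrange(date, .by_group = TRUE) %>%   # lag() must see each series in date order
  mutate(days     = as.numeric(difftime(date, lag(date), units = "days")),
         totret   = ifelse(days > 31, NA, totalreturn / lag(totalreturn) - 1),
         closeret = ifelse(days > 31, NA, close / lag(close) - 1)) %>%
  ungroup()
```

Note that arrange(.by_group = TRUE) also sorts by seriesid, so the rows of series 0001 come first in the result.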
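I also tried difftime(dates[2], dates[1], units = 'secs') > duration(1, units = 'month'), but since "month is 30.41667 days", a 31-day difference between two consecutive monthly observations (for example 1888-01-28 and 1888-02-28) exceeds one such month and is wrongly treated as a gap, so that doesn't work correctly.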
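The arithmetic behind this, as a base-R sketch that assumes the 30.41667-day month quoted in the message (365/12 days) instead of calling lubridate:

```r
month_days <- 365 / 12   # 30.41667, the month length quoted in the message
gap <- as.numeric(difftime(as.Date("1888-02-28"), as.Date("1888-01-28"), units = "days"))
gap                 # 31 days between two consecutive monthly observations
gap > month_days    # TRUE: flagged as a gap although no month is missing
gap > 31            # FALSE: the 31-day threshold keeps this pair
```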