Q: Special conditions, rate of change in R; ForLoop/Apply/Lag

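I'm getting started with R, and time-series concepts are completely new to me. Can anyone point me in the right direction for calculating the monthly percentage change?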

  1. I have data for different years, months, towns and prices, together with the rate of change, like this:

i  | hrvyear |  m   | town        |   price   |  rate of change
1  |  1270   |  5   | Chesterford |   80      |  NA
2  |  1270   |  6   | Chesterford |   64      |  -20 %
3  |  1270   |  7   | Lopham      |   74      |  NA
4  |  1274   |  12  | Lopham      |   74      |  NA
5  |  1275   |  1   | Lopham      |   78      |  5.4054 %
6  |  1275   |  2   | Lopham      |   59      |  -24.3589 %
7  |  1275   |  3   | Lopham      |   61      |  3.3898 %
8  |  1275   |  5   | Lopham      |   68      |  NA
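
  2. In the second step, I want to compute, from the table above, the average ratio for every possible pair of months from September through August (i.e. 9_to_10, 9_to_11, ..., 9_to_8, 10_to_11, ..., 10_to_8, ..., 7_to_8):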

i  | start_month | end_month | average_ratio | %change | Std. error | # cases
1  |  9          | 10        |  1.055        | 2.7     |   0.034    | 22
2  |  9          | 11        |   ...         | ...     |   ...      | ..
3  |  9          | 12        |   ...         | ...     |   ...      | ..
4  |  9          | 1         |   ...         | ...     |   ...      | ..
5  |  9          | 2         |   ...         | ...     |   ...      | ..
6  |  9          | 3         |   ...         | ...     |   ...      | ..
7  |  9          | 4         |   ...         | ...     |   ...      | ..
8  |  9          | 5         |   ...         | ...     |   ...      | ..
9  |  9          | 6         |   ...         | ...     |   ...      | ..
10 |  9          | 7         |   ...         | ...     |   ...      | ..
11 |  9          | 8         |   ...         | ...     |   ...      | ..
.. |  ...        | ..        |   ...         | ...     |   ...      | ..
.. |  12         | 1         |   ...         | ...     |   ...      | ..
.. |  12         | 2         |   ...         | ...     |   ...      | ..
.. |  ...        | ..        |   ...         | ...     |   ...      | ..
.. |  12         | 8         |   ...         | ...     |   ...      | ..
.. |  ...        | ..        |   ...         | ...     |   ...      | ..
66 |  7          | 8         |   ...         | ...     |   ...      | ..
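The rule behind the rate-of-change column in the first table can be reproduced without loops. This is a minimal base-R sketch under an assumption read off the table rather than stated in the question: a row is compared with the previous row only when it is the same town and exactly one calendar month later, with `hrvyear` treated like a calendar year so that December 1274 to January 1275 (row 5) counts as consecutive. The names `prices`, `month_index`, `prev_*` and `consecutive` are illustrative.

```r
# Data from the first table (without the rate-of-change column)
prices <- data.frame(
  hrvyear = c(1270, 1270, 1270, 1274, 1275, 1275, 1275, 1275),
  m       = c(5, 6, 7, 12, 1, 2, 3, 5),
  town    = c("Chesterford", "Chesterford", "Lopham", "Lopham",
              "Lopham", "Lopham", "Lopham", "Lopham"),
  price   = c(80, 64, 74, 74, 78, 59, 61, 68)
)

# Sort by town and time; month_index counts months continuously (assumption:
# hrvyear acts like a calendar year), so Dec of one year and Jan of the next differ by 1
prices <- prices[order(prices$town, prices$hrvyear, prices$m), ]
month_index <- prices$hrvyear * 12 + prices$m

# Values from the previous row (a base-R lag); NA for the first row
prev_price <- c(NA, head(prices$price, -1))
prev_town  <- c(NA, head(prices$town, -1))
prev_index <- c(NA, head(month_index, -1))

# Compare only with the previous row if it is the same town and the previous calendar month
consecutive <- !is.na(prev_town) & prices$town == prev_town & month_index - prev_index == 1
prices$rate <- ifelse(consecutive, (prices$price - prev_price) / prev_price * 100, NA)

prices
```

With this rule the town change (row 3), the gap from July 1270 to December 1274 (row 4) and the missing April (row 8) all come out as `NA`, as in the table.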

Calculations:

Rate-of-change function: ((a - b) / b) * 100, where a is the price in the new month and b the price in the previous month

Average_ratio: the mean for the corresponding month pair across all years and towns

%change: (log(1 + mean(average_ratio)) / x) * 100, where x is the distance in months between start_month and end_month

Sample data (`dput()` output):

```r
structure(list(hrvyear = c(1270, 1270, 1272, 1272, 1275, 1275
), m = c(5, 12, 2, 4, 2, 3), town = c("Chesterford", "Chesterford", 
"Lopham", "Lopham", "Lopham", "Lopham"), `mean(price)` = c(80, 
64, 74, 78, 59, 61)), row.names = c(NA, -6L), groups = structure(list(
    hrvyear = c(1270, 1270, 1272, 1272, 1275, 1275), m = c(5, 
    12, 2, 4, 2, 3), .rows = structure(list(1L, 2L, 3L, 4L, 5L, 
        6L), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", 
    "list"))), row.names = c(NA, 6L), class = c("tbl_df", "tbl", 
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"))
```

I hope the question is clear. I'd appreciate any advice.
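The two formulas from the Calculations list can be written as small R functions. This is a sketch: `rate_of_change` and `pct_change` are hypothetical names, and `pct_change` assumes, as in the May-to-July code, that the ratios are fractions ((a - b) / b without the * 100). The ratios 0.05 and 0.10 in the last call are made-up inputs for illustration only.

```r
# Rate of change in percent: a = price in the new month, b = price in the previous month
rate_of_change <- function(a, b) ((a - b) / b) * 100

# %change over a distance of x months; ratios are fractions such as -0.2
pct_change <- function(ratios, x) (log(1 + mean(ratios)) / x) * 100

rate_of_change(64, 80)  # -20, row 2 of the first table
rate_of_change(78, 74)  # about 5.4054, row 5 of the first table

# e.g. May to July is a distance of x = 2 months (illustrative ratios)
pct_change(c(0.05, 0.10), x = 2)
```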

So far I have used the following code for the first step. However, I obviously don't want to repeat it many times, once for every month group.
```r
library(dplyr)

may_july <- complete_mc %>%
  filter(m %in% c(5, 7)) %>%
  arrange(town, hrvyear, m)

# create a new column to check whether the previous row is from the same year and the same town
# (e.g. we start with the May-to-July comparison); stop at the second-to-last row
# so that index i + 1 never runs past the end of the data
roc <- c()
for (i in seq_len(nrow(may_july) - 1)) {
  if (may_july$hrvyear[i + 1] == may_july$hrvyear[i] && may_july$town[i + 1] == may_july$town[i]) {
    roc <- c(roc, TRUE)
  } else {
    roc <- c(roc, FALSE)
  }
}

# add FALSE for the first row of the roc column, as no previous row exists,
# so that roc has one value per row and can be combined with the data frame
roc <- c(FALSE, roc)
tm <- cbind(may_july, roc)

# if the previous row is from the same year and the same town, calculate the ratio
# between this row and the previous one; if not, add NA
roc2 <- c()
for (i in seq_len(nrow(may_july))) {
  if (roc[i]) {
    roc2 <- c(roc2, (may_july$mean_price[i] - may_july$mean_price[i - 1]) / may_july$mean_price[i - 1])
  } else {
    roc2 <- c(roc2, NA)
  }
}

# combine the data frame with the final ratios
tt <- cbind(may_july, roc2)
roc3 <- na.omit(roc2)

# calculate the rate of change with the average ratio (May to July: distance of 2 months)
may_to_july <- (log(1 + mean(roc3)) / 2) * 100
mean(roc3)
```
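One way to avoid repeating both loops for every month group is a function that joins the start-month rows to the end-month rows on town and year. This is a sketch, not the question's method: `month_pair_ratios` is a hypothetical helper, it uses base R's `merge()` instead of dplyr, and it assumes one row per town, year and month (as with aggregated `mean_price` data). The data are the question's sample rows, with the price column renamed to `mean_price` to match the loop code.

```r
# Hypothetical helper: ratios (end - start) / start, matched within the same town and hrvyear
month_pair_ratios <- function(df, start_month, end_month) {
  start <- df[df$m == start_month, c("town", "hrvyear", "mean_price")]
  end   <- df[df$m == end_month,   c("town", "hrvyear", "mean_price")]
  # inner join: keeps only town/year combinations that have both months
  both <- merge(start, end, by = c("town", "hrvyear"), suffixes = c("_start", "_end"))
  (both$mean_price_end - both$mean_price_start) / both$mean_price_start
}

# The question's sample data as a plain data frame
complete_mc <- data.frame(
  hrvyear    = c(1270, 1270, 1272, 1272, 1275, 1275),
  m          = c(5, 12, 2, 4, 2, 3),
  town       = c("Chesterford", "Chesterford", "Lopham", "Lopham", "Lopham", "Lopham"),
  mean_price = c(80, 64, 74, 78, 59, 61)
)

feb_to_mar <- month_pair_ratios(complete_mc, 2, 3)
feb_to_apr <- month_pair_ratios(complete_mc, 2, 4)

# same formula as may_to_july, with x = 2 months from February to April
(log(1 + mean(feb_to_apr)) / 2) * 100
```

The join replaces the "same year and same town" check: rows that have only one of the two months simply drop out, so no `NA` handling is needed.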

A:

The function you wrote almost works, but don't forget to put `` am$`mean(price)`[i] - am$`mean(price)`[i-1] `` in parentheses, so that you don't divide before subtracting.
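The point about parentheses comes from R's operator precedence: `/` binds more tightly than `-`. A quick check with the prices from row 2 of the first table:

```r
a <- 64  # price in the new month
b <- 80  # price in the previous month

a - b / b      # 63: b / b is evaluated first
(a - b) / b    # -0.2: the subtraction happens first, as intended
```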

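A simpler approach is the `shift()` function from data.table, which is similar to the `lead()` and `lag()` functions in dplyr. Depending on the arguments you pass, they select the previous or the next row.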
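For comparison, a base-R sketch of what a one-step lag and lead return, mirroring `shift(x)` with its defaults (`type = "lag"`, `n = 1`, `fill = NA`) and `shift(x, type = "lead")`. The vector `x` is just the first four sample prices.

```r
x <- c(80, 64, 74, 78)

# previous value, like shift(x)
lag1 <- c(NA, head(x, -1))    # NA 80 64 74

# next value, like shift(x, type = "lead")
lead1 <- c(tail(x, -1), NA)   # 64 74 78 NA
```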

```r
library(data.table)

dt <- as.data.table(structure(list(hrvyear = c(1270, 1270, 1272, 1272, 1275, 1275
), m = c(5, 12, 2, 4, 2, 3), town = c("Chesterford", "Chesterford", 
"Lopham", "Lopham", "Lopham", "Lopham"), `mean(price)` = c(80, 
64, 74, 78, 59, 61)), row.names = c(NA, -6L), groups = structure(list(
    hrvyear = c(1270, 1270, 1272, 1272, 1275, 1275), m = c(5, 
    12, 2, 4, 2, 3), .rows = structure(list(1L, 2L, 3L, 4L, 5L, 
        6L), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", 
    "list"))), row.names = c(NA, 6L), class = c("tbl_df", "tbl", 
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame")))

# rename the `mean(price)` column to price
setnames(dt, "mean(price)", "price")

# rate of change relative to the previous month's price, ((a - b) / b) * 100,
# computed within each town and year so rows from different groups are never compared
dt[, rate := (price - shift(price)) / shift(price) * 100, by = .(town, hrvyear)]

dt
```

```
   hrvyear  m        town price       rate
1:    1270  5 Chesterford    80         NA
2:    1270 12 Chesterford    64 -20.000000
3:    1272  2      Lopham    74         NA
4:    1272  4      Lopham    78   5.405405
5:    1275  2      Lopham    59         NA
6:    1275  3      Lopham    61   3.389831
```
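Neither the loops nor the data.table code cover the second step. Below is a minimal base-R sketch of the 66-pair table, under these assumptions: months are ordered September to August, a pair is matched within the same `town` and `hrvyear` (as in the question's May-to-July code), `average_ratio` is the mean of (end price - start price) / start price, and "Std. error" is taken to be the standard error of the mean, sd / sqrt(n). `summarise_pair`, `month_order`, `pairs` and `sample_prices` are hypothetical names.

```r
# The question's sample data (from the dput output) as a plain data frame
sample_prices <- data.frame(
  hrvyear = c(1270, 1270, 1272, 1272, 1275, 1275),
  m       = c(5, 12, 2, 4, 2, 3),
  town    = c("Chesterford", "Chesterford", "Lopham", "Lopham", "Lopham", "Lopham"),
  price   = c(80, 64, 74, 78, 59, 61)
)

# Months in the assumed order, September to August
month_order <- c(9:12, 1:8)

# All 66 (start, end) pairs with start before end in that order;
# combn() lists them as 9->10, 9->11, ..., 9->8, 10->11, ..., 7->8
pos <- combn(12, 2)
pairs <- data.frame(
  start_month = month_order[pos[1, ]],
  end_month   = month_order[pos[2, ]],
  x           = pos[2, ] - pos[1, ]   # distance in months
)

# Summary for one pair: ratios between the two months within the same town and hrvyear
summarise_pair <- function(df, start_month, end_month, x) {
  start <- df[df$m == start_month, c("town", "hrvyear", "price")]
  end   <- df[df$m == end_month,   c("town", "hrvyear", "price")]
  both  <- merge(start, end, by = c("town", "hrvyear"), suffixes = c("_start", "_end"))
  ratio <- (both$price_end - both$price_start) / both$price_start
  n <- length(ratio)
  data.frame(
    start_month   = start_month,
    end_month     = end_month,
    average_ratio = if (n > 0) mean(ratio) else NA_real_,
    pct_change    = if (n > 0) log(1 + mean(ratio)) / x * 100 else NA_real_,
    std_error     = if (n > 1) sd(ratio) / sqrt(n) else NA_real_,  # standard error of the mean
    cases         = n
  )
}

result <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  summarise_pair(sample_prices, pairs$start_month[i], pairs$end_month[i], pairs$x[i])
}))

result[result$cases > 0, ]
```

With the six sample rows only three pairs have any cases (12 to 5, 2 to 4 and 2 to 3), each with a single case, so `std_error` stays `NA`; on the full data set the same code fills in every pair that occurs.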