Q: Rate of change in R under special conditions; for loop / apply / lag
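I am just getting started with R, and I am completely new to time-series concepts. Could anyone point me in the right direction for calculating the monthly percentage change?
- I have data with different years, months, towns, and prices, together with the rate of change, like this: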
i | hrvyear | m | town | price | rate of change
1 | 1270 | 5 | Chesterford | 80 | NA
2 | 1270 | 6 | Chesterford | 64 | -20 %
3 | 1270 | 7 | Lopham | 74 | NA
4 | 1274 | 12 | Lopham | 74 | NA
5 | 1275 | 1 | Lopham | 78 | 5,4054 %
6 | 1275 | 2 | Lopham | 59 | -24,3589 %
7 | 1275 | 3 | Lopham | 61 | 3,3898 %
8 | 1275 | 5 | Lopham | 68 | NA
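- In a second step, I want to take the average ratio of all possible month pairs from the table above, starting in September and ending in August (i.e. 9_to_10, 9_to_11, ..., 9_to_8, 10_to_11, ..., 10_to_8, ..., 7_to_8):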
i | start_month | end_month | average_ratio | %change | Std. error | # cases
1 | 9 | 10 | 1,055 | 2,7 | 0.034 | 22
2 | 9 | 11 | ... | ... | ... | ..
3 | 9 | 12 | ... | ... | ... | ..
4 | 9 | 1 | ... | ... | ... | ..
5 | 9 | 2 | ... | ... | ... | ..
6 | 9 | 3 | ... | ... | ... | ..
7 | 9 | 4 | ... | ... | ... | ..
8 | 9 | 5 | ... | ... | ... | ..
9 | 9 | 6 | ... | ... | ... | ..
10 | 9 | 7 | ... | ... | ... | ..
11 | 9 | 8 | ... | ... | ... | ..
.. | ... | .. | ... | ... | ... | ..
.. | 12 | 1 | ... | ... | ... | ..
.. | 12 | 2 | ... | ... | ... | ..
.. | ... | .. | ... | ... | ... | ..
.. | 12 | 8 | ... | ... | ... | ..
.. | ... | .. | ... | ... | ... | ..
66 | 7 | 8 | ... | ... | ... | ..
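For reference, the 66 rows of the table above correspond to every pair of months within a year that runs from September to August, with the earlier month first. A minimal base R sketch of that enumeration follows. The names `months`, `idx`, and `pairs` are illustrative only, and `distance` is an assumed reading of "distance between start_month and end_month" as the number of steps in that September-to-August order.

```r
# Months in harvest-year order: September first, August last
months <- c(9:12, 1:8)

# Every combination of two positions, earlier position first (12 choose 2 = 66)
idx <- t(combn(length(months), 2))

pairs <- data.frame(
  start_month = months[idx[, 1]],
  end_month   = months[idx[, 2]],
  distance    = idx[, 2] - idx[, 1]  # assumed meaning of x in the %change formula
)

nrow(pairs)     # 66
head(pairs, 3)  # 9 -> 10, 9 -> 11, 9 -> 12
```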
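Calculations:
Rate of change: ((a-b)/b)*100, where a is the new month and b is the previous month
Average_ratio: the mean over all years and towns for the respective pair of months
%change: (log(1+mean(average_ratio))/x)*100, where x is the distance between start_month and end_month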
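The two formulas can be written as small R helpers, which makes it easy to check them against the first table. This is only a sketch: the function names `rate_of_change` and `pct_change_per_month` are made up for illustration, and the second one assumes the ratios are passed as fractions (for example 0.05 for +5 %), which is how the code further down computes them.

```r
# Percentage change from the previous value b to the new value a
rate_of_change <- function(a, b) (a - b) / b * 100

# %change per month from ratios measured x months apart,
# following (log(1 + mean(ratio)) / x) * 100 from the question
pct_change_per_month <- function(ratios, x) log(1 + mean(ratios)) / x * 100

rate_of_change(64, 80)  # -20, as in row 2 of the first table
rate_of_change(78, 74)  # about 5.4054, as in row 5
```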
structure(list(hrvyear = c(1270, 1270, 1272, 1272, 1275, 1275
), m = c(5, 12, 2, 4, 2, 3), town = c("Chesterford", "Chesterford",
"Lopham", "Lopham", "Lopham", "Lopham"), `mean(price)` = c(80,
64, 74, 78, 59, 61)), row.names = c(NA, -6L), groups = structure(list(
hrvyear = c(1270, 1270, 1272, 1272, 1275, 1275), m = c(5,
12, 2, 4, 2, 3), .rows = structure(list(1L, 2L, 3L, 4L, 5L,
6L), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr",
"list"))), row.names = c(NA, 6L), class = c("tbl_df", "tbl",
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"))
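I hope the question is clear. I would appreciate any advice.
So far I have used this code for the first step. Obviously, though, I would rather not repeat it for every month group: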
library(dplyr)

may_july <- complete_mc %>%
  filter(m %in% c(5, 7)) %>%
  arrange(town, hrvyear, m)

# roc[i] is TRUE when row i and the previous row are from the same year and
# the same town (e.g. we start with the May-to-July comparison).
# The first row has no previous row, so it starts as FALSE.
roc <- FALSE
for (i in seq_len(nrow(may_july))[-1]) {
  same_group <- may_july$hrvyear[i] == may_july$hrvyear[i - 1] &&
    may_july$town[i] == may_july$town[i - 1]
  roc <- c(roc, same_group)
}
tm <- cbind(may_july, roc)

# if the previous month is from the same year and the same town, calculate
# the ratio relative to that previous month; if not, add NA
roc2 <- c()
for (i in seq_len(nrow(may_july))) {
  if (roc[i]) {
    roc2 <- c(roc2, (may_july$mean_price[i] - may_july$mean_price[i - 1]) / may_july$mean_price[i - 1])
  } else {
    roc2 <- c(roc2, NA)
  }
}

# combine the data with the final ratios
tt <- cbind(may_july, roc2)
roc3 <- na.omit(roc2)

# calculate the rate of change with the average ratio
may_to_july <- (log(1 + mean(roc3)) / 2) * 100
mean(roc3)
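The function you wrote for this almost works, but don't forget to put am$`mean(price)`[i] - am$`mean(price)`[i-1] in parentheses, so that the division is not carried out before the subtraction.
A simpler answer is to use the shift() function from data.table, which is similar to the lead() and lag() functions in dplyr. They select the previous or the following rows, depending on the arguments you pass.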
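As a quick illustration of that behaviour, here is a minimal sketch on a plain vector. The three prices are taken from the sample data, and `type` and `n` are the standard shift() arguments for choosing the direction and the number of rows.

```r
library(data.table)

price <- c(80, 64, 74)

shift(price)                 # previous value: NA 80 64
shift(price, type = "lead")  # next value:     64 74 NA
shift(price, n = 2)          # two rows back:  NA NA 80
```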
library(data.table)
dt <- as.data.table(structure(list(hrvyear = c(1270, 1270, 1272, 1272, 1275, 1275
), m = c(5, 12, 2, 4, 2, 3), town = c("Chesterford", "Chesterford",
"Lopham", "Lopham", "Lopham", "Lopham"), `mean(price)` = c(80,
64, 74, 78, 59, 61)), row.names = c(NA, -6L), groups = structure(list(
hrvyear = c(1270, 1270, 1272, 1272, 1275, 1275), m = c(5,
12, 2, 4, 2, 3), .rows = structure(list(1L, 2L, 3L, 4L, 5L,
6L), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr",
"list"))), row.names = c(NA, 6L), class = c("tbl_df", "tbl",
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df",
"tbl", "data.frame")))
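# rename the `mean(price)` column to something easier to type
setnames(dt, "mean(price)", "price")
# percentage change relative to the previous row's price, i.e. ((a-b)/b)*100;
# by = town stops the first row of a town from using another town's price
dt[, rate := (price - shift(price)) / shift(price) * 100, by = town]
dt
   hrvyear  m        town price       rate
1:    1270  5 Chesterford    80         NA
2:    1270 12 Chesterford    64 -20.000000
3:    1272  2      Lopham    74         NA
4:    1272  4      Lopham    78   5.405405
5:    1275  2      Lopham    59 -24.358974
6:    1275  3      Lopham    61   3.389831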
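For the second step, one possible approach is to join the table with itself so that every row is paired with every later month of the same town and year, and then summarise per month pair. The following is only a sketch under several assumptions that are not confirmed by the question: months are ordered from September to August, pairs are only formed within the same `town` and `hrvyear` (as in the code in the question), and x is the difference between the two positions in that order. The data are the rows of the first table in the question; `pos`, `pairs`, and `result` are illustrative names.

```r
library(data.table)

# Rows copied from the first table in the question
dt <- data.table(
  hrvyear = c(1270, 1270, 1270, 1274, 1275, 1275, 1275, 1275),
  m       = c(5, 6, 7, 12, 1, 2, 3, 5),
  town    = c("Chesterford", "Chesterford", "Lopham", "Lopham",
              "Lopham", "Lopham", "Lopham", "Lopham"),
  price   = c(80, 64, 74, 74, 78, 59, 61, 68)
)

# Position of each month in a September-to-August year: Sep = 1, ..., Aug = 12
dt[, pos := (m - 9) %% 12 + 1]

# Pair every row with every later month of the same town and hrvyear
pairs <- merge(dt, dt, by = c("town", "hrvyear"),
               suffixes = c("_start", "_end"), allow.cartesian = TRUE)
pairs <- pairs[pos_end > pos_start]
pairs[, ratio := (price_end - price_start) / price_start]

# One summary row per (start_month, end_month) pair
result <- pairs[, .(
  average_ratio = mean(ratio),
  pct_change    = log(1 + mean(ratio)) / (pos_end[1] - pos_start[1]) * 100,
  std_error     = sd(ratio) / sqrt(.N),
  n_cases       = .N
), by = .(start_month = m_start, end_month = m_end)]

result
```

With only these eight rows, every month pair occurs once, so `std_error` is NA throughout; on the full data each pair would be averaged over all towns and years in which both months are present.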