R - 如何制作 n 个先前值的 mean/average,不包括当前观察值(滚动平均值)
R - How to make a mean/average of n previous values, excluding current observation (rolling average)
有人可以建议如何最好地在数据框中创建新列,其中每个观察值都是前 12 个观察值(不包括当前观察值)的 average/mean。到目前为止,我未能在这里找到类似的答案,因此将不胜感激!
我的data.frame:
LateCounts <-
Date Count
1 Jan-19 7
2 Feb-19 4
3 Mar-19 9
4 Apr-19 8
5 May-19 7
6 Jun-19 4
7 Jul-19 4
8 Aug-19 5
9 Sep-19 2
10 Oct-19 5
11 Nov-19 7
12 Dec-19 4
13 Jan-20 3
14 Feb-20 4
15 Mar-20 5
16 Apr-20 2
17 May-20 3
18 Jun-20 2
19 Jul-20 3
20 Aug-20 4
21 Sep-20 3
22 Oct-20 2
我目前正在使用以下代码:
LateCounts <- LateCounts %>% mutate(RollAvge=rollapplyr(Count, 12, mean, partial = TRUE))
这会产生以下 12 个月滚动平均值:
Date Count RollAvge
1 Jan-19 7 7
2 Feb-19 4 5.5
3 Mar-19 9 6.666667
4 Apr-19 8 7
5 May-19 7 7
6 Jun-19 4 6.5
7 Jul-19 4 6.142857
8 Aug-19 5 6
9 Sep-19 2 5.555556
10 Oct-19 5 5.5
11 Nov-19 7 5.636364
12 Dec-19 4 5.5
13 Jan-20 3 5.166667
14 Feb-20 4 5.166667
15 Mar-20 5 4.833333
16 Apr-20 2 4.333333
17 May-20 3 4
18 Jun-20 2 3.833333
19 Jul-20 3 3.75
20 Aug-20 4 3.666667
21 Sep-20 3 3.75
22 Oct-20 2 3.5
我实际需要实现的是以下内容。这是 12 个月的追踪或滚动平均值(其中 'RollAvge' 列中的值是 'Count' 列中先前值的 averages/means - 不包括当前月份。
Date Count RollAvge
1 Jan-19 7
2 Feb-19 4 7
3 Mar-19 9 5.5
4 Apr-19 8 6.666667
5 May-19 7 7
6 Jun-19 4 7
7 Jul-19 4 6.5
8 Aug-19 5 6.142857
9 Sep-19 2 6
10 Oct-19 5 5.555556
11 Nov-19 7 5.5
12 Dec-19 4 5.636364
13 Jan-20 3 5.5
14 Feb-20 4 5.166667
15 Mar-20 5 5.166667
16 Apr-20 2 4.833333
17 May-20 3 4.333333
18 Jun-20 2 4
19 Jul-20 3 3.833333
20 Aug-20 4 3.75
21 Sep-20 3 3.666667
22 Oct-20 2 3.755556
谢谢,
我们需要对 rollapply
的输出进行 lag
。
library(dplyr)
library(zoo)
LateCounts %>%
mutate(RollAvge= lag(rollapplyr(Count, 12, mean, partial = TRUE)))
-输出
# Date Count RollAvge
#1 Jan-19 7 NA
#2 Feb-19 4 7.000000
#3 Mar-19 9 5.500000
#4 Apr-19 8 6.666667
#5 May-19 7 7.000000
#6 Jun-19 4 7.000000
#7 Jul-19 4 6.500000
#8 Aug-19 5 6.142857
#9 Sep-19 2 6.000000
#10 Oct-19 5 5.555556
#11 Nov-19 7 5.500000
#12 Dec-19 4 5.636364
#13 Jan-20 3 5.500000
#14 Feb-20 4 5.166667
#15 Mar-20 5 5.166667
#16 Apr-20 2 4.833333
#17 May-20 3 4.333333
#18 Jun-20 2 4.000000
#19 Jul-20 3 3.833333
#20 Aug-20 4 3.750000
#21 Sep-20 3 3.666667
#22 Oct-20 2 3.750000
数据
LateCounts <- structure(list(Date = c("Jan-19", "Feb-19", "Mar-19", "Apr-19",
"May-19", "Jun-19", "Jul-19", "Aug-19", "Sep-19", "Oct-19", "Nov-19",
"Dec-19", "Jan-20", "Feb-20", "Mar-20", "Apr-20", "May-20", "Jun-20",
"Jul-20", "Aug-20", "Sep-20", "Oct-20"), Count = c(7L, 4L, 9L,
8L, 7L, 4L, 4L, 5L, 2L, 5L, 7L, 4L, 3L, 4L, 5L, 2L, 3L, 2L, 3L,
4L, 3L, 2L)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22"))
使用 dplyr 和 zoo 有一种方法可以使用 data.frame 函数@NW320d 使用相同的滚动平均函数但没有 mutate 和 pipes
图书馆(dplyr)
图书馆(动物园)
使用@akrun 的 LateCounts 代码(感谢您提供的代码片段!)
> LateCounts <- structure(list(Date = c("Jan-19", "Feb-19", "Mar-19", "Apr-19",
+ "May-19", "Jun-19", "Jul-19", "Aug-19", "Sep-19", "Oct-19", "Nov-19",
+ "Dec-19", "Jan-20", "Feb-20", "Mar-20", "Apr-20", "May-20", "Jun-20",
+ "Jul-20", "Aug-20", "Sep-20", "Oct-20"), Count = c(7L, 4L, 9L,
+ 8L, 7L, 4L, 4L, 5L, 2L, 5L, 7L, 4L, 3L, 4L, 5L, 2L, 3L, 2L, 3L,
+ 4L, 3L, 2L)), class = "data.frame", row.names = c("1", "2", "3",
+ "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
+ "16", "17", "18", "19", "20", "21", "22"))
> data.frame(LateCounts$Count, rollavg=dplyr::lag(rollapplyr(LateCounts$Count, 12, mean, partial = TRUE)))
Output:
LateCounts.Count rollavg
1 7 NA
2 4 7.000000
3 9 5.500000
4 8 6.666667
5 7 7.000000
6 4 7.000000
7 4 6.500000
8 5 6.142857
9 2 6.000000
10 5 5.555556
11 7 5.500000
12 4 5.636364
13 3 5.500000
14 4 5.166667
15 5 5.166667
16 2 4.833333
17 3 4.333333
18 2 4.000000
19 3 3.833333
20 4 3.750000
21 3 3.666667
22 2 3.750000
有人可以建议如何最好地在数据框中创建新列,其中每个观察值都是前 12 个观察值(不包括当前观察值)的 average/mean。到目前为止,我未能在这里找到类似的答案,因此将不胜感激!
我的data.frame:
LateCounts <-
Date Count
1 Jan-19 7
2 Feb-19 4
3 Mar-19 9
4 Apr-19 8
5 May-19 7
6 Jun-19 4
7 Jul-19 4
8 Aug-19 5
9 Sep-19 2
10 Oct-19 5
11 Nov-19 7
12 Dec-19 4
13 Jan-20 3
14 Feb-20 4
15 Mar-20 5
16 Apr-20 2
17 May-20 3
18 Jun-20 2
19 Jul-20 3
20 Aug-20 4
21 Sep-20 3
22 Oct-20 2
我目前正在使用以下代码:
LateCounts <- LateCounts %>% mutate(RollAvge=rollapplyr(Count, 12, mean, partial = TRUE))
这会产生以下 12 个月滚动平均值:
Date Count RollAvge
1 Jan-19 7 7
2 Feb-19 4 5.5
3 Mar-19 9 6.666667
4 Apr-19 8 7
5 May-19 7 7
6 Jun-19 4 6.5
7 Jul-19 4 6.142857
8 Aug-19 5 6
9 Sep-19 2 5.555556
10 Oct-19 5 5.5
11 Nov-19 7 5.636364
12 Dec-19 4 5.5
13 Jan-20 3 5.166667
14 Feb-20 4 5.166667
15 Mar-20 5 4.833333
16 Apr-20 2 4.333333
17 May-20 3 4
18 Jun-20 2 3.833333
19 Jul-20 3 3.75
20 Aug-20 4 3.666667
21 Sep-20 3 3.75
22 Oct-20 2 3.5
我实际需要实现的是以下内容。这是 12 个月的追踪或滚动平均值(其中 'RollAvge' 列中的值是 'Count' 列中先前值的 averages/means - 不包括当前月份。
Date Count RollAvge
1 Jan-19 7
2 Feb-19 4 7
3 Mar-19 9 5.5
4 Apr-19 8 6.666667
5 May-19 7 7
6 Jun-19 4 7
7 Jul-19 4 6.5
8 Aug-19 5 6.142857
9 Sep-19 2 6
10 Oct-19 5 5.555556
11 Nov-19 7 5.5
12 Dec-19 4 5.636364
13 Jan-20 3 5.5
14 Feb-20 4 5.166667
15 Mar-20 5 5.166667
16 Apr-20 2 4.833333
17 May-20 3 4.333333
18 Jun-20 2 4
19 Jul-20 3 3.833333
20 Aug-20 4 3.75
21 Sep-20 3 3.666667
22 Oct-20 2 3.755556
谢谢,
我们需要对 rollapply
的输出进行 lag
。
library(dplyr)
library(zoo)
LateCounts %>%
mutate(RollAvge= lag(rollapplyr(Count, 12, mean, partial = TRUE)))
-输出
# Date Count RollAvge
#1 Jan-19 7 NA
#2 Feb-19 4 7.000000
#3 Mar-19 9 5.500000
#4 Apr-19 8 6.666667
#5 May-19 7 7.000000
#6 Jun-19 4 7.000000
#7 Jul-19 4 6.500000
#8 Aug-19 5 6.142857
#9 Sep-19 2 6.000000
#10 Oct-19 5 5.555556
#11 Nov-19 7 5.500000
#12 Dec-19 4 5.636364
#13 Jan-20 3 5.500000
#14 Feb-20 4 5.166667
#15 Mar-20 5 5.166667
#16 Apr-20 2 4.833333
#17 May-20 3 4.333333
#18 Jun-20 2 4.000000
#19 Jul-20 3 3.833333
#20 Aug-20 4 3.750000
#21 Sep-20 3 3.666667
#22 Oct-20 2 3.750000
数据
LateCounts <- structure(list(Date = c("Jan-19", "Feb-19", "Mar-19", "Apr-19",
"May-19", "Jun-19", "Jul-19", "Aug-19", "Sep-19", "Oct-19", "Nov-19",
"Dec-19", "Jan-20", "Feb-20", "Mar-20", "Apr-20", "May-20", "Jun-20",
"Jul-20", "Aug-20", "Sep-20", "Oct-20"), Count = c(7L, 4L, 9L,
8L, 7L, 4L, 4L, 5L, 2L, 5L, 7L, 4L, 3L, 4L, 5L, 2L, 3L, 2L, 3L,
4L, 3L, 2L)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22"))
使用 dplyr 和 zoo 有一种方法可以使用 data.frame 函数@NW320d 使用相同的滚动平均函数但没有 mutate 和 pipes
图书馆(dplyr)
图书馆(动物园)
使用@akrun 的 LateCounts 代码(感谢您提供的代码片段!)
> LateCounts <- structure(list(Date = c("Jan-19", "Feb-19", "Mar-19", "Apr-19",
+ "May-19", "Jun-19", "Jul-19", "Aug-19", "Sep-19", "Oct-19", "Nov-19",
+ "Dec-19", "Jan-20", "Feb-20", "Mar-20", "Apr-20", "May-20", "Jun-20",
+ "Jul-20", "Aug-20", "Sep-20", "Oct-20"), Count = c(7L, 4L, 9L,
+ 8L, 7L, 4L, 4L, 5L, 2L, 5L, 7L, 4L, 3L, 4L, 5L, 2L, 3L, 2L, 3L,
+ 4L, 3L, 2L)), class = "data.frame", row.names = c("1", "2", "3",
+ "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
+ "16", "17", "18", "19", "20", "21", "22"))
> data.frame(LateCounts$Count, rollavg=dplyr::lag(rollapplyr(LateCounts$Count, 12, mean, partial = TRUE)))
Output:
LateCounts.Count rollavg
1 7 NA
2 4 7.000000
3 9 5.500000
4 8 6.666667
5 7 7.000000
6 4 7.000000
7 4 6.500000
8 5 6.142857
9 2 6.000000
10 5 5.555556
11 7 5.500000
12 4 5.636364
13 3 5.500000
14 4 5.166667
15 5 5.166667
16 2 4.833333
17 3 4.333333
18 2 4.000000
19 3 3.833333
20 4 3.750000
21 3 3.666667
22 2 3.750000