基于索引的非重叠滑动window

Non-overlapping sliding window based on index

对于一个data.frame,df,它有一个索引列和一个值列,我想计算e.g.the非重叠滑动windows中的值的平均值, window 大小基于索引列中的单位(例如,windows 涵盖索引中的 10 个单位)。

runner::runnerslider::slide_index 允许您根据索引列在 windows 中滑动,但我没有找到制作 [=20= 的方法] 不重叠。

df = structure(list(V3 = c(17054720L, 17075353L, 17087656L, 17099107L, 
17152611L, 17154984L, 17178213L, 17256231L, 17264565L, 17280822L, 
17281931L, 17285949L, 17289118L, 17294251L, 17301217L, 17301843L, 
17304246L, 17304887L, 17306104L, 17310741L, 17312596L, 17315102L, 
17315503L, 17317233L, 17318150L, 17319156L, 17326181L, 17326432L, 
17394989L, 17395610L, 17396612L, 17397875L, 17398508L, 17398800L, 
17398812L, 17399211L, 17405173L, 17407349L, 17407566L, 17409897L, 
17410373L, 17412216L, 17412806L, 17414103L, 17414640L, 17415572L, 
17426401L, 17427037L, 17429384L, 17429434L, 17433210L, 17434084L, 
17436846L, 17441524L, 17442154L, 17443131L, 17445502L, 17446157L, 
17446914L, 17450515L, 17452966L, 17462185L, 17467411L, 17467684L, 
17470779L, 17475921L, 17488195L, 17489577L, 17489890L, 17490932L, 
17492203L, 17492452L, 17493792L, 17494101L, 17494547L, 17524203L, 
17525584L, 17525970L, 17529814L, 17541673L, 17545859L, 17557144L, 
17567699L, 17575800L, 17580394L, 17580813L, 17585441L, 17586471L, 
17587680L, 17587975L, 17589209L, 17589246L, 17593685L, 17594915L, 
17597462L, 17599844L, 17603801L, 17605824L, 17611515L, 17615213L
), V1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 1L)), row.names = c(NA, -100L), class = "data.frame")

像这样的事情怎么样:

df <- data.frame(
  index = 0:99, 
  value = 1:100)

df %>% 
  mutate(window = floor(index/10)) %>% 
  group_by(window) %>% 
  summarise(value = mean(value), 
            n = n())
# # A tibble: 10 × 3
#   window value     n
#    <dbl> <dbl> <int>
# 1      0   5.5    10
# 2      1  15.5    10
# 3      2  25.5    10
# 4      3  35.5    10
# 5      4  45.5    10
# 6      5  55.5    10
# 7      6  65.5    10
# 8      7  75.5    10
# 9      8  85.5    10
# 10      9  95.5    10

在上面的答案中,您将索引除以 window 宽度并将其包装在 floor() 函数中,以便将所有观察结果舍入到最接近的整数。这假定索引值是连续的整数。另一种方法是,如果索引不是顺序的,则如下所示:

df <- data.frame(
  index = sample(0:1000, 100, replace=FALSE), 
  value = 1:100)

df %>% 
  arrange(index) %>% 
  mutate(obs = seq_along(index)-1, 
         window = floor(obs/10)) %>% 
  group_by(window) %>% 
  summarise(value = mean(value), 
            n = n())
# A tibble: 10 × 3
#   window value     n
#    <dbl> <dbl> <int>
# 1      0  38.2    10
# 2      1  50.1    10
# 3      2  63.6    10
# 4      3  64.9    10
# 5      4  44      10
# 6      5  41.5    10
# 7      6  65.4    10
# 8      7  45.1    10
# 9      8  48.9    10
# 10      9  43.3    10