Non-overlapping sliding window based on index
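For a data.frame, df, with an index column and a value column, I would like to compute, e.g., the mean of the values in non-overlapping sliding windows, where the window size is measured in units of the index column (e.g., each window covers 10 units of the index).
runner::runner and slider::slide_index let you slide windows along an index column, but I haven't found a way to make the windows non-overlapping.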
df = structure(list(V3 = c(17054720L, 17075353L, 17087656L, 17099107L,
17152611L, 17154984L, 17178213L, 17256231L, 17264565L, 17280822L,
17281931L, 17285949L, 17289118L, 17294251L, 17301217L, 17301843L,
17304246L, 17304887L, 17306104L, 17310741L, 17312596L, 17315102L,
17315503L, 17317233L, 17318150L, 17319156L, 17326181L, 17326432L,
17394989L, 17395610L, 17396612L, 17397875L, 17398508L, 17398800L,
17398812L, 17399211L, 17405173L, 17407349L, 17407566L, 17409897L,
17410373L, 17412216L, 17412806L, 17414103L, 17414640L, 17415572L,
17426401L, 17427037L, 17429384L, 17429434L, 17433210L, 17434084L,
17436846L, 17441524L, 17442154L, 17443131L, 17445502L, 17446157L,
17446914L, 17450515L, 17452966L, 17462185L, 17467411L, 17467684L,
17470779L, 17475921L, 17488195L, 17489577L, 17489890L, 17490932L,
17492203L, 17492452L, 17493792L, 17494101L, 17494547L, 17524203L,
17525584L, 17525970L, 17529814L, 17541673L, 17545859L, 17557144L,
17567699L, 17575800L, 17580394L, 17580813L, 17585441L, 17586471L,
17587680L, 17587975L, 17589209L, 17589246L, 17593685L, 17594915L,
17597462L, 17599844L, 17603801L, 17605824L, 17611515L, 17615213L
), V1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L)), row.names = c(NA, -100L), class = "data.frame")
How about something like this:
library(dplyr)

df <- data.frame(
  index = 0:99,
  value = 1:100)

df %>%
  mutate(window = floor(index / 10)) %>%   # window number: 0 for 0-9, 1 for 10-19, ...
  group_by(window) %>%
  summarise(value = mean(value),
            n = n())
# # A tibble: 10 × 3
# window value n
# <dbl> <dbl> <int>
# 1 0 5.5 10
# 2 1 15.5 10
# 3 2 25.5 10
# 4 3 35.5 10
# 5 4 45.5 10
# 6 5 55.5 10
# 7 6 65.5 10
# 8 7 75.5 10
# 9 8 85.5 10
# 10 9 95.5 10
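With an index that has gaps, the same integer-division idea still bins by index units; windows then just hold different numbers of rows, and windows with no rows at all do not appear. A minimal base-R sketch of that, where `window_means` is a hypothetical helper and anchoring the first window at the smallest index value (rather than at 0, as above) is my own assumption:

```r
# Mean of `value` in non-overlapping windows that each span `width` units of `index`.
# Hypothetical helper: windows are [start, start + width), anchored at min(index).
window_means <- function(index, value, width) {
  start <- min(index)
  bin <- (index - start) %/% width   # window number 0, 1, 2, ... for each row
  data.frame(
    window_start = start + sort(unique(bin)) * width,
    mean = as.numeric(tapply(value, bin, mean)),
    n = as.integer(tapply(value, bin, length))
  )
}

# Toy data with gaps: window [21, 31) is empty, so it is simply absent
res <- window_means(index = c(1, 3, 12, 15, 19, 31),
                    value = c(2, 4, 6, 8, 10, 12),
                    width = 10)
print(res)
```

With the question's data this would be called as e.g. `window_means(df$V3, df$V1, width = 10)`; the width there is only illustrative.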
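In the answer above, the index is divided by the window width and wrapped in floor(), so every observation is rounded down to an integer window number. With consecutive integer indices, this gives exactly 10 rows per window. If the index is not sequential and you want windows of 10 observations each (rather than 10 index units), an alternative is to sort by index and number the rows: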
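library(dplyr)

# sample() is random, so the means below will differ from run to run
df <- data.frame(
  index = sample(0:1000, 100, replace = FALSE),
  value = 1:100)

df %>%
  arrange(index) %>%
  mutate(obs = seq_along(index) - 1,     # 0-based row position after sorting
         window = floor(obs / 10)) %>%   # 10 consecutive rows per window
  group_by(window) %>%
  summarise(value = mean(value),
            n = n())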
# # A tibble: 10 × 3
# window value n
# <dbl> <dbl> <int>
# 1 0 38.2 10
# 2 1 50.1 10
# 3 2 63.6 10
# 4 3 64.9 10
# 5 4 44 10
# 6 5 41.5 10
# 7 6 65.4 10
# 8 7 45.1 10
# 9 8 48.9 10
# 10 9 43.3 10
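Since the question mentions slider: as far as I know, slider's `hop_index()` takes explicit window start and stop values on the index, which gives non-overlapping index windows directly. A sketch assuming slider is installed, the index is sorted ascending (slider requires this), and the index is integer-valued, so that `[start, start + width - 1]` covers exactly `width` units; the toy data and `width` are illustrative:

```r
library(slider)

index <- c(1, 3, 12, 15, 19, 31)   # must be in ascending order
value <- c(2, 4, 6, 8, 10, 12)
width <- 10

# Window boundaries (inclusive): [1, 10], [11, 20], [21, 30], [31, 40]
starts <- seq(min(index), max(index), by = width)
stops <- starts + width - 1

# hop_index() calls .f once per (start, stop) pair and returns a list
means <- hop_index(.x = value, .i = index, .starts = starts, .stops = stops, .f = mean)
res <- data.frame(window_start = starts, mean = unlist(means))
print(res)
```

Unlike the grouping approaches, every window is kept here; an empty window is passed to `mean()` as a zero-length vector, so it should come back as `NaN`.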