Count the average number of increasing consecutive integers in R with rle

Question

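My data frame has two columns: the left column is the id, and the right column contains increasing integers, some of which are consecutive and some of which are not. There are no duplicate integers. My goal is to get the average length of the runs of consecutive integers for each id. For example:

Here is a snippet of my dataset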

> data
      id moment
4448   1  11725
4540   1  11726
5457   1  11739
5519   1  11740
11733  1  11861
11797  1  11862
12020  1  11865
12313  1  11869
14576  1  11914
23314  1  12088
166    2  11644
278    2  11646
339    2  11647
407    2  11648
476    2  11649
545    2  11650
673    2  11652
737    2  11653
982    2  11657
1035   2  11658

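In the example above, id 1 has runs of consecutive integers in moment with the following lengths: 2, 2, 2, 1, 1, 1, 1, so the average is 1.428.

id 2 has runs of consecutive integers in moment with the following lengths: 1, 5, 2, 2, so the average is 2.5.

The real dataset has about 200 rows and 300 unique IDs, and I want the average for each ID.

I know you have to use the rle() function somehow, and I can find the maximum count with the following code: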

aggregate( data$moment, data['id'], FUN= function(d) max( rle( diff(d) )$lengths ) )

How do I get the average?

> dput(data)
structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), moment = c(11725L, 11726L, 
11739L, 11740L, 11861L, 11862L, 11865L, 11869L, 11914L, 12088L, 
11644L, 11646L, 11647L, 11648L, 11649L, 11650L, 11652L, 11653L, 
11657L, 11658L)), .Names = c("id", "moment"), row.names = c(4448L, 
4540L, 5457L, 5519L, 11733L, 11797L, 12020L, 12313L, 14576L, 
23314L, 166L, 278L, 339L, 407L, 476L, 545L, 673L, 737L, 982L, 
1035L), class = "data.frame")

Answer 1

There may be a better way, but...

aggregate(data$moment,list(data$id), function(x) mean(rle(diffinv(diff(x)!=1))$lengths))
#   Group.1        x
# 1       1 1.428571
# 2       2 2.500000

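Explanation

First we take the differences. Then we flag the ones that are not consecutive (diff(x)!=1). Then we apply the inverse of differencing (diffinv), which is a cumulative sum starting at 0, so we get back to the original length. We now have a vector that increments at every non-consecutive number, so each run of consecutive integers shares one value. Take the rle of that, then the lengths, and finally apply mean, and we are done.

Edit 1: removed an unnecessary step.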
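To make the steps above concrete, here is a small sketch that walks through them for id 2, using the moment values from the question. The intermediate variable names (gaps, breaks, run_id, run_len) are only illustrative and are not part of the original answer. The as.numeric() call is an extra precaution to make the logical-to-numeric conversion explicit.

```r
# moment values for id 2, copied from the question's data
x <- c(11644L, 11646L, 11647L, 11648L, 11649L,
       11650L, 11652L, 11653L, 11657L, 11658L)

gaps    <- diff(x)                      # 2 1 1 1 1 2 1 4 1
breaks  <- gaps != 1                    # TRUE wherever a new run starts
run_id  <- diffinv(as.numeric(breaks))  # 0 1 1 1 1 1 2 2 3 3
run_len <- rle(run_id)$lengths          # 1 5 2 2
mean(run_len)                           # 2.5
```

The run lengths 1, 5, 2, 2 match the ones listed in the question for id 2, and their mean is 2.5.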

r

run-length-encoding