Get summaries of repeated consecutive values by row in R

Question

I am trying to get some statistics (min, max, mean) of repeated values by row in R.

My data frame looks similar to this:

b <- as.data.frame(matrix(ncol=7, nrow=3, 
     c(3,NA,NA,4,5,NA,7,6,NA,7,NA,8,9,NA,NA,4,6,NA,NA,7,NA), byrow = TRUE))

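For each row, I would like to add columns holding the minimum, maximum and mean of the number of columns that contain consecutive NAs. It should look something like this: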

  V1 V2 V3 V4 V5 V6 V7 max min mean
1  3 NA NA  4  5 NA  7   2   1  1.5
2  6 NA  7 NA  8  9 NA   1   1  1.0
3 NA  4  6 NA NA  7 NA   2   1  1.33

This is just a small example of my data set, which has 2000 rows and 48 columns.

Does anyone have code for this?

Answer 1

You can apply over the rows and get the "runs" of non-NA columns. Once you have those, you can simply take their summary statistics:

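b[, c("mean", "max", "min")] <- do.call(rbind, apply(b, 1, function(x) {
  # rle() on the logical vector gives alternating runs of non-NA (TRUE) and NA (FALSE)
  res <- rle(!is.na(x))
  # keep only the lengths of the TRUE runs, i.e. the runs of non-NA values
  res2 <- res[["lengths"]][res[["values"]]]
  data.frame(mean = mean(res2), max = max(res2), min = min(res2))
}))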

 b
#  V1 V2 V3 V4 V5 V6 V7     mean max min
#1  3 NA NA  4  5 NA  7 1.333333   2   1
#2  6 NA  7 NA  8  9 NA 1.333333   2   1
#3 NA  4  6 NA NA  7 NA 1.500000   2   1
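
Note that the output above summarises the runs of non-NA values, whereas the expected table in the question describes the runs of NA values. A minimal sketch of the NA variant follows, assuming the only change needed is dropping the negation inside rle(). The helper name na_run_stats and the fallback for rows without any NA are illustrative assumptions and not part of the original answer.

```r
# Example data from the question
b <- as.data.frame(matrix(
  c(3, NA, NA, 4, 5, NA, 7,
    6, NA, 7, NA, 8, 9, NA,
    NA, 4, 6, NA, NA, 7, NA),
  nrow = 3, ncol = 7, byrow = TRUE))

# Hypothetical helper: summarise the lengths of the NA runs in one row
na_run_stats <- function(x) {
  runs <- rle(is.na(x))                # TRUE runs are stretches of consecutive NAs
  len  <- runs$lengths[runs$values]    # lengths of the NA runs only
  if (length(len) == 0) {              # assumed fallback: row without any NA
    return(c(max = NA_real_, min = NA_real_, mean = NA_real_))
  }
  c(max = max(len), min = min(len), mean = mean(len))
}

# apply() returns one column per row of b here, so transpose before binding
stats <- t(apply(b, 1, na_run_stats))
res <- cbind(b, stats)
res
```

Returning a named numeric vector instead of a data.frame lets apply() build a plain matrix, which avoids the do.call(rbind, ...) step; on 2000 rows and 48 columns either form should be fast enough.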

Answer 2

A dplyr solution. rle computes the lengths of runs of equal values in a vector.

library(dplyr)
b %>%
  # rl is a list column holding the lengths of the NA runs of each row
  cbind(b %>% rowwise() %>% do(rl = rle(is.na(.))$lengths[rle(is.na(.))$values])) %>%
  rowwise() %>%
  mutate(max = max(rl),
         min = min(rl),
         mean = mean(rl)) %>%
  select(-rl)


#      V1    V2    V3    V4    V5    V6    V7   max   min  mean
#   <int> <int> <int> <int> <int> <int> <int> <int> <int> <dbl>
# 1     3    NA    NA     4     5    NA     7     2     1  1.50
# 2     6    NA     7    NA     8     9    NA     1     1  1.00
# 3    NA     4     6    NA    NA     7    NA     2     1  1.33
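
To make the rle step less opaque, here is a small sketch of what it returns for the first row of the example data. The variable names x, runs and na_len are chosen only for illustration.

```r
# First row of the question's example data
x <- c(3, NA, NA, 4, 5, NA, 7)

# Runs of equal values in the NA indicator
runs <- rle(is.na(x))
runs$values    # FALSE TRUE FALSE TRUE FALSE
runs$lengths   # 1 2 2 1 1

# Lengths of the NA runs only
na_len <- runs$lengths[runs$values]
na_len         # 2 1
```

Taking max, min and mean of na_len gives 2, 1 and 1.5, which matches the first row of the expected output in the question.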
