Rollapply percentage from logical conditions (Rolling rate in R)

Question

I have a data frame in R with two columns of logical conditions, like this:

check1 = as.logical(c(rep("TRUE",3),rep("FALSE",2),rep("TRUE",3),rep("FALSE",2)))
check2 = as.logical(c(rep("TRUE",5),rep("FALSE",2),rep("TRUE",3)))
dat = cbind(check1,check2)

This results in:

    check1 check2
 [1,]   TRUE   TRUE
 [2,]   TRUE   TRUE
 [3,]   TRUE   TRUE
 [4,]  FALSE   TRUE
 [5,]  FALSE   TRUE
 [6,]   TRUE  FALSE
 [7,]   TRUE  FALSE
 [8,]   TRUE   TRUE
 [9,]  FALSE   TRUE
[10,]  FALSE   TRUE

I would like to compute the rolling percentage of TRUE values in each column. Ideally, the result should look like this:

check1	check2
1/1	1/1
2/2	2/2
3/3	3/3
3/4	4/4
3/5	5/5
4/6	5/6
5/7	5/7
6/8	6/8
6/9	7/9
6/10	8/10

Maybe something like...

dat%>%
  mutate(cumsum(check1)/seq_along(check1))

Any help?

Answer 1

You are almost there; just use across to apply your function to both columns.

Alternatively, you can use dplyr::cummean to compute the running proportion.
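Note that cbind() on two logical vectors produces a matrix, not a data frame, so mutate() cannot be applied to dat directly. The following is a minimal base-R sketch, assuming the same check1 and check2 values as in the question, that converts the matrix first and then applies the question's own formula to every column. The helper name running_prop is hypothetical and used only for illustration.

```r
# Same values as in the question, built with rep(..., times = ...)
check1 <- rep(c(TRUE, FALSE, TRUE, FALSE), times = c(3, 2, 3, 2))
check2 <- rep(c(TRUE, FALSE, TRUE), times = c(5, 2, 3))
dat <- cbind(check1, check2)

is.matrix(dat)  # cbind() returns a logical matrix here, not a data frame

# Convert to a data frame before doing column-wise work
df <- as.data.frame(dat)

# running_prop is a hypothetical helper: cumulative count of TRUE
# divided by the number of rows seen so far
running_prop <- function(x) cumsum(x) / seq_along(x)

# df[] <- ... replaces the columns but keeps the data frame structure
df[] <- lapply(df, running_prop)
df
```

The tidyverse code below does the same thing after as_tibble(dat).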

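A note on terminology: rolling usually refers to computing a statistic (such as a mean or a maximum) within a fixed-size window. Cumulative statistics, on the other hand, are computed over an ever-increasing window that starts at index 1 (or the first row). See the vignette on window functions. Using the right term may help you search the documentation for the appropriate function.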
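To make the distinction concrete, here is a small base-R sketch that computes both kinds of statistic on the check1 values. The window width k = 3 is an arbitrary choice for illustration, and the names cumulative and rolling are hypothetical.

```r
x <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)

# Cumulative: the window always starts at index 1 and grows by one each step
cumulative <- cumsum(x) / seq_along(x)

# Rolling: the window has a fixed width k and slides along the vector;
# positions with fewer than k preceding values are left as NA
k <- 3
rolling <- vapply(
  seq_along(x),
  function(i) if (i < k) NA_real_ else mean(x[(i - k + 1):i]),
  numeric(1)
)

cbind(cumulative, rolling)
```

The question asks for the cumulative version, which is why cumsum or cummean fits better than a fixed-width rolling function.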

library("tidyverse")

check1 <- as.logical(c(rep("TRUE", 3), rep("FALSE", 2), rep("TRUE", 3), rep("FALSE", 2)))
check2 <- as.logical(c(rep("TRUE", 5), rep("FALSE", 2), rep("TRUE", 3)))
dat <- cbind(check1, check2)

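# Option 1: apply the question's own formula to both columns
cummeans <- as_tibble(dat) %>%
  mutate(
    across(c(check1, check2), ~ cumsum(.) / row_number())
  )

# Option 2: the same result with dplyr::cummean (overwrites Option 1)
cummeans <- as_tibble(dat) %>%
  mutate(
    across(c(check1, check2), cummean)
  )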
cummeans
#> # A tibble: 10 × 2
#>    check1 check2
#>     <dbl>  <dbl>
#>  1  1      1    
#>  2  1      1    
#>  3  1      1    
#>  4  0.75   1    
#>  5  0.6    1    
#>  6  0.667  0.833
#>  7  0.714  0.714
#>  8  0.75   0.75 
#>  9  0.667  0.778
#> 10  0.6    0.8

# Plot the cumulative proportions on the y-axis, with one panel for each check
cummeans %>%
  # The example data has no index column; will use the row ids instead
  rowid_to_column() %>%
  pivot_longer(
    c(check1, check2),
    names_to = "check",
    values_to = "cummean"
  ) %>%
  ggplot(
    aes(rowid, cummean, color = check)
  ) +
  geom_line() +
  # Proportions have a natural range from 0 to 1
  scale_y_continuous(
    limits = c(0, 1)
  )

Created on 2022-03-14 by the reprex package (v2.0.1)

Answer 2

1) This gives the result as fractions (proportions between 0 and 1).

library(zoo)

rollapplyr(dat, 1:nrow(dat), mean)
##          check1    check2
##  [1,] 1.0000000 1.0000000
##  [2,] 1.0000000 1.0000000
##  [3,] 1.0000000 1.0000000
##  [4,] 0.7500000 1.0000000
##  [5,] 0.6000000 1.0000000
##  [6,] 0.6666667 0.8333333
##  [7,] 0.7142857 0.7142857
##  [8,] 0.7500000 0.7500000
##  [9,] 0.6666667 0.7777778
## [10,] 0.6000000 0.8000000

1a) To get percentages, multiply by 100:

100 * rollapplyr(dat, 1:nrow(dat), mean)

2) Or using only base R:

apply(dat, 2, cumsum) / row(dat)

2a) Or as percentages:

100 * apply(dat, 2, cumsum) / row(dat)
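If the literal "count/rows" display from the question (3/4, 3/5, and so on) is what is wanted rather than decimal proportions, one possible base-R sketch is shown below. It assumes the same dat matrix as in the question; the names frac_label and fracs are hypothetical.

```r
check1 <- rep(c(TRUE, FALSE, TRUE, FALSE), times = c(3, 2, 3, 2))
check2 <- rep(c(TRUE, FALSE, TRUE), times = c(5, 2, 3))
dat <- cbind(check1, check2)

# frac_label is a hypothetical helper: running count of TRUE, a slash,
# then the number of rows seen so far
frac_label <- function(x) paste0(cumsum(x), "/", seq_along(x))

# apply() over columns returns a character matrix with the same column names
fracs <- apply(dat, 2, frac_label)
fracs
```

The result is a character matrix, so it is suitable for display only; use the numeric versions above for further calculation.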
