What's the easiest way to summarize by group and for the whole sample?

Suppose I have data like this:

   Date       time     price minute FOMC  Daily.Return
   <date>     <time>   <dbl>  <dbl> <fct>        <dbl>
 1 2005-01-03 16:00:00  120.    960 FALSE       -1.24 
 2 2005-01-04 16:00:00  119.    960 FALSE       -1.44 
 3 2005-01-05 16:00:00  118.    960 FALSE       -0.354
 4 2005-01-06 16:00:01  119.    960 FALSE        0.245
 5 2005-01-07 15:59:00  119.    959 FALSE       -0.328
 6 2005-01-10 16:00:00  119.    960 FALSE        0.506
 7 2005-01-11 16:00:00  118.    960 FALSE       -0.279
 8 2005-01-12 16:00:01  119.    960 FALSE        0.329
 9 2005-01-13 16:00:00  118.    960 FALSE       -0.787
10 2005-01-14 16:00:00  118.    960 FALSE        0.372

I want to summarize Daily.Return for each group defined by the FOMC variable, which is either TRUE or FALSE. That is easy with dplyr. I have the following:

daily.SPY %>%  group_by(FOMC) %>% 
  summarise(Mean = 100 * mean(Daily.Return),
            Median = 100 * median(Daily.Return),
            Vol = 100 * sqrt(252) * sd(Daily.Return/100))

As expected, I get the following output:

  FOMC      Mean Median   Vol
  <fct>    <dbl>  <dbl> <dbl>
1 FALSE  0.00551   5.24  14.9
2 TRUE  20.8       1.20  17.6

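However, I would like a third row that performs the same calculations without grouping. It would compute the mean, median, and standard deviation for the whole sample, without conditioning on the group. What is the easiest way to do this within the tidyverse? Thanks!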

You can write a function that summarizes the data:

summarize_returns = function(data) {
  data %>%
    summarise(
      Mean = 100 * mean(Daily.Return),
      Median = 100 * median(Daily.Return),
      Vol = 100 * sqrt(252) * sd(Daily.Return / 100),
      .groups = "drop"
    )
}

Then you can combine the two summaries (grouped and ungrouped) using dplyr::bind_rows():

data %>%
  group_by(FOMC) %>%
  summarize_returns() %>%
  bind_rows(
    data %>% summarize_returns() %>% mutate(FOMC = "Total")
  )

# A tibble: 3 x 4
  FOMC     Mean Median   Vol
  <chr>   <dbl>  <dbl> <dbl>
1 FALSE -13.6   -13.3   15.5
2 TRUE   14.4     8.79  16.6
3 Total   0.992  -1.08  16.2
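
In the question, FOMC is a factor rather than a character column, so it may be cleaner to convert it to character before binding the "Total" row. The following is a minimal sketch under that assumption. The `toy` data frame and its values are hypothetical and chosen only so the result is easy to check by hand, and the Vol column is left out for brevity.

```r
library(dplyr)

# Hypothetical toy data; FOMC is a factor, as in the question
toy <- tibble::tibble(
  FOMC = factor(c(TRUE, TRUE, FALSE, FALSE)),
  Daily.Return = c(1, 3, -2, 0)
)

# Same idea as summarize_returns() above, reduced to two statistics
summarize_returns <- function(data) {
  data %>%
    summarise(
      Mean = 100 * mean(Daily.Return),
      Median = 100 * median(Daily.Return),
      .groups = "drop"
    )
}

# Per-group summary, with the label column converted to plain text
by_group <- toy %>%
  mutate(FOMC = as.character(FOMC)) %>%
  group_by(FOMC) %>%
  summarize_returns()

# Whole-sample summary, labelled "Total" and placed in the first column
overall <- toy %>%
  summarize_returns() %>%
  mutate(FOMC = "Total", .before = 1)

result <- bind_rows(by_group, overall)
result
```

Because the grouped summary comes first in bind_rows(), the "Total" row ends up last, which is the layout the question asks for.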

My data:

library(tidyverse)
set.seed(123)
data = tibble(
  FOMC = as.character(sample(c(TRUE, FALSE), 100, replace = TRUE)),
  Daily.Return = rnorm(100)
)

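One option is to bind_rows() the data to a copy of the whole data set in which FOMC has been set to "ALL" with mutate(), so that the full sample ends up as a separate group when you group_by() and summarise().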

library(tidyverse)

set.seed(1)

daily.SPY <- tibble(
  FOMC = factor(rep(c(TRUE, FALSE), each = 25)),
  Daily.Return = c(cumsum(rnorm(25)), cumsum(rnorm(25)))
)

daily.SPY %>% 
  bind_rows(., mutate(., FOMC = "ALL")) %>%
  group_by(FOMC) %>% 
  summarise(Mean = 100 * mean(Daily.Return),
            Median = 100 * median(Daily.Return),
            Vol = 100 * sqrt(252) * sd(Daily.Return/100))
#> # A tibble: 3 x 4
#>   FOMC   Mean Median   Vol
#>   <chr> <dbl>  <dbl> <dbl>
#> 1 ALL    58.4  -6.57  32.3
#> 2 FALSE -80.3 -53.6   13.9
#> 3 TRUE  197.  151.    30.5

Created on 2022-01-11 by the reprex package (v2.0.1)
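
With this approach the "ALL" row sorts first, because group_by() orders character groups alphabetically. If the whole-sample row should come last, one possible tweak is to turn FOMC back into a factor with explicit levels before grouping. This is a sketch on hypothetical toy data (the `toy` data frame is not from the question), showing only the Mean column:

```r
library(dplyr)

# Hypothetical toy data; FOMC is a factor, as in the question
toy <- tibble::tibble(
  FOMC = factor(c(TRUE, TRUE, FALSE, FALSE)),
  Daily.Return = c(1, 3, -2, 0)
)

result <- toy %>%
  mutate(FOMC = as.character(FOMC)) %>%
  # Append a full copy of the data labelled "ALL"
  bind_rows(mutate(toy, FOMC = "ALL")) %>%
  # Explicit level order controls the row order of the summary
  mutate(FOMC = factor(FOMC, levels = c("FALSE", "TRUE", "ALL"))) %>%
  group_by(FOMC) %>%
  summarise(Mean = 100 * mean(Daily.Return), .groups = "drop")

result
```

The summary rows follow the factor levels, so "ALL" appears after "FALSE" and "TRUE".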