What's the easiest way to summarize by group and for the whole sample?
Suppose I have data like this:
Date time price minute FOMC Daily.Return
<date> <time> <dbl> <dbl> <fct> <dbl>
1 2005-01-03 16:00:00 120. 960 FALSE -1.24
2 2005-01-04 16:00:00 119. 960 FALSE -1.44
3 2005-01-05 16:00:00 118. 960 FALSE -0.354
4 2005-01-06 16:00:01 119. 960 FALSE 0.245
5 2005-01-07 15:59:00 119. 959 FALSE -0.328
6 2005-01-10 16:00:00 119. 960 FALSE 0.506
7 2005-01-11 16:00:00 118. 960 FALSE -0.279
8 2005-01-12 16:00:01 119. 960 FALSE 0.329
9 2005-01-13 16:00:00 118. 960 FALSE -0.787
10 2005-01-14 16:00:00 118. 960 FALSE 0.372
I want to summarize Daily.Return for each group defined by the FOMC variable, which is either TRUE or FALSE. That is easy with dplyr. I have the following:
daily.SPY %>%
  group_by(FOMC) %>%
  summarise(Mean = 100 * mean(Daily.Return),
            Median = 100 * median(Daily.Return),
            Vol = 100 * sqrt(252) * sd(Daily.Return / 100))
As expected, I get the following output:
FOMC Mean Median Vol
<fct> <dbl> <dbl> <dbl>
1 FALSE 0.00551 5.24 14.9
2 TRUE 20.8 1.20 17.6
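However, I would like a third row that performs the same calculations without grouping. It would compute the mean, median, and standard deviation for the whole sample, not conditional on the group. What is the easiest way to do this within the tidyverse? Thanks!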
You can write a function that summarizes the data:
summarize_returns = function(data) {
  data %>%
    summarise(
      Mean = 100 * mean(Daily.Return),
      Median = 100 * median(Daily.Return),
      Vol = 100 * sqrt(252) * sd(Daily.Return / 100),
      .groups = "drop"
    )
}
Then you can combine the two summaries with dplyr::bind_rows():
data %>%
  group_by(FOMC) %>%
  summarize_returns() %>%
  bind_rows(
    data %>% summarize_returns() %>% mutate(FOMC = "Total")
  )
# A tibble: 3 x 4
FOMC Mean Median Vol
<chr> <dbl> <dbl> <dbl>
1 FALSE -13.6 -13.3 15.5
2 TRUE 14.4 8.79 16.6
3 Total 0.992 -1.08 16.2
My data:
library(tidyverse)
set.seed(123)
data = tibble(
  # FOMC is stored as character so it binds cleanly with the "Total" label
  FOMC = as.character(sample(c(TRUE, FALSE), 100, replace = TRUE)),
  Daily.Return = rnorm(100)
)
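As a rough check that the "Total" row really is the unconditional summary, here is a minimal sketch on a tiny hand-made data set. The `toy` tibble and its values are hypothetical and chosen only so the arithmetic is easy to follow; `summarize_returns()` is copied from above.

```r
library(dplyr)

# Same helper as in the answer above
summarize_returns <- function(data) {
  data %>%
    summarise(
      Mean = 100 * mean(Daily.Return),
      Median = 100 * median(Daily.Return),
      Vol = 100 * sqrt(252) * sd(Daily.Return / 100),
      .groups = "drop"
    )
}

# Hypothetical toy data, not the asker's data
toy <- tibble(
  FOMC = c("FALSE", "FALSE", "TRUE", "TRUE"),
  Daily.Return = c(-1, 1, 2, 4)
)

# Per-group rows, then one row computed on the ungrouped data
by_group <- toy %>% group_by(FOMC) %>% summarize_returns()
overall <- toy %>% summarize_returns() %>% mutate(FOMC = "Total")

# bind_rows() matches columns by name, so the column order of `overall` does not matter
combined <- bind_rows(by_group, overall)
print(combined)
```

The last row is computed from all four observations at once, so it is not an average of the two group rows; the two only coincide for the mean when the groups have equal size.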
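One option is to bind a copy of the whole data set, with the FOMC variable mutate()d to "ALL", onto the original data, so that it ends up as a separate group when you group_by() and summarise().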
library(tidyverse)
set.seed(1)
daily.SPY <- tibble(
  FOMC = factor(rep(c(TRUE, FALSE), each = 25)),
  Daily.Return = c(cumsum(rnorm(25)), cumsum(rnorm(25)))
)
daily.SPY %>%
  bind_rows(., mutate(., FOMC = "ALL")) %>%
  group_by(FOMC) %>%
  summarise(Mean = 100 * mean(Daily.Return),
            Median = 100 * median(Daily.Return),
            Vol = 100 * sqrt(252) * sd(Daily.Return / 100))
#> # A tibble: 3 x 4
#> FOMC Mean Median Vol
#> <chr> <dbl> <dbl> <dbl>
#> 1 ALL 58.4 -6.57 32.3
#> 2 FALSE -80.3 -53.6 13.9
#> 3 TRUE 197. 151. 30.5
Created on 2022-01-11 by the reprex package (v2.0.1)
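Because binding the factor column with the string "ALL" yields a character column, the summary rows come out in alphabetical order, which puts ALL first. If the overall row should come last, as the question's "third row" suggests, one possible sketch is to set the factor levels explicitly before grouping. The level order here is my own choice and is not part of the answer above.

```r
library(dplyr)

set.seed(1)
daily.SPY <- tibble(
  FOMC = factor(rep(c(TRUE, FALSE), each = 25)),
  Daily.Return = c(cumsum(rnorm(25)), cumsum(rnorm(25)))
)

result <- daily.SPY %>%
  # Stack the data on a relabelled copy of itself
  bind_rows(., mutate(., FOMC = "ALL")) %>%
  # Assumed ordering: the two groups first, the overall row last
  mutate(FOMC = factor(FOMC, levels = c("FALSE", "TRUE", "ALL"))) %>%
  group_by(FOMC) %>%
  summarise(
    Mean = 100 * mean(Daily.Return),
    Median = 100 * median(Daily.Return),
    Vol = 100 * sqrt(252) * sd(Daily.Return / 100)
  )

print(result)
```

summarise() returns grouped rows in the order of the factor levels, so the ALL row lands at the bottom without a separate arrange() step.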