Using skimr to create a data frame of summary statistics
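I recently came across the skimr package, which helps create useful summary statistics. I wrote the following code to extract summary statistics for the numeric columns only. My first question: is there a more direct way to tell skimr which variable types I want summary statistics for? My second question: what does append = TRUE actually do when I create the my_skim "closure"?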
library(skimr)
library(dplyr)
### Creating an example dataset
test.df1 <- data.frame("Year" = sample(2018:2020, 20, replace = TRUE),
                       "Firm" = head(LETTERS, 5),
                       "Exporter" = sample(c("Yes", "No"), 20, replace = TRUE),
                       "Revenue" = sample(100:200, 20, replace = TRUE),
                       stringsAsFactors = FALSE)
test.df1 <- rbind(test.df1,
                  data.frame("Year" = c(2018, 2018),
                             "Firm" = c("Y", "Z"),
                             "Exporter" = c("Yes", "No"),
                             "Revenue" = c(NA, NA)))
test.df1 <- test.df1 %>% mutate(Profit = Revenue - sample(20:30, 22, replace = TRUE))
### Using skimr package to extract summary stats
my_skim <- skim_with(numeric = sfl(minimum = min, maximum = max, hist = NULL), append = TRUE)
test.df1_skim1 <- test.df1 %>%
  group_by(Year) %>%
  my_skim() %>%
  filter(skim_type != "character") %>%
  select(-starts_with("character"))
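If you only want summaries of the numeric variables, you can set all the other types to NULL. Alternatively, run skim() and use yank() to get the sub-table for a single type.
From https://docs.ropensci.org/skimr/articles/skimr.html#reshaping-the-results-from-skim-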
skim(Orange) %>% yank("numeric")
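As a rough sketch of the two routes to a numeric-only summary, the snippet below compares yank() with selecting the numeric columns before skimming. The data frame firms and its values are made up for illustration and are not the question's data; the second route relies on dplyr's where() helper rather than on any skimr option.

```r
library(skimr)
library(dplyr)

# Hypothetical example data (illustrative values only)
firms <- data.frame(
  Year     = c(2018, 2018, 2019, 2019),
  Exporter = c("Yes", "No", "Yes", "No"),
  Revenue  = c(120, NA, 150, 180),
  stringsAsFactors = FALSE
)

# Route 1: skim every column, then pull out the numeric sub-table
numeric_only <- skim(firms) %>% yank("numeric")

# Route 2: keep only numeric columns first, so nothing else is skimmed
numeric_first <- firms %>%
  select(where(is.numeric)) %>%
  skim()

numeric_only
```

Route 1 keeps the full skim available if you later want the character sub-table as well; route 2 avoids computing summaries you will throw away.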
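The append option controls whether the statistics you supply replace the default ones or are added to them: with append = TRUE (the default) they are added to the defaults, and with append = FALSE they replace the defaults for that type.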
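To see the difference, one can build two skimmers that differ only in append and compare the columns they produce. This is a minimal sketch: revenue_df, skim_append and skim_replace are illustrative names, and the na.rm = TRUE lambdas are an assumption added here so that missing values do not turn min and max into NA.

```r
library(skimr)

# Hypothetical numeric-only data (illustrative values only)
revenue_df <- data.frame(Revenue = c(120, 150, NA, 180))

# append = TRUE: minimum/maximum are added to the default numeric
# summaries, and hist = NULL drops the default histogram
skim_append <- skim_with(
  numeric = sfl(minimum = ~ min(., na.rm = TRUE),
                maximum = ~ max(., na.rm = TRUE),
                hist = NULL),
  append = TRUE
)

# append = FALSE: the default numeric summaries (mean, sd, ...) are
# replaced by the ones listed here
skim_replace <- skim_with(
  numeric = sfl(minimum = ~ min(., na.rm = TRUE),
                maximum = ~ max(., na.rm = TRUE)),
  append = FALSE
)

appended <- skim_append(revenue_df)
replaced <- skim_replace(revenue_df)

names(appended)
names(replaced)
```

Comparing the two names() results shows which default columns, such as numeric.mean, survive in each case.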