How can I generate a summary statistics table in R that shows all relevant decimal places?
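I have a very large dataset (50+ sites, 100+ solutes), and I would like to quickly generate a table of descriptive summary statistics and export it as a .csv file.
Example code (a very small subset of my data):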
library(dplyr)    # %>% and group_by
library(tidyr)    # gather
library(rstatix)  # get_summary_stats

Site <- c("SC2", "SC2", "SC2", "SC3", "SC3", "SC3", "SC4", "SC4", "SC4", "SC4", "SC4")
Aluminum <- c(0.0565, 0.0668, 0.0785, 0.0292, 0.0576, 0.075, 0.029, 0.088, 0.076, 0.007, 0.107)
Antimony <- c(0.0000578, 0.0000698, 0.0000215, 0.000025, 0.0000389, 0.0000785, 0.0000954, 0.00005447, 0.00007843, 0.000025, 0.0000124)
stats_data <- data.frame(Site, Aluminum, Antimony, stringsAsFactors = FALSE)
# Reshape to long format: one row per Site/Solute measurement
stats_data_gather <- stats_data %>% gather(Solute, value, -Site)
table_test <- stats_data_gather %>%
  group_by(Site, Solute) %>%
  get_summary_stats(value, show = c("mean", "sd", "min", "q1", "median", "q3", "max"))
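This produces a data frame with the statistics I need, but the results are cut off at three decimal places (for example, a result that should be 0.00000057 is shown as 0.000).
I have tried variations of: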
options(digits = XX),
format(DF, format = "e", digits = 2),
format.data.frame(table_test, digits = 8)
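I have tried these and other code examples found online, but none of them produces a summary data frame that keeps all the necessary zeros for very small results (i.e. 0.00000057 rather than 0.000). I would even accept scientific notation, but I have not managed to find a working example.
This is my first post. I hope I have provided enough detail to help!
Thanks!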
You can use the summary function to get the statistics you are looking for:
sum.table <- summary(stats_data_gather)
You can then extract the summary values from the third column with:
as.numeric(sub('.*:', '', sum.table[,3]))
It doesn't work because get_summary_stats is hard-coded to round its results to 3 digits:
get_summary_stats
function (data, ..., type = c("full", "common", "robust", "five_number",
    "mean_sd", "mean_se", "mean_ci", "median_iqr", "median_mad",
    "quantile", "mean", "median", "min", "max"), show = NULL,
    probs = seq(0, 1, 0.25))
{
    .....
        dplyr::mutate_if(is.numeric, round, digits = 3)
    if (!is.null(show)) {
        show <- unique(c("variable", "n", show))
        results <- results %>% select(!!!syms(show))
    }
    results
}
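To illustrate why options(digits = ...) and format() cannot help once the values have passed through that round() call, here is a small base R sketch. The value x is only an illustrative number of the magnitude mentioned in the question, not taken from the data.

```r
# Illustrative value of the same magnitude as the one in the question
x <- 0.00000057

# Rounding to 3 decimal places, as get_summary_stats does, leaves exactly zero
rounded <- round(x, digits = 3)
is_zero <- rounded == 0                      # TRUE

# Formatting the already rounded value cannot bring the lost digits back
after_rounding <- format(rounded, digits = 8)   # "0"

# Formatting the unrounded value works in either notation
sci <- format(x, scientific = TRUE)          # "5.7e-07"
fixed <- format(x, scientific = FALSE)       # "0.00000057"

print(c(after_rounding, sci, fixed))
```

The digits are lost in the stored numbers themselves, not in how they are printed, so the fix has to happen where the statistics are computed.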
You could patch the get_summary_stats source to change that, or you can use the summarise_all function as shown below:
library(dplyr)
library(tidyr)
stats_data_gather %>%
  group_by(Site, Solute) %>%
  summarise_all(list(~mean(.), ~sd(.), ~list(c(summary(.))))) %>%
  unnest_wider(list)
# A tibble: 6 x 10
# Groups:   Site [3]
  Site  Solute    mean      sd    Min. `1st Qu.`  Median    Mean `3rd Qu.`
  <chr> <chr>    <dbl>   <dbl>   <dbl>     <dbl>   <dbl>   <dbl>     <dbl>
1 SC2   Alumi… 6.73e-2 1.10e-2 5.65e-2 0.0616    6.68e-2 6.73e-2 0.0726
2 SC2   Antim… 4.97e-5 2.51e-5 2.15e-5 0.0000396 5.78e-5 4.97e-5 0.0000638
3 SC3   Alumi… 5.39e-2 2.31e-2 2.92e-2 0.0434    5.76e-2 5.39e-2 0.0663
4 SC3   Antim… 4.75e-5 2.78e-5 2.50e-5 0.0000320 3.89e-5 4.75e-5 0.0000587
5 SC4   Alumi… 6.14e-2 4.19e-2 7.00e-3 0.029     7.60e-2 6.14e-2 0.088
6 SC4   Antim… 5.31e-5 3.49e-5 1.24e-5 0.000025  5.45e-5 5.31e-5 0.0000784
# … with 1 more variable: Max. <dbl>
The column names are a bit awkward, but you can easily rename them, for example to q1 and q3.
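If you would rather name the columns yourself from the start, one possible variant is to call summarise() with one expression per statistic and then export the result. This is only a sketch under a few assumptions: it uses pivot_longer() in place of gather(), the output column names (n, mean, sd, min, q1, median, q3, max) are chosen to mirror the show argument in the question, and the file name summary_stats.csv is hypothetical.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
stats_data <- data.frame(
  Site = c("SC2", "SC2", "SC2", "SC3", "SC3", "SC3", "SC4", "SC4", "SC4", "SC4", "SC4"),
  Aluminum = c(0.0565, 0.0668, 0.0785, 0.0292, 0.0576, 0.075, 0.029, 0.088, 0.076, 0.007, 0.107),
  Antimony = c(0.0000578, 0.0000698, 0.0000215, 0.000025, 0.0000389, 0.0000785,
               0.0000954, 0.00005447, 0.00007843, 0.000025, 0.0000124),
  stringsAsFactors = FALSE
)

summary_table <- stats_data %>%
  # Long format: one row per Site/Solute measurement
  pivot_longer(-Site, names_to = "Solute", values_to = "value") %>%
  group_by(Site, Solute) %>%
  # One named expression per statistic; no rounding is applied anywhere
  summarise(
    n = n(),
    mean = mean(value),
    sd = sd(value),
    min = min(value),
    q1 = unname(quantile(value, 0.25)),
    median = median(value),
    q3 = unname(quantile(value, 0.75)),
    max = max(value),
    .groups = "drop"
  )

# Hypothetical output file name; write.csv writes numbers with up to
# 15 significant digits, so the small values survive the export
out_file <- "summary_stats.csv"
write.csv(summary_table, out_file, row.names = FALSE)
```

The tibble printout may still abbreviate values on screen, but the stored doubles are unrounded, and that is what ends up in the .csv file.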