dplyr + dataset + percentiles
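I'm working with a large dataset (918 rows x 17 columns) and want to find the 90th percentile of each of 15 variables, grouped by month. I can get the code below to run successfully, but it doesn't produce a table containing all of the variables. Instead, it produces a table showing only 3 variables. Is there a way to expand the resulting table so that all variables are displayed?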
Data_for_R %>%
  group_by(Month) %>%
  # Column names that start with a digit (41, 31, ...) must be wrapped in
  # backticks; a bare 41 would be read as the number 41, not the column.
  summarise(percent90_KP = quantile(KP, probs = .9),
            percent90_NE = quantile(NE, probs = .9),
            percent90_CH = quantile(CH, probs = .9),
            percent90_WE = quantile(WE, probs = .9),
            percent90_RR = quantile(RR, probs = .9),
            percent90_41 = quantile(`41`, probs = .9),
            percent90_PR = quantile(PR, probs = .9),
            percent90_31 = quantile(`31`, probs = .9),
            percent90_MC = quantile(MC, probs = .9),
            percent90_JH = quantile(JH, probs = .9),
            percent90_TD = quantile(TD, probs = .9),
            percent90_BO = quantile(BO, probs = .9),
            percent90_11 = quantile(`11`, probs = .9),
            percent90_42 = quantile(`42`, probs = .9),
            percent90_20 = quantile(`20`, probs = .9))
This produces the following:
# A tibble: 5 x 16
  Month     percent90_KP percent90_NE percent90_CH
  <chr>            <dbl>        <dbl>        <dbl>
1 August            19.4         19.3         19.3
2 July              18.6         17.8         17.7
3 June              15.3         15.0         15.0
4 October           17.3         18.6         18.5
5 September         20.1         20.0         19.7
# ... with 12 more variables: percent90_WE <dbl>,
# percent90_RR <dbl>, percent90_41 <dbl>,
# percent90_PR <dbl>, percent90_31 <dbl>,
# percent90_MC <dbl>, percent90_JH <dbl>,
# percent90_TD <dbl>, percent90_BO <dbl>,
# percent90_11 <dbl>, percent90_42 <dbl>,
# percent90_20 <dbl>
Any suggestions would be greatly appreciated. I'm quite new to R and to coding.
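That is simply how the output is displayed in the console: all 16 columns have been computed, but a tibble only prints as many as fit the console width and lists the rest in the footer. You can resize the console window to increase or decrease the number of columns shown.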
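As a small illustration of the width limit, the sketch below builds a made-up 16-column tibble and prints it twice. The `wide` object, its column names, and its random values are hypothetical stand-ins, not the asker's data. It assumes the tibble package is installed and relies on the `width` argument of tibble's print method, where `Inf` should request every column regardless of console width.

```r
library(tibble)

set.seed(1)
# Hypothetical stand-in for the summarised result: Month plus 15 numeric columns
wide <- as_tibble(c(
  list(Month = c("August", "July", "June", "October", "September")),
  setNames(replicate(15, runif(5, 15, 20), simplify = FALSE),
           paste0("percent90_V", 1:15))
))

print(wide)               # columns that do not fit are only named in the footer
print(wide, width = Inf)  # every column is printed
```

This only changes how the result is displayed; the underlying data is the same in both calls.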
Another option is to save the output in a variable and convert it to a data frame, which prints every column.
library(dplyr)
result <- Data_for_R %>% group_by(Month) ....
data.frame(result)
You can also use across() to avoid repeating the same quantile call for every column.
result <- Data_for_R %>%
  group_by(Month) %>%
  summarise(across(KP:`20`, ~ quantile(.x, probs = .9), .names = 'percent90_{.col}'))
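To try the same pattern without the original data, here is a minimal sketch on R's built-in `airquality` dataset, which also has a `Month` column. The choice of `Wind` and `Temp` as the summarised columns is an assumption made purely for illustration, and `names = FALSE` is added only to drop the "90%" label that quantile() attaches to its result.

```r
library(dplyr)

# 90th percentile of Wind and Temp for each month in the built-in dataset
result <- airquality %>%
  group_by(Month) %>%
  summarise(across(c(Wind, Temp),
                   function(x) quantile(x, probs = 0.9, names = FALSE),
                   .names = "percent90_{.col}"))

# Converting to a plain data frame prints every column in the console
as.data.frame(result)
```

The result has one row per month and one `percent90_` column per summarised variable, so adding more variables only means widening the column selection inside across().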