Nested cut function to create a frequency table

I am building a frequency table, for example from the airquality dataset. The code is below:

# airquality ships with R (datasets package), so attach() is not needed
breaks = seq(1.7, 20.7, by=3.8)
# right=FALSE gives left-closed intervals such as [1.7,5.5)
airquality.split = cut(airquality$Wind, breaks, right=FALSE)
airquality.freq = table(airquality.split)
# columns: count, percent, cumulative count, cumulative percent
airquality.dist = cbind(airquality.freq, 100*airquality.freq/sum(airquality.freq),
       cumsum(airquality.freq), 100*cumsum(airquality.freq)/sum(airquality.freq))
colnames(airquality.dist) = c('Frequency','Percentage', 'Cum.Frequency','Cum.Percentage')

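I want to do the same thing, but taking the Month factor into account. That is, I want a single data frame that nests the frequencies of the Wind variable within each month, so that I can build a histogram from it.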

Month                           Frequency Percentage Cum.Frequency Cum.Percentage
Month 1          [1.7,5.5)          [...]  [...]           [...]       [...]
Month 1          [5.5,9.3)          [...]  [...]           [...]       [...]
Month 1          [9.3,13.1)         [...]  [...]           [...]       [...]
Month 1          [13.1,16.9)        [...]  [...]           [...]       [...]
Month 1          [16.9,20.7)        [...]  [...]           [...]       [...]
Month 2          [1.7,5.5)          [...]  [...]           [...]       [...]
Month 2          [5.5,9.3)          [...]  [...]           [...]       [...]
Month 2          [9.3,13.1)         [...]  [...]           [...]       [...]
Month 2          [13.1,16.9)        [...]  [...]           [...]       [...]
Month 2          [16.9,20.7)        [...]  [...]           [...]       [...]

[...]

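From this data I would like to make a histogram with a separate series for each month, each month in a single colour, showing the five columns of percentages (or frequencies) within that month. Can this be done directly with the cut function?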

Thanks in advance.

With cut you can split Wind into groups, then use prop.table to compute the proportions within each Month.

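library(dplyr)

# breaks is the vector defined in the question
airquality %>%
  # one row per Month / Wind-interval combination, with its count
  count(Month, group = cut(Wind, breaks, right=FALSE), name = 'Frequency') %>%
  group_by(Month) %>%
  # percentages and cumulative values are computed within each Month
  mutate(Percentage = prop.table(Frequency) * 100,
         Cum.Frequency = cumsum(Frequency),
         Cum.Percentage = Cum.Frequency/max(Cum.Frequency) * 100) %>%
  ungroup()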
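
For the plotting part of the question, here is a minimal base R sketch that does not need dplyr. It makes two assumptions that are not in the original code. First, it adds include.lowest = TRUE to cut; with right=FALSE the last interval is otherwise open on the right, so an observation equal to the last break (20.7) would become NA. Second, it uses barplot with beside = TRUE as one possible reading of "five columns within a month". The names wind_bin, counts, pct and bars are illustrative only.

```r
breaks <- seq(1.7, 20.7, by = 3.8)

# Assumption: close the last interval so a Wind value equal to 20.7 is kept
wind_bin <- cut(airquality$Wind, breaks, right = FALSE, include.lowest = TRUE)

# Rows are months, columns are Wind intervals
counts <- table(Month = airquality$Month, Wind = wind_bin)

# Row-wise percentages: each month sums to 100
pct <- prop.table(counts, margin = 1) * 100

# barplot() treats each column as one group, so transpose:
# one group per month, five bars (Wind intervals) inside each group
bars <- t(pct)

barplot(bars, beside = TRUE,
        # one colour per month, repeated for the bars of that month
        col = rep(rainbow(ncol(bars)), each = nrow(bars)),
        xlab = "Month (bars within a month: Wind intervals, low to high)",
        ylab = "Percentage within month")
```

Replacing pct with counts in the t() call plots frequencies instead of percentages. The same include.lowest = TRUE argument can also be passed to cut inside the count() call above if the largest Wind value should not end up in an NA group.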