Generating a histogram and density plot from binned data

I have binned some data and now have a data frame with two columns, one giving the bin range and the other the frequency, like this:

> head(data)
      binRange Frequency
1    (0,0.025]        88
2 (0.025,0.05]        72
3 (0.05,0.075]        92
4  (0.075,0.1]        38
5  (0.1,0.125]        20
6 (0.125,0.15]        16

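I would like to plot a histogram and a density plot from this, but I can't seem to find a way to do it without generating new bins and so on. Following the solution here, I tried: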

p <- ggplot(data, aes(x= binRange, y=Frequency)) + geom_histogram(stat="identity")

But it fails. Does anyone know how to deal with this?

Thanks

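The problem is that ggplot doesn't understand the data in the form you supply it: binRange is a text label, not a number. You need to reshape it like this (I'm not a regex guru, so there is surely a better way):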

df <- read.table(header = TRUE, text = "
                 binRange Frequency
1    (0,0.025]        88
2 (0.025,0.05]        72
3 (0.05,0.075]        92
4  (0.075,0.1]        38
5  (0.1,0.125]        20
6 (0.125,0.15]        16")

library(stringr)
library(splitstackshape)
library(ggplot2)
# extract the part between the brackets, e.g. "0.025,0.05"
df$binRange <- str_extract(df$binRange, "[0-9].*[0-9]+")

# split on the comma into two numeric columns:
# binRange_1 (start point) and binRange_2 (end point)
df <- cSplit(df, "binRange")

# plot it: centre each bar on its bin midpoint and make it one bin wide
ggplot(df, aes(x = (binRange_1 + binRange_2) / 2, y = Frequency)) +
    geom_bar(stat = "identity", width = 0.025) +
    xlab("binRange")
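
For what it's worth, the bin edges can also be pulled out without regular expressions or extra packages. Below is a minimal base R sketch, assuming every label looks like the ones above (one bracket character on each side and a single comma in between). The names inner, parts, lower and upper are just illustrative.

```r
# Bin labels as they appear in the question
binRange <- c("(0,0.025]", "(0.025,0.05]", "(0.05,0.075]",
              "(0.075,0.1]", "(0.1,0.125]", "(0.125,0.15]")

# Drop the first and last character (the brackets)
inner <- substr(binRange, 2, nchar(binRange) - 1)

# Split each label on the comma and convert both halves to numbers
parts <- strsplit(inner, ",", fixed = TRUE)
lower <- as.numeric(sapply(parts, `[`, 1))
upper <- as.numeric(sapply(parts, `[`, 2))

edges <- data.frame(lower, upper)
print(edges)
```

The resulting lower and upper columns play the same role as binRange_1 and binRange_2 above.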

Alternatively, if you don't want the data to be interpreted numerically, you can simply do the following:

df <- read.table(header = TRUE, text = "
                 binRange Frequency
1    (0,0.025]        88
2 (0.025,0.05]        72
3 (0.05,0.075]        92
4  (0.075,0.1]        38
5  (0.1,0.125]        20
6 (0.125,0.15]        16")

library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_bar(stat = "identity")

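You won't be able to plot a density plot from your data as it stands, because it is categorical rather than continuous, which is why I actually prefer the second way of displaying it.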
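
That said, if the bin edges are treated as numbers, a histogram on the density scale can still be approximated: divide each frequency by the total count times the bin width, so that the bar areas sum to 1. Here is a rough sketch, assuming only the six rows shown in the question (the real data presumably has more). The columns lower, upper, width and density are hypothetical names.

```r
library(ggplot2)

# Bin edges and counts taken from the six rows shown in the question
df <- data.frame(
  lower     = c(0, 0.025, 0.05, 0.075, 0.1, 0.125),
  upper     = c(0.025, 0.05, 0.075, 0.1, 0.125, 0.15),
  Frequency = c(88, 72, 92, 38, 20, 16)
)

# Density per bin = count / (total count * bin width)
df$width   <- df$upper - df$lower
df$density <- df$Frequency / (sum(df$Frequency) * df$width)

# Draw each bin as a rectangle spanning its own edges
p <- ggplot(df) +
  geom_rect(aes(xmin = lower, xmax = upper, ymin = 0, ymax = density)) +
  labs(x = "binRange", y = "density")
print(p)
```

Drawing rectangles from the edges also copes with bins of unequal width, which a fixed bar width would not.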

You can try:

library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_col()
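
If a smooth curve is wanted rather than bars, one rough option is a weighted kernel density over the bin midpoints. This is only an approximation, since the original observations are gone. The bandwidth of 0.025 (one bin width) is an arbitrary assumption, and the curve will put some mass below 0. The names mid, freq and curve_df are illustrative.

```r
library(ggplot2)

# Bin midpoints and counts for the six rows shown in the question
mid  <- c(0.0125, 0.0375, 0.0625, 0.0875, 0.1125, 0.1375)
freq <- c(88, 72, 92, 38, 20, 16)

# Kernel density over the midpoints, each weighted by its share of the total
d <- density(mid, weights = freq / sum(freq), bw = 0.025)

curve_df <- data.frame(x = d$x, y = d$y)
p <- ggplot(curve_df, aes(x = x, y = y)) +
  geom_line() +
  labs(x = "binRange", y = "density")
print(p)
```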