Generating a histogram and density plot from binned data
I have binned some data and currently have a data frame with two columns, one giving the bin range and the other the frequency, like this:
> head(data)
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16
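I would like to plot a histogram and a density plot from this, but I can't seem to find a way to do it without generating new bins and so on. Using this solution here, I tried the following: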
p <- ggplot(data, aes(x= binRange, y=Frequency)) + geom_histogram(stat="identity")
But it crashes. Does anyone know how to deal with this?
Thanks
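The problem is that ggplot doesn't understand the data in the form you supply it. You need to reshape it like this (I'm no regex master, so there is surely a better way to do it):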
df <- read.table(header = TRUE, text = "
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16")
library(stringr)
library(splitstackshape)
library(ggplot2)
# extract the numeric part of each label, e.g. "(0,0.025]" becomes "0,0.025"
df$binRange <- str_extract(df$binRange, "[0-9].*[0-9]+")
# split on the comma into two columns:
# binRange_1 for the start point and binRange_2 for the end point
df <- cSplit(df, "binRange")
# plot it; you actually don't need the second column
# (geom_bar has no `breaks` argument, so it is dropped here; the bars are
# shifted by half a bin width so that each one spans its own bin)
ggplot(df, aes(x = binRange_1 + 0.0125, y = Frequency)) +
  geom_col(width = 0.025)
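The answer above suspects there is a tidier way to pull the bin edges out of the labels. Here is a minimal sketch in base R, assuming every label looks like `(a,b]` with plain decimal numbers (no scientific notation). The names `parse_bins`, `mid` and `width` are hypothetical and are not part of the original answer.

```r
# Hypothetical helper: turn labels such as "(0.025,0.05]" into numeric
# lower and upper bin edges using base R only.
parse_bins <- function(labels) {
  labels <- as.character(labels)
  # pull every plain decimal number out of each label (two per label expected)
  nums <- regmatches(labels, gregexpr("-?[0-9]+\\.?[0-9]*", labels))
  data.frame(
    lower = as.numeric(vapply(nums, `[`, character(1), 1)),
    upper = as.numeric(vapply(nums, `[`, character(1), 2))
  )
}

# The six rows shown in the question
df <- data.frame(
  binRange = c("(0,0.025]", "(0.025,0.05]", "(0.05,0.075]",
               "(0.075,0.1]", "(0.1,0.125]", "(0.125,0.15]"),
  Frequency = c(88, 72, 92, 38, 20, 16)
)

edges <- parse_bins(df$binRange)
df$mid <- (edges$lower + edges$upper) / 2   # bar centre
df$width <- edges$upper - edges$lower       # bar width
```

With `mid` on the x axis and the bar width set from `width`, each bar would cover exactly its own bin, even if the bins were not all the same size.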
Alternatively, if you don't want the data to be interpreted numerically, you can simply do the following:
df <- read.table(header = TRUE, text = "
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16")
library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_bar(stat = "identity")
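You won't be able to plot a density with your data, because it is categorical rather than continuous. That is why I actually prefer the second way of displaying it.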
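A smoothed kernel density estimate does need the raw observations, which are not available here. If a density-scale histogram is an acceptable substitute, the counts can be rescaled so that the bar areas sum to 1. This is only a sketch and not part of the original answer: it assumes the six rows printed by `head(data)` are the whole data set, which is probably not true of the real data, and the variable names are illustrative.

```r
# Bin edges and counts copied from the head(data) output in the question
lower <- c(0, 0.025, 0.05, 0.075, 0.1, 0.125)
upper <- c(0.025, 0.05, 0.075, 0.1, 0.125, 0.15)
freq  <- c(88, 72, 92, 38, 20, 16)

width <- upper - lower
# density height = count / (total count * bin width),
# so that sum(density * width) equals 1
density <- freq / (sum(freq) * width)

# Base-graphics step plot of the density-scale histogram;
# the last height is repeated so the final bin is drawn to its upper edge
n <- length(density)
plot(c(lower, upper[n]), c(density, density[n]), type = "s",
     xlab = "value", ylab = "density")
```

The result is a piecewise-constant approximation, so it shows the same shape as the bar chart, only on a different y scale.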
You can try:
library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_col()