How to add mean and mode to a ggplot histogram?
I need to add a mean line and the value of the mode to this kind of plot.
I use this to calculate the number of bins:
bw <- diff(range(cars$lenght)) / (2 * IQR(cars$lenght) / length(cars$lenght)^(1/3))
Plot:
ggplot(data=cars, aes(cars$lenght)) +
  geom_histogram(aes(y =..density..),
                 col="red",
                 binwidth = bw,
                 fill="green",
                 alpha=1) +
  geom_density(col=4) +
  labs(title='Lenght Plot', x='Lenght', y='Times')
cars$lenght
168.8 168.8 171.2 176.6 176.6 177.3 192.7 192.7 192.7 178.2
176.8 176.8 176.8 176.8 189.0 189.0 193.8 197.0 141.1 155.9
158.8 157.3 157.3 157.3 157.3 157.3 157.3 157.3 174.6 173.2
Thanks in advance.
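I wasn't sure how to reproduce your data, so I used cars$speed in its place.
geom_vline will place a vertical line wherever you want it, and you can compute the mean and mode of the raw data on the fly. If instead you want the mode to be the histogram bin with the highest frequency, you can extract that from the ggplot object.
I'm not sure how you want to define the mode, so I've plotted several different versions.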
library(ggplot2)  # after_stat() and linewidth need ggplot2 >= 3.4.0
library(dplyr)    # %>%, filter(), pull()

# Function to calculate the mode (most frequent value in x)
fun.mode <- function(x) {
  as.numeric(names(sort(-table(x)))[1])
}

# Bin-size expression from the question, with cars$speed swapped in
bw <- diff(range(cars$speed)) / (2 * IQR(cars$speed) / length(cars$speed)^(1/3))

p <- ggplot(data = cars, aes(x = speed)) +
  geom_histogram(aes(y = after_stat(density)),  # replaces the deprecated ..density..
                 colour = "red",
                 binwidth = bw,
                 fill = "green",
                 alpha = 1) +
  geom_density(colour = 4) +
  labs(title = "Length Plot", x = "Length", y = "Density")

# Extract the layer data to find the histogram and density peaks
data <- ggplot_build(p)$data
hist_peak <- data[[1]] %>% filter(y == max(y)) %>% pull(x)
dens_peak <- data[[2]] %>% filter(y == max(y)) %>% pull(x)

# Plot mean, mode, histogram peak and density peak
p +
  geom_vline(xintercept = mean(cars$speed), colour = "red", linewidth = 2) +
  geom_vline(xintercept = fun.mode(cars$speed), colour = "blue", linewidth = 2) +
  geom_vline(xintercept = hist_peak, colour = "orange", linewidth = 2) +
  geom_vline(xintercept = dens_peak, colour = "purple", linewidth = 2) +
  annotate("text", x = hist_peak, y = 0, label = round(hist_peak, 1),
           vjust = -1, colour = "orange", size = 5)
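Since the question does list the cars$lenght values, the same idea can be tried on them directly. What follows is only a sketch under a few assumptions: lenght_values, df, all_modes and stats are names made up for this illustration, bins = 10 is an arbitrary choice, and linewidth assumes ggplot2 3.4.0 or later. Unlike fun.mode, the helper returns every value tied for the highest count.

```r
library(ggplot2)

# Values copied from the question's cars$lenght listing (spelling kept as posted)
lenght_values <- c(168.8, 168.8, 171.2, 176.6, 176.6, 177.3, 192.7, 192.7, 192.7, 178.2,
                   176.8, 176.8, 176.8, 176.8, 189.0, 189.0, 193.8, 197.0, 141.1, 155.9,
                   158.8, 157.3, 157.3, 157.3, 157.3, 157.3, 157.3, 157.3, 174.6, 173.2)
df <- data.frame(lenght = lenght_values)

# Hypothetical helper: every value that shares the highest count
all_modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# One row per line to draw: the mean plus each mode
modes <- all_modes(df$lenght)
stats <- data.frame(
  stat  = c("mean", rep("mode", length(modes))),
  value = c(mean(df$lenght), modes)
)

p_len <- ggplot(df, aes(x = lenght)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10,  # bins = 10 is arbitrary
                 colour = "red", fill = "green") +
  geom_density(colour = "blue") +
  # Mapping colour to stat gives each line a legend entry
  geom_vline(data = stats, aes(xintercept = value, colour = stat),
             linewidth = 1) +
  # Print each statistic's value just above the x axis
  geom_text(data = stats, aes(x = value, y = 0, label = round(value, 1)),
            vjust = -1, inherit.aes = FALSE) +
  labs(title = "Lenght Plot", x = "Lenght", y = "Density")

print(p_len)
```

Because the exact-value mode depends on repeated measurements, it is mostly useful for data like this where values recur; for continuous data the histogram or density peak is usually the more meaningful choice.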
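Create a data.frame that holds the value of each statistic you want to plot. This has the advantage of automatically creating a legend entry for each statistic.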
library(ggplot2)  # after_stat() and linewidth need ggplot2 >= 3.4.0

cars$length <- cars$speed
bw <- diff(range(cars$length)) / (2 * IQR(cars$length) / length(cars$length)^(1/3))

# One row per statistic to draw
sumstatz <- data.frame(whichstat = c("mean",
                                     "sd upr",
                                     "sd lwr"),
                       value = c(mean(cars$length),
                                 mean(cars$length) + sd(cars$length),
                                 mean(cars$length) - sd(cars$length)))

ggplot(data = cars, aes(length)) +
  geom_histogram(aes(y = after_stat(density)),
                 colour = "black",
                 binwidth = bw) +
  geom_density(colour = "black") +
  geom_vline(data = sumstatz,
             aes(xintercept = value, linetype = whichstat, colour = whichstat),
             linewidth = 1) +
  labs(title = "Length Plot", x = "Length", y = "Density")
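A side note on the bw expression used throughout this thread: its denominator, 2 * IQR / n^(1/3), is the Freedman-Diaconis bin width, so dividing the data range by it gives an approximate number of bins rather than a width. Passing that value to binwidth treats a bin count as a width in data units. The sketch below separates the two quantities on cars$speed; fd_width and fd_bins are names invented for this illustration.

```r
x <- cars$speed  # built-in cars data set, as used in the answers

# Freedman-Diaconis bin width: 2 * IQR / n^(1/3)
fd_width <- 2 * IQR(x) / length(x)^(1/3)

# Dividing the data range by that width gives the approximate bin count
fd_bins <- ceiling(diff(range(x)) / fd_width)

# Base R (grDevices) implements the same rule as nclass.FD()
fd_bins_base <- nclass.FD(x)

cat("width:", fd_width, " bins:", fd_bins, " nclass.FD:", fd_bins_base, "\n")
```

Either binwidth = fd_width or bins = fd_bins could then be passed to geom_histogram(), depending on which quantity is intended.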