Why do the histogram and the density plot have such different scales?

For the following input:

> dput(x)
c(-0.0147653053444814, -0.0070425841588807, 0.0131104206612625, 
0.0117263843648209, -0.00168116672971042, 0, -0.013869625520111, 
0.00882352941176471, 0, 0, 0, -0.00215066571492998, -0.00090705690270303, 
-0.0101935611953991, -0.041928232841453, -0.0143360618570992, 
-0.0158597406826667, -0.069759508783899, 0, -0.00691005412875734, 
0.0104529616724739, 0.033379345445572, -0.0030681766908812, 0.00768836031944262, 
0.00865218227263988, 0.00613026819923372, 0.0177100021728454, 
0.0046875, 0.000640588259316177, 4.30731238065155e-05, 0, 0.0144230769230769, 
-0.00909090909090909, 0, 0.0182149362477231, -0.0527973921556375, 
-0.03, -0.00372972604269566, -0.00163058986588398, 59.2310883091366, 
-0.000702000702000702, -0.0230591028862859, -0.0138505753759854, 
-42.9370331919076, 0.00794476496736972, 0, 0, -0.0432776764133912, 
-0.0340933304922225, -0.00117702448210923, 0.00366885329405205, 
0.00365368710843237, 0, 0.0186261243333599, 0.0137394903507273, 
0.00939734083792733, 0.0103933978654753, -0.00523429710867398, 
-0.158510355499554, -0.200002339208646, -0.0494646558508539, 
-0.0309613678662747, -0.0121737407329229, -0.018102919444907, 
-0.0312003029155623, 0.0416711882799783, 0.0131474663612084, 
0.0131500298864316, 0.0105307695924028, -0.000253507564750944, 
-0.00153705550016617, -0.000412784909809513, -0.000694444444444444, 
-0.00356895013383563, 0, 0.00600989012174394, 0.00920834890300539, 
-0.00233482719035976, 0, -0.0740233036326251, 0.00934978274363749, 
0.00694444444444444, 0, 0.0014367816091954, 0.0233942414174972, 
0.0190972222222222, -0.0538403614457831, -0.0501207729468599, 
-0.00653869905530757, -0.00603667278718213, -0.00128307939053729, 
0.00692615198120028, 0.0053404905035124, -17.5905959643046, 0.0137565379379523, 
-0.00925925925925926, -0.00873138161273754, -0.00534188034188034, 
0, 0.00288778877887789, 0.00122276567363273)

I tried to estimate its density plot this way:

out <- boxplot(x)  # store the boxplot as an object
out$out             # inspect the outliers
hist(x[!x %in% out$out], freq = FALSE, xlab="",ylab="",
     main="Istogramma dell'influenza")
lines(density(x[!x %in% out$out], kernel="cosine", bw = 0.1), col="red")

However, the two plots come out on very different scales.

Why? How can I compare them in a single plot?

It may be easier to use the adjust argument here instead of bw:

# this uses the default bandwidth
lines(density(x[!x %in% out$out], kernel="cosine", adjust = 1), col="blue")

# this specifies the bandwidth explicitly, close to default
lines(density(x[!x %in% out$out], kernel="cosine", bw = 0.0032), col="green")

# this narrower bandwidth is a closer match for the histogram
lines(density(x[!x %in% out$out], kernel="cosine", adjust = 0.6), col="red")
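The sketch below shows numerically why the curves above differ so much. It uses simulated stand-in data, `y <- rnorm(100, sd = 0.02)`, which is an assumption chosen to have a spread similar to the question's `x` after outlier removal. It is not the original data.

Both `hist(freq = FALSE)` and `density()` put their output on the density scale, so each encloses an area of about 1. A bandwidth of 0.1 spreads that same area over a much wider x range, which pulls the peak far below the histogram bars.

```r
# Stand-in data (assumption): simulated values, NOT the question's x
set.seed(42)
y <- rnorm(100, sd = 0.02)

# hist(freq = FALSE) rescales bar heights so the total bar area is 1
h <- hist(y, plot = FALSE)
hist_area <- sum(h$density * diff(h$breaks))

# density() returns a curve whose area is (approximately) 1 as well
d_default <- density(y, kernel = "cosine")            # default bandwidth
d_wide    <- density(y, kernel = "cosine", bw = 0.1)  # bandwidth from the question

# Riemann-sum approximation of the area under a density() curve
area <- function(d) sum(d$y) * diff(d$x[1:2])

cat("histogram area:       ", hist_area, "\n")
cat("default-bw curve area:", area(d_default), "\n")
cat("bw = 0.1 curve area:  ", area(d_wide), "\n")

# Same total area, but the wide kernel spreads it much further,
# so its peak is far lower than the tallest histogram bar
cat("tallest bar:      ", max(h$density), "\n")
cat("peak, default bw: ", max(d_default$y), "\n")
cat("peak, bw = 0.1:   ", max(d_wide$y), "\n")
```

With the real data, replace `y` with `x[!x %in% out$out]`.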

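The adjust factor multiplies the default bandwidth, which density() computes with a complicated rule of thumb (bw.nrd0). Here the default bandwidth appears to be close to 0.0032. The bw = 0.1 used in the question is therefore roughly 30 times larger. A kernel that wide spreads the estimate over a much larger range and flattens the curve relative to the histogram.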
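As a minimal sketch of how `adjust` relates to the default bandwidth, the code below computes `bw.nrd0()` directly and checks it against its rule of thumb, 0.9 · min(sd, IQR/1.34) · n^(-1/5). It then confirms that `density(..., adjust = 0.6)` uses 0.6 times that value. The simulated `y` is an assumption standing in for the question's data.

```r
# Stand-in data (assumption): simulated values, NOT the question's x
set.seed(1)
y <- rnorm(100, sd = 0.02)

# Default bandwidth used by density(): bw.nrd0()
h0 <- bw.nrd0(y)

# The same rule of thumb written out by hand
h_manual <- 0.9 * min(sd(y), IQR(y) / 1.34) * length(y)^(-1/5)

# adjust scales the default bandwidth; the bandwidth actually used is in $bw
d_adj <- density(y, kernel = "cosine", adjust = 0.6)

cat("bw.nrd0(y):               ", h0, "\n")
cat("rule of thumb by hand:    ", h_manual, "\n")
cat("bw used with adjust = 0.6:", d_adj$bw, "\n")
```

On the question's data, `bw.nrd0(x[!x %in% out$out])` should report the default bandwidth. That makes it easy to pick an `adjust` value instead of guessing an absolute `bw`.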