Why do the histogram and the density plot have very different scales?
For the following input:
> dput(x)
c(-0.0147653053444814, -0.0070425841588807, 0.0131104206612625,
0.0117263843648209, -0.00168116672971042, 0, -0.013869625520111,
0.00882352941176471, 0, 0, 0, -0.00215066571492998, -0.00090705690270303,
-0.0101935611953991, -0.041928232841453, -0.0143360618570992,
-0.0158597406826667, -0.069759508783899, 0, -0.00691005412875734,
0.0104529616724739, 0.033379345445572, -0.0030681766908812, 0.00768836031944262,
0.00865218227263988, 0.00613026819923372, 0.0177100021728454,
0.0046875, 0.000640588259316177, 4.30731238065155e-05, 0, 0.0144230769230769,
-0.00909090909090909, 0, 0.0182149362477231, -0.0527973921556375,
-0.03, -0.00372972604269566, -0.00163058986588398, 59.2310883091366,
-0.000702000702000702, -0.0230591028862859, -0.0138505753759854,
-42.9370331919076, 0.00794476496736972, 0, 0, -0.0432776764133912,
-0.0340933304922225, -0.00117702448210923, 0.00366885329405205,
0.00365368710843237, 0, 0.0186261243333599, 0.0137394903507273,
0.00939734083792733, 0.0103933978654753, -0.00523429710867398,
-0.158510355499554, -0.200002339208646, -0.0494646558508539,
-0.0309613678662747, -0.0121737407329229, -0.018102919444907,
-0.0312003029155623, 0.0416711882799783, 0.0131474663612084,
0.0131500298864316, 0.0105307695924028, -0.000253507564750944,
-0.00153705550016617, -0.000412784909809513, -0.000694444444444444,
-0.00356895013383563, 0, 0.00600989012174394, 0.00920834890300539,
-0.00233482719035976, 0, -0.0740233036326251, 0.00934978274363749,
0.00694444444444444, 0, 0.0014367816091954, 0.0233942414174972,
0.0190972222222222, -0.0538403614457831, -0.0501207729468599,
-0.00653869905530757, -0.00603667278718213, -0.00128307939053729,
0.00692615198120028, 0.0053404905035124, -17.5905959643046, 0.0137565379379523,
-0.00925925925925926, -0.00873138161273754, -0.00534188034188034,
0, 0.00288778877887789, 0.00122276567363273)
I tried to estimate and plot its density like this:
out <- boxplot(x) # store the boxplot as an object
out$out # inspect the outliers
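# histogram of the non-outlying values, on the density scale
hist(x[!x %in% out$out], freq = FALSE, xlab="",ylab="",
main="Istogramma dell'influenza")
# kernel density estimate with a fixed bandwidth of 0.1 (in the units of x)
lines(density(x[!x %in% out$out], kernel="cosine", bw = 0.1), col="red")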
However, the histogram and the density curve end up on very different scales:
Why? How can I compare them in one plot?
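A quick check of why bw = 0.1 flattens the curve. In density(), bw is the standard deviation of the smoothing kernel, expressed in the same units as the data, and the trimmed values lie within a few hundredths of zero. The sketch below uses simulated stand-in data (an assumption, not the x above) with a similar spread, and compares the question's bandwidth with the default one.

```r
# Simulated stand-in data (an assumption, not the question's x):
# 90 values with a spread similar to the trimmed returns.
set.seed(1)
y <- rnorm(90, mean = 0, sd = 0.02)

# bw is the standard deviation of the kernel, in the units of the data
wide    <- density(y, kernel = "cosine", bw = 0.1)  # bandwidth used in the question
default <- density(y, kernel = "cosine")            # default rule-of-thumb bandwidth

# Compare both bandwidths with the spread of the data itself
print(c(sd_y = sd(y), bw_wide = wide$bw, bw_default = default$bw))

# A kernel several times wider than the data spreads the probability mass
# thinly, so the peak of the curve drops far below the histogram bars.
print(c(peak_wide = max(wide$y), peak_default = max(default$y)))
```

Both curves still integrate to about 1; the wide one just smears that mass over an interval much larger than the data, which is why it looks like a nearly flat line under the histogram.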
It may be easier to use the adjust argument here instead of bw:
# this uses the default bandwidth
lines(density(x[!x %in% out$out], kernel="cosine", adjust = 1), col="blue")
# this specifies the bandwidth explicitly, close to default
lines(density(x[!x %in% out$out], kernel="cosine", bw = 0.0032), col="green")
# this narrower bandwidth is a closer match for the histogram
lines(density(x[!x %in% out$out], kernel="cosine", adjust = 0.6), col="red")
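To keep both the bars and the curve fully visible in one plot, one option is to compute the histogram first and set ylim (and xlim) to cover both objects. This is a sketch on simulated stand-in data (an assumption); with the question's data, y would be x[!x %in% out$out].

```r
# Simulated stand-in data (an assumption, not the question's x)
set.seed(1)
y <- rnorm(90, mean = 0, sd = 0.02)

h <- hist(y, plot = FALSE)                        # h$density holds the bar heights
d <- density(y, kernel = "cosine", adjust = 0.6)  # narrower than the default bandwidth

# Both h$density and d$y are on the density scale (each integrates to ~1),
# so they can share one y-axis; use the larger maximum as the upper limit.
top <- max(h$density, d$y)

plot(h, freq = FALSE, ylim = c(0, top), xlim = range(h$breaks, d$x),
     xlab = "", ylab = "", main = "")
lines(d, col = "red")
```

Setting the limits from both objects avoids clipping the peak of the curve when it rises above the tallest bar.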
The adjust factor multiplies the default bandwidth, which is computed with a complicated rule of thumb. Here the default bandwidth appears to be close to 0.0032.
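The relationship between adjust and the default bandwidth can be checked directly. The default bw = "nrd0" is computed by bw.nrd0(), Silverman's rule of thumb, 0.9 * min(sd, IQR / 1.34) * n^(-1/5). A sketch on simulated stand-in data (an assumption, not the question's x):

```r
# Simulated stand-in data (an assumption, not the question's x)
set.seed(1)
y <- rnorm(90, mean = 0, sd = 0.02)

b0 <- bw.nrd0(y)  # the default rule-of-thumb bandwidth

d1  <- density(y, kernel = "cosine", adjust = 1)    # default bandwidth
d06 <- density(y, kernel = "cosine", adjust = 0.6)  # 60% of the default

print(c(rule_of_thumb = b0, adjust_1 = d1$bw, adjust_0.6 = d06$bw,
        ratio = d06$bw / d1$bw))

# The same rule written out by hand
manual <- 0.9 * min(sd(y), IQR(y) / 1.34) * length(y)^(-0.2)
print(manual)
```

Because adjust is relative, the same value (for example 0.6) gives a comparable amount of smoothing regardless of the units of the data, which is what makes it easier to use than a hand-picked bw.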