Density plot produces too steep a curve

I have the following vector:

> dput(x)
c(-0.355351681957187, -0.169491525423729, 0.31683516598051, 0.283387622149837, 
  -0.0404040404040404, 0, -0.333333333333333, 0.0235294117647059, 
   0, 0, 0, -0.0515442883011552, -0.0217391304347826, -0.243119266055046, 
  -1, -0.34239692625979, -0.378787878787879, -1.66260162601626, 
   0, -0.157894736842105, 0.25, -0.5, 0.801104290693729, -0.153153153153153, 
   0.385314991342733, 0.214285714285714, 0.133333333333333, 0.677407583111338, 
   0.125, 0.0152671755725191, 0.00103734439834025, 0, 0.25, -0.181818181818182, 
   0, 0.555555555555556, -1.2671374117353, -0.72, -0.0896999113268307, 
  -0.0392156862745098, 0.987184805152276, 0.986975072984505, -0.120978120978121, 
  -0.554949337490257, -0.333333333333333, -1030.48879660578, 0.192660550458716, 
   0, 0, -1.04154941234895, -0.82051282051282, -0.0282485875706215, 
   0.63226571767497, 0.0881147540983607, 0, 0.458823529411765, 0.338449445639583, 
  -5.55556433141142, 0.225536180110256, 0.249441548771407, -0.11864406779661, 
  -3.76193507320178, -4.75, -1.10223741454319, -0.689922480620155, 
  -2.04782608695652, -3.04521276595745, -0.741007194244604, 0.989690721649485, 
   0.314224446032881, 0.314285714285714, 0.251685393258427, -0.00608418155402266, 
  -0.0368893320039882, -0.00990683783542832, -0.0166666666666667, 
  -0.0857142857142857, 0, 0.144337527757217, 0.221153846153846, 
  -0.0560747663551402, 0, -1.8, 0.2243947858473, 0.166666666666667, 
   0, 0.0344827586206897, 0.561461794019934, 0.458333333333333, 
  -1.2921686746988, -1.20289855072464, -0.156601842374616, -0.144578313253012, 
  -0.0310077519379845, 0.163688058489033, 0.12621359223301, -481.395976223137, 
   0.376470588235294, -0.222222222222222, -0.209553158705701, -0.128205128205128, 
   0, 0.0693069306930693, 0.0293463761671854)

I plotted the density of x like this:

d0<-density(x)
df_density0<-data.frame(x=d0$x,y=d0$y,stringsAsFactors = FALSE)
plot(df_density0$x,df_density0$y,type="l",col="red")

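In the resulting plot, the peak at 0 is very narrow and the curve is flat everywhere else, so the plot is hard to read. I thought about using a log scale to make the peak less steep and improve readability, but there are too many zeros.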

How about looking at a more relevant range of x?


x2 <- x[ x > -3 ]

d0<-density(x2)
df_density0<-data.frame(x=d0$x,y=d0$y,stringsAsFactors = FALSE)

plot(
    df_density0$x,
    df_density0$y,
    type="l",
    col="red"
)

Then add a note stating that the plot leaves out 6 of the 104 measurements, ranging from ... to ...
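One way to keep such a note accurate is to build it from the data rather than typing the numbers by hand. This is only a sketch: the vector below is a shortened, illustrative stand-in for x, and the names cutoff, kept, dropped and note are made up for the example.

```r
# Shortened, illustrative stand-in for the vector x in the question
x <- c(-0.36, -0.17, 0.32, 0, -1030.49, 0.19, -5.56, -3.76,
       0.25, -481.40, 0.99, -1.8)

cutoff  <- -3                 # same threshold as in the answer
kept    <- x[x > cutoff]      # values that are plotted
dropped <- x[x <= cutoff]     # values that are left out

d <- density(kept)
plot(d$x, d$y, type = "l", col = "red", xlab = "x", ylab = "Density")

# Build the note from the data so the counts stay right if x changes
note <- sprintf("%d of %d values not shown (range %.2f to %.2f)",
                length(dropped), length(x), min(dropped), max(dropped))
mtext(note, side = 1, line = 4, cex = 0.8)  # below the x-axis label
```

With the default margins, line = 4 places the text just under the axis label; the sub argument of title() would be another place to put it.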

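The curve is so steep because your data contain some extreme outliers. You can remove them by running the data through boxplot and storing the result as an object (assuming your data are called dt):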

out <- boxplot(dt)  # store the boxplot as an object
out$out             # inspect the outliers
 [1]    -1.0000000    -1.6626016    -1.2671374     0.9871848     0.9869751 -1030.4887966
 [7]    -1.0415494    -5.5555643    -3.7619351    -4.7500000    -1.1022374    -2.0478261
[13]    -3.0452128     0.9896907    -1.8000000    -1.2921687    -1.2028986  -481.3959762
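For reference, the points reported in out$out are those lying more than 1.5 times the hinge spread beyond the box, which is the default coef = 1.5 of boxplot.stats(). The sketch below reproduces that rule by hand; the vector is a small illustrative stand-in for dt, and hinges, fence and manual are names invented for the example.

```r
# Small illustrative stand-in for dt (not the full vector from the question)
dt <- c(-0.36, -0.17, 0.32, 0, -1030.49, 0.19, -5.56, 0.25,
        -481.40, 0.1, -0.05, 0.02)

s <- boxplot.stats(dt)          # the statistics boxplot() uses, without drawing
hinges <- s$stats[c(2, 4)]      # lower and upper hinge of the box
fence  <- 1.5 * diff(hinges)    # default coef = 1.5 times the hinge spread

# Points beyond the fences are the ones boxplot() reports as outliers
manual <- dt[dt < hinges[1] - fence | dt > hinges[2] + fence]
print(manual)
print(s$out)
```

Raising or lowering coef in boxplot.stats() (or range in boxplot()) changes how aggressively values are flagged, so the set of "outliers" is a convention rather than a property of the data.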

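You can then remove the outliers from dt, plot the data again with hist (note that freq must be set to FALSE if you want to add a density line), and overlay a density line (use bw to control the shape of the density curve):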

hist(dt[!dt %in% out$out], freq = FALSE)
lines(density(dt[!dt %in% out$out], kernel="cosine", bw = 0.1))
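To see how much bw matters, it can help to overlay a few bandwidths on the same histogram. The following is a rough sketch on simulated data (vals is hypothetical and only mimics a sample with many exact zeros); the bandwidth values are arbitrary choices for illustration.

```r
set.seed(1)
# Hypothetical data: a narrow spread around 0 plus a block of exact zeros
vals <- c(rnorm(80, mean = 0, sd = 0.3), rep(0, 15))

hist(vals, freq = FALSE, breaks = 20, main = "Effect of bw")

bws  <- c(0.05, 0.1, 0.3)               # arbitrary bandwidths to compare
cols <- c("red", "blue", "darkgreen")
for (i in seq_along(bws)) {
  lines(density(vals, bw = bws[i]), col = cols[i])
}
legend("topright", legend = paste("bw =", bws), col = cols, lty = 1)

# Peak height of the narrowest and widest bandwidth, for comparison
peak_narrow <- max(density(vals, bw = 0.05)$y)
peak_wide   <- max(density(vals, bw = 0.3)$y)
```

A small bw follows the spike at 0 closely and gives a tall, narrow peak; a larger bw smooths it out at the cost of hiding detail.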