Density plot produces too steep a curve
I have the following vector:
> dput(x)
c(-0.355351681957187, -0.169491525423729, 0.31683516598051, 0.283387622149837,
-0.0404040404040404, 0, -0.333333333333333, 0.0235294117647059,
0, 0, 0, -0.0515442883011552, -0.0217391304347826, -0.243119266055046,
-1, -0.34239692625979, -0.378787878787879, -1.66260162601626,
0, -0.157894736842105, 0.25, -0.5, 0.801104290693729, -0.153153153153153,
0.385314991342733, 0.214285714285714, 0.133333333333333, 0.677407583111338,
0.125, 0.0152671755725191, 0.00103734439834025, 0, 0.25, -0.181818181818182,
0, 0.555555555555556, -1.2671374117353, -0.72, -0.0896999113268307,
-0.0392156862745098, 0.987184805152276, 0.986975072984505, -0.120978120978121,
-0.554949337490257, -0.333333333333333, -1030.48879660578, 0.192660550458716,
0, 0, -1.04154941234895, -0.82051282051282, -0.0282485875706215,
0.63226571767497, 0.0881147540983607, 0, 0.458823529411765, 0.338449445639583,
-5.55556433141142, 0.225536180110256, 0.249441548771407, -0.11864406779661,
-3.76193507320178, -4.75, -1.10223741454319, -0.689922480620155,
-2.04782608695652, -3.04521276595745, -0.741007194244604, 0.989690721649485,
0.314224446032881, 0.314285714285714, 0.251685393258427, -0.00608418155402266,
-0.0368893320039882, -0.00990683783542832, -0.0166666666666667,
-0.0857142857142857, 0, 0.144337527757217, 0.221153846153846,
-0.0560747663551402, 0, -1.8, 0.2243947858473, 0.166666666666667,
0, 0.0344827586206897, 0.561461794019934, 0.458333333333333,
-1.2921686746988, -1.20289855072464, -0.156601842374616, -0.144578313253012,
-0.0310077519379845, 0.163688058489033, 0.12621359223301, -481.395976223137,
0.376470588235294, -0.222222222222222, -0.209553158705701, -0.128205128205128,
0, 0.0693069306930693, 0.0293463761671854)
I plotted the density of x like this:
d0<-density(x)
df_density0<-data.frame(x=d0$x,y=d0$y,stringsAsFactors = FALSE)
plot(df_density0$x,df_density0$y,type="l",col="red")
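The result is a plot in which the peak at 0 is very narrow and the curve is flat everywhere else, so the plot is hard to read. I thought of using a log scale to make the peak less steep and the graph more readable, but there are too many zeros.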
How about looking at a more relevant range of x?
x2 <- x[ x > -3 ]
d0<-density(x2)
df_density0<-data.frame(x=d0$x,y=d0$y,stringsAsFactors = FALSE)
plot(
df_density0$x,
df_density0$y,
type="l",
col="red"
)
Then add a note stating that this plot leaves out 6 of the 104 measurements, ranging from ... to ...
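The figures for such a note can be computed instead of typed by hand. The following is a minimal sketch, assuming a short made-up vector x_demo (not the data from the question) and the same cutoff of -3; the wording of the note is only an illustration.

```r
# Hypothetical stand-in for x: a few typical values plus extreme negatives
x_demo <- c(-0.36, 0, 0.32, -1.66, 0.99, -5.56, -4.75, -481.4, -1030.5)

cutoff <- -3                          # same threshold as x[ x > -3 ]
shown  <- x_demo[x_demo > cutoff]     # values that end up in the plot
hidden <- x_demo[x_demo <= cutoff]    # values left out of the plot

# Build the note from the data so it stays correct if the cutoff changes
note <- sprintf(
  "Not shown: %d of %d measurements, ranging from %.2f to %.2f",
  length(hidden), length(x_demo), min(hidden), max(hidden)
)

# Put the note under the plot as a subtitle
plot(density(shown), col = "red", main = "Density of x (x > -3)", sub = note)
```

Applied to the real x, the same lines would report the count and range of the omitted values directly.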
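The curve is so steep because your data contain some extreme outliers. You can remove them by passing the data to boxplot() and storing the result as an object (assuming your data is called dt):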
out <- boxplot(dt) # store the boxplot as an object
out$out # inspect the outliers
[1] -1.0000000 -1.6626016 -1.2671374 0.9871848 0.9869751 -1030.4887966
[7] -1.0415494 -5.5555643 -3.7619351 -4.7500000 -1.1022374 -2.0478261
[13] -3.0452128 0.9896907 -1.8000000 -1.2921687 -1.2028986 -481.3959762
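For reference, the values boxplot() reports in out$out are those lying more than 1.5 hinge spreads beyond the lower or upper hinge (the default range = 1.5). boxplot.stats() returns the same values without drawing anything. A small sketch of the rule, using a made-up vector dt_demo rather than the data above:

```r
# Hypothetical stand-in for dt
dt_demo <- c(-1030.5, -5.56, -1.2, -0.36, -0.17, -0.04, 0, 0, 0.02,
             0.13, 0.25, 0.32, 0.39, 0.99)

# Outliers as boxplot() would flag them, without producing a plot
flagged <- boxplot.stats(dt_demo, coef = 1.5)$out

# The same rule spelled out: hinges from fivenum(), fences 1.5 spreads away
h      <- fivenum(dt_demo)    # min, lower hinge, median, upper hinge, max
spread <- h[4] - h[2]
manual <- dt_demo[dt_demo < h[2] - 1.5 * spread |
                  dt_demo > h[4] + 1.5 * spread]

# Both approaches should select the same values
same <- identical(sort(flagged), sort(manual))
```

Because the fences depend on the hinge spread, this rule also flags moderately sized values such as -1 when most of the data sit close to 0, which is why the output above lists 18 values and not only the two extreme ones.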
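You can then remove the outliers from dt and plot the data again with hist() (note that freq must be set to FALSE if you want to add a density line), overlaying a density line (bw sets the bandwidth, which controls how smooth the density curve is):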
hist(dt[!dt %in% out$out], freq = FALSE)
lines(density(dt[!dt %in% out$out], kernel="cosine", bw = 0.1))
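If dropping observations is not acceptable, one alternative is to keep every value and only restrict where the curve is evaluated, using the from and to arguments of density(). This is a different technique from the outlier removal above, offered as a sketch; the vector x_demo, the window limits, and the bandwidth are illustrative assumptions.

```r
# Hypothetical stand-in for x, including two extreme values
x_demo <- c(-1030.5, -481.4, -1.66, -0.36, -0.17, -0.04, 0, 0, 0, 0.02,
            0.13, 0.25, 0.32, 0.39, 0.8, 0.99)

# The curve is evaluated only on [-3, 1.5]; it is still normalised by all
# observations, so the extreme values are not silently discarded
d_zoom <- density(x_demo, bw = 0.1, from = -3, to = 1.5)

plot(d_zoom, col = "red", main = "Density of x, evaluated on [-3, 1.5]")
rug(x_demo[x_demo > -3])    # tick marks for the observations in the window
```

Without from and to, the default 512 evaluation points are spread over the whole range of the data, so only a handful of them fall near 0, which contributes to the spiky appearance.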