Violin plot: how is the adjacent value range determined, and why does it differ from a boxplot?
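In theory, the violin plot from the vioplot package is a boxplot combined with a density function.
In the "boxplot part", the black box corresponds to the IQR (it does, see the plot below), and the thin center line should span the same range as the boxplot whiskers (the adjacent values, 1.5 IQR by default). In fact it does not (see below). Can anyone explain why they differ?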
library(vioplot)
a <- rnorm(100)
range(a)
a <- c(a, 2, 8, 2.9, 3, 4, -3, -5)  # add some outliers
par(mfrow = c(1, 2))
boxplot(a, range = 1.5)
vioplot(a, range = 1.5)
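This draws the two plots side by side, and the whisker range of the boxplot does not match the center line of the violin plot.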
Let me illustrate this with a simple example:
library(vioplot)
b <- c(1:10, 20)  # 20 is a single high outlier
par(mfrow = c(1, 2))
boxplot(b, range = 1.5)
vioplot(b, range = 1.5)
R's boxplot defines the upper whisker as follows (borrowing the wording from ggplot's help topic):
The upper whisker extends from the hinge to the highest value that is within
1.5 * IQR of the hinge, where IQR is the inter-quartile range, or distance
between the first and third quartiles.
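As a quick check of that definition, here is a minimal sketch using base R's boxplot.stats(), which computes the statistics that boxplot() draws. The vector b is the same example vector as above; nothing here comes from vioplot.

```r
# Example vector: 1..10 plus one high outlier.
b <- c(1:10, 20)

# boxplot.stats() returns the five numbers boxplot() draws:
# lower whisker, lower hinge, median, upper hinge, upper whisker.
s <- boxplot.stats(b, coef = 1.5)

s$stats  # 1.0 3.5 6.0 8.5 10.0
s$out    # 20
```

The upper whisker ends at 10, the largest observation that is still within 1.5 * IQR of the upper hinge, and 20 is reported separately as an outlier.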
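Browsing vioplot's source code, we see upper[i] <- min(q3[i] + range*iqd, data.max). In other words, vioplot extends the line to the fence itself (capped at the data maximum), whereas boxplot stops at the largest observation that lies inside the fence.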
So let's try to reproduce both upper whisker values:
# vioplot draws to the fence, capped at the data maximum
min(quantile(b, 0.75) + 1.5 * IQR(b), max(b))
# 16
# boxplot draws to the largest observation inside the fence
max(b[b <= quantile(b, 0.75) + 1.5 * IQR(b)])
# 10
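To make the comparison reusable, the two rules can be wrapped in small helper functions. This is only a sketch: whiskers_vioplot and whiskers_adjacent are hypothetical names, not functions from vioplot or base R. The upper bound follows the source line quoted above; the lower bound is assumed to be its mirror image. Both helpers use quantile() for the quartiles, which matches boxplot's hinges for this vector but not necessarily for every sample size.

```r
# Hypothetical helper: fence-based ends, following the quoted vioplot line.
# The lower end is an assumption (mirror image of the upper end).
whiskers_vioplot <- function(x, range = 1.5) {
  q1 <- quantile(x, 0.25, names = FALSE)
  q3 <- quantile(x, 0.75, names = FALSE)
  iqd <- q3 - q1
  c(lower = max(q1 - range * iqd, min(x)),
    upper = min(q3 + range * iqd, max(x)))
}

# Hypothetical helper: adjacent values, i.e. the most extreme
# observations that still lie inside the fences (the boxplot rule).
whiskers_adjacent <- function(x, range = 1.5) {
  q1 <- quantile(x, 0.25, names = FALSE)
  q3 <- quantile(x, 0.75, names = FALSE)
  iqd <- q3 - q1
  c(lower = min(x[x >= q1 - range * iqd]),
    upper = max(x[x <= q3 + range * iqd]))
}

b <- c(1:10, 20)
whiskers_vioplot(b)   # lower 1, upper 16
whiskers_adjacent(b)  # lower 1, upper 10
```

The two rules agree whenever there is no outlier on a given side, because the fence is then capped at the data minimum or maximum. They disagree only on a side with outliers, where vioplot's line reaches the fence and boxplot's whisker stops at the last observation before it.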