Create %-contour in a 3d kernel density and find which points are within that contour
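I would like to draw the isosurface of a specific %-contour in a 3d kernel density estimate. Then, I would like to know which points lie inside that 3d shape.
To illustrate my problem, I'll show how I approach the 2d case (the code mimics and How to plot a contour line showing where 95% of values fall within, in R and in ggplot2).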
library(MASS)
library(misc3d)
library(rgl)
library(sp)
# Create dataset
set.seed(42)
Sigma <- matrix(c(15, 8, 5, 8, 15, .2, 5, .2, 15), 3, 3)
mv <- data.frame(mvrnorm(400, c(100, 100, 100),Sigma))
### 2d ###
# Create kernel density
dens2d <- kde2d(mv[, 1], mv[, 2], n = 40)
# Find the contour level defined in prob
dx <- diff(dens2d$x[1:2])
dy <- diff(dens2d$y[1:2])
sd <- sort(dens2d$z)
c1 <- cumsum(sd) * dx * dy
prob <- .5
levels <- sapply(prob, function(x) {
  approx(c1, sd, xout = 1 - x)$y
})
# Find which values are inside the defined polygon
ls <- contourLines(dens2d, level = levels)
pinp <- point.in.polygon(mv[, 1], mv[, 2], ls[[1]]$x, ls[[1]]$y)
# Plot it
plot(mv[, 1], mv[, 2], pch = 21, bg = "gray")
contour(dens2d, levels = levels, labels = prob,
        add = TRUE, col = "red")
points(mv[pinp == 1, 1], mv[pinp == 1, 2], pch = 21, bg = "orange")
So the 50% contour level is found with approx, the contour itself is built with contourLines, and point.in.polygon then finds the points inside that contour.
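As a sanity check of the cumsum/approx step, here is a minimal base-R sketch (no KDE involved) that applies the same lines to a bivariate standard normal evaluated on a grid. For that distribution the 50% highest-density region is bounded by the density value 1/(4*pi), roughly 0.0796, which follows from 1 - exp(-r^2/2) = 0.5. So the computed level should land close to it as long as the grid holds essentially all of the mass. The grid limits and resolution are illustrative choices, not values from the question.

```r
# Regular grid over [-5, 5] in each direction (illustrative choice)
g <- seq(-5, 5, length.out = 101)
z <- outer(dnorm(g), dnorm(g))               # bivariate standard normal density on the grid
dx <- dy <- diff(g[1:2])
sd <- sort(z)                                # density values, ascending
c1 <- cumsum(sd) * dx * dy                   # mass held by cells with density <= sd[i]
prob <- 0.5
level <- approx(c1, sd, xout = 1 - prob)$y   # density value on the prob-contour
exact <- 1 / (4 * pi)                        # analytic 50% contour level
print(c(level = level, exact = exact))
```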
I would like to do the same thing in the 3d case. This is as far as I have got:
### 3d ###
# Create kernel density
dens3d <- kde3d(mv[,1], mv[,2], mv[,3], n = 40)
# Find the contour level defined in prob
dx <- diff(dens3d$x[1:2])
dy <- diff(dens3d$y[1:2])
dz <- diff(dens3d$z[1:2])
sd3d <- sort(dens3d$d)
c3d <- cumsum(sd3d) * dx * dy * dz
levels <- sapply(prob, function(x) {
  approx(c3d, sd3d, xout = 1 - x)$y
})
# Find which values are inside the defined polygon
# # No idea
# Plot it
points3d(mv[,1], mv[,2], mv[,3], size = 2)
box3d(col = "gray")
contour3d(dens3d$d, level = levels, x = dens3d$x, y = dens3d$y, z = dens3d$z, #exp(-12)
          alpha = .3, color = "red", color2 = "gray", add = TRUE)
title3d(xlab = "x", ylab = "y", zlab = "z")
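So I haven't got very far.
I realize that the way I define the levels in the 3d case is not right, and I suspect the problem lies in c3d <- cumsum(sd3d) * dx * dy * dz
but honestly I don't know how to proceed.
And once the 3d contour is correctly defined, I would appreciate any hints on how to find the points inside it.
Many thanks!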
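One thing worth checking in the 3d attempt (a suspected cause, not something established in this thread): the cumsum/approx recipe assumes the grid holds essentially all of the density's mass, so `sum(dens3d$d) * dx * dy * dz` should be close to 1. If the grid is too narrow, the mass outside it is lost and the computed level comes out too high. The base-R sketch here uses a 3d standard normal, whose 50% contour level is known from the chi-square distribution with 3 degrees of freedom, on a wide grid and on a narrow one. `grid_level` and `dens_grid` are hypothetical helpers written only for this illustration. The wider `lims` used in the simulation code later in the question is one way to give the grid room.

```r
# Level t such that cells with density >= t hold `prob` of the mass on the grid
grid_level <- function(d, cell_volume, prob) {
  sd <- sort(as.vector(d))                # density values, ascending
  cmass <- cumsum(sd) * cell_volume       # mass in cells with density <= sd[i]
  approx(cmass, sd, xout = 1 - prob)$y
}

# 3d standard normal density evaluated on the grid g x g x g
dens_grid <- function(g) {
  d1 <- dnorm(g)
  outer(outer(d1, d1), d1)
}

# Analytic 50% contour level: |X|^2 follows a chi-square with 3 df
exact <- (2 * pi)^(-3 / 2) * exp(-qchisq(0.5, df = 3) / 2)

wide <- seq(-5, 5, length.out = 51)       # grid holding essentially all the mass
narrow <- seq(-1.5, 1.5, length.out = 31) # grid that cuts off the tails
hw <- diff(wide[1:2]); hn <- diff(narrow[1:2])
dw <- dens_grid(wide); dn <- dens_grid(narrow)

mass_wide <- sum(dw) * hw^3               # close to 1
mass_narrow <- sum(dn) * hn^3             # clearly below 1
lvl_wide <- grid_level(dw, hw^3, 0.5)     # close to the analytic level
lvl_narrow <- grid_level(dn, hn^3, 0.5)   # too high: the missing mass is ignored
print(c(mass_wide = mass_wide, mass_narrow = mass_narrow,
        exact = exact, lvl_wide = lvl_wide, lvl_narrow = lvl_narrow))
```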
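EDIT: Following user2554330's suggestion, I'm editing my question to add test code that compares his or her suggestion with the approach I posted here. (I do realize that using the contour to make inferences about new data points was not part of the original question, and I apologize for the amendment.)
Also, I was somewhat hasty in my comment below. How the two approaches perform in the 2D case depends on the sample size. With a sample of around n = 48, user2554330's approach captures about 69% of the population (versus about 79% for the approach I posted), but with a sample of around n = 400, user2554330's approach captures about 79% (versus 83%).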
# Load libraries
library(MASS)
library(misc3d)
library(rgl)
library(sp)
library(oce)
library(akima)
# Create dataset
set.seed(42)
tn <- 1000 # number in pop
Sigma <- matrix(c(15, 8, 5, 8, 15, .2, 5, .2, 15), 3, 3)
mv <- data.frame(mvrnorm(tn, c(100, 100, 100),Sigma)) # population
prob <- .8 # rather than .5
simn <- 100 # number of simulations
pinp <- rep(NA, simn)
cuts <- pinp
sn <- 48 # sample size, at n = 400 user2554330 performs better
### 2d scenario
for (isim in 1:simn) {
  # Sample
  smv <- mv[sample(1:tn, sn), ]
  # Create kernel density
  dens2d <- kde2d(smv[, 1], smv[, 2], n = 40,
                  lims = c(min(smv[, 1]) - abs(max(smv[, 1]) - min(smv[, 1])) / 2,
                           max(smv[, 1]) + abs(max(smv[, 1]) - min(smv[, 1])) / 2,
                           min(smv[, 2]) - abs(max(smv[, 2]) - min(smv[, 2])) / 2,
                           max(smv[, 2]) + abs(max(smv[, 2]) - min(smv[, 2])) / 2))
  # Approach based on
  # Find the contour level defined in prob
  dx <- diff(dens2d$x[1:2])
  dy <- diff(dens2d$y[1:2])
  sd <- sort(dens2d$z)
  c1 <- cumsum(sd) * dx * dy
  levels <- sapply(prob, function(x) {
    approx(c1, sd, xout = 1 - x)$y
  })
  # Find which values are inside the defined polygon
  ls <- contourLines(dens2d, level = levels)
  # Note below that I check points from "population"
  pinp[isim] <- sum(point.in.polygon(mv[, 1], mv[, 2], ls[[1]]$x, ls[[1]]$y)) / tn
  # Approach based on user2554330
  # Find the estimated density at each observed point
  sdatadensity <- bilinear(dens2d$x, dens2d$y, dens2d$z,
                           smv[, 1], smv[, 2])$z
  # Find the contours
  levels2 <- quantile(sdatadensity, probs = 1 - prob, na.rm = TRUE)
  # Find within
  # Note below that I check points from "population"
  datadensity <- bilinear(dens2d$x, dens2d$y, dens2d$z,
                          mv[, 1], mv[, 2])$z
  cuts[isim] <- sum(as.numeric(cut(datadensity, c(0, levels2, Inf))) == 2, na.rm = TRUE) / tn
}
summary(pinp)
summary(cuts)
> summary(pinp)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0030 0.7800 0.8205 0.7950 0.8565 0.9140
> summary(cuts)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.5350 0.6560 0.6940 0.6914 0.7365 0.8120
I also tried to see how user2554330's approach performs in the 3D case, using the following code:
# 3d scenario
for (isim in 1:simn) {
  # Sample
  smv <- mv[sample(1:tn, sn), ]
  # Create kernel density
  dens3d <- kde3d(smv[, 1], smv[, 2], smv[, 3], n = 40,
                  lims = c(min(smv[, 1]) - abs(max(smv[, 1]) - min(smv[, 1])) / 2,
                           max(smv[, 1]) + abs(max(smv[, 1]) - min(smv[, 1])) / 2,
                           min(smv[, 2]) - abs(max(smv[, 2]) - min(smv[, 2])) / 2,
                           max(smv[, 2]) + abs(max(smv[, 2]) - min(smv[, 2])) / 2,
                           min(smv[, 3]) - abs(max(smv[, 3]) - min(smv[, 3])) / 2,
                           max(smv[, 3]) + abs(max(smv[, 3]) - min(smv[, 3])) / 2))
  # Approach based on user2554330
  # Find the estimated density at each observed point
  sdatadensity <- approx3d(dens3d$x, dens3d$y, dens3d$z, dens3d$d,
                           smv[, 1], smv[, 2], smv[, 3])
  # Find the contours
  levels <- quantile(sdatadensity, probs = 1 - prob, na.rm = TRUE)
  # Find within
  # Note below that I check points from "population"
  datadensity <- approx3d(dens3d$x, dens3d$y, dens3d$z, dens3d$d,
                          mv[, 1], mv[, 2], mv[, 3])
  cuts[isim] <- sum(as.numeric(cut(datadensity, c(0, levels, Inf))) == 2, na.rm = TRUE) / tn
}
summary(cuts)
> summary(cuts)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1220 0.1935 0.2285 0.2304 0.2620 0.3410
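I would prefer to define the contour so that, even when the sample n is relatively small (i.e. < 50), the specified probability is (close to) the probability of capturing future data points drawn from the same population.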
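Rather than trying to work out which points are inside the contour, I'd evaluate the density at each point and colour the points according to how that value compares with the contour level. It may make a different decision for a few points near the boundary, but it should be very close.
To do that evaluation, you can use the oce::approx3d function on the density estimate.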
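oce::approx3d is documented as trilinear interpolation within a 3D array. To show what that evaluation step does, here is a base-R sketch of trilinear interpolation on a regular grid. `trilinear` is a hypothetical helper written for this illustration, not part of oce. Its argument order mirrors the `approx3d(x, y, z, f, xout, yout, zout)` call used in this answer. In this sketch, points outside the grid get `NA`.

```r
# Trilinear interpolation of array d (dim = c(length(x), length(y), length(z)))
# at query points (xo, yo, zo); points outside the grid give NA.
trilinear <- function(x, y, z, d, xo, yo, zo) {
  # Lower grid index and fractional offset along one axis
  locate <- function(g, v) {
    i <- findInterval(v, g, rightmost.closed = TRUE)
    i[i < 1 | i >= length(g)] <- NA
    list(i = i, t = (v - g[i]) / (g[i + 1] - g[i]))
  }
  a <- locate(x, xo); b <- locate(y, yo); e <- locate(z, zo)
  corner <- function(di, dj, dk) d[cbind(a$i + di, b$i + dj, e$i + dk)]
  # Interpolate along x, then y, then z
  c00 <- corner(0, 0, 0) * (1 - a$t) + corner(1, 0, 0) * a$t
  c10 <- corner(0, 1, 0) * (1 - a$t) + corner(1, 1, 0) * a$t
  c01 <- corner(0, 0, 1) * (1 - a$t) + corner(1, 0, 1) * a$t
  c11 <- corner(0, 1, 1) * (1 - a$t) + corner(1, 1, 1) * a$t
  c0 <- c00 * (1 - b$t) + c10 * b$t
  c1 <- c01 * (1 - b$t) + c11 * b$t
  c0 * (1 - e$t) + c1 * e$t
}

# Usage: trilinear interpolation reproduces a linear function exactly
x <- seq(0, 1, length.out = 5); y <- seq(0, 2, length.out = 6); z <- seq(-1, 1, length.out = 4)
f <- function(x, y, z) 1 + 2 * x - y + 3 * z
g <- expand.grid(x = x, y = y, z = z)   # x varies fastest, matching array fill order
d <- array(f(g$x, g$y, g$z), dim = c(length(x), length(y), length(z)))
xo <- c(0.3, 1, 0.55); yo <- c(1.7, 2, 0.1); zo <- c(-0.2, 1, 0.9)
print(trilinear(x, y, z, d, xo, yo, zo))
print(f(xo, yo, zo))
```

Applied to the answer's objects, the same call shape would be `trilinear(dens3d$x, dens3d$y, dens3d$z, dens3d$d, mv[,1], mv[,2], mv[,3])`.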
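The other thing I'd do is choose the contour level from quantiles of the observed densities, rather than trying to approximate the 3-d integral of the estimated density.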
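The quantile step can be illustrated on its own. In this base-R sketch the true 3d standard-normal density stands in for a fitted KDE. That substitution is an assumption made so the example needs no extra packages. Taking the `1 - prob` quantile of the densities at the observed points as the level puts the densest `prob` fraction of those points above it. When the density itself is right, roughly that fraction of new points from the same distribution also falls inside. With a KDE fitted to a small sample, the densities at the sample points are themselves estimates, so this second property need not hold. That is what the simulation in the question's edit probes.

```r
set.seed(1)
prob <- 0.8
# True 3d standard-normal density; a stand-in for a fitted kernel density (assumption)
dens <- function(m) (2 * pi)^(-3 / 2) * exp(-rowSums(m^2) / 2)

obs <- matrix(rnorm(3000), ncol = 3)         # 1000 "observed" points
level <- quantile(dens(obs), probs = 1 - prob, names = FALSE)

in_sample <- mean(dens(obs) > level)         # fraction of observed points inside
future <- matrix(rnorm(15000), ncol = 3)     # 5000 new points, same distribution
coverage <- mean(dens(future) > level)       # fraction of new points inside
print(c(in_sample = in_sample, coverage = coverage))
```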
Here's code that does all of that:
library(MASS)
library(misc3d)
library(rgl)
library(oce)
#> Loading required package: testthat
#> Loading required package: gsw
# Create dataset
set.seed(42)
Sigma <- matrix(c(15, 8, 5, 8, 15, .2, 5, .2, 15), 3, 3)
mv <- data.frame(mvrnorm(400, c(100, 100, 100),Sigma))
### 3d ###
# Create kernel density
dens3d <- kde3d(mv[,1], mv[,2], mv[,3], n = 40)
# Find the estimated density at each observed point
datadensity <- approx3d(dens3d$x, dens3d$y, dens3d$z, dens3d$d,
                        mv[, 1], mv[, 2], mv[, 3])
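# Find the contours
prob <- .5
# Points whose density exceeds the (1 - prob) quantile are the densest prob fraction
# (for prob = .5 this is the same as using prob)
levels <- quantile(datadensity, probs = 1 - prob, na.rm = TRUE)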
# Plot it
colours <- c("gray", "orange")
cuts <- cut(datadensity, c(0, levels, Inf))
for (i in seq_along(levels(cuts))) {
  gp <- as.numeric(cuts) == i
  spheres3d(mv[gp, 1], mv[gp, 2], mv[gp, 3], col = colours[i], radius = 0.2)
}
box3d(col = "gray")
contour3d(dens3d$d, level = levels, x = dens3d$x, y = dens3d$y, z = dens3d$z, #exp(-12)
          alpha = .1, color = "red", color2 = "gray", add = TRUE)
title3d(xlab = "x", ylab = "y", zlab = "z")
Here's the plot it produces: