Best way to solve an integral including a nonparametric density and distribution

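Suppose I want to solve a function containing two integrals, such as the following (this is only an example; the actual function is uglier):

$$\int_a^b \frac{c\, f(x)}{F(x)} \left[\int_x^b \left(1 - F(y)\right)^{c/d} f(y)\, dy\right] dx$$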

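where a and b are the integration bounds, c and d are known parameters, and f(x) and F(x) are the density and the distribution function of the random variable x. In my problem f(x) and F(x) are nonparametric, so I only know their values at certain specific values of x. How would you set up the integral?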
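For the example integrand only (the real, uglier one may not allow this), the inner integral can be done in closed form with the substitution u = F(y), du = f(y) dy. Writing k = c/d:

$$\int_x^b \left(1 - F(y)\right)^{k} f(y)\, dy = \frac{\left(1 - F(x)\right)^{k+1} - \left(1 - F(b)\right)^{k+1}}{k+1}$$

If b is the upper end of the support, so that F(b) = 1, the same substitution in the outer integral collapses everything to a single integral:

$$\int_a^b \frac{c\, f(x)}{F(x)} \cdot \frac{\left(1 - F(x)\right)^{k+1}}{k+1}\, dx = \frac{c}{k+1} \int_{F(a)}^{1} \frac{(1 - u)^{k+1}}{u}\, du$$

The integrand behaves like 1/u near u = 0, so the value is infinite when F(a) = 0. That is the case for the uniform test data below (F(1) = 0), where only the nearest-point lookup keeps the numerical integrand finite; the resulting near-singularity at the lower limit may by itself make adaptive quadrature slow.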

This is what I did:

# Load pracma first: quadgk() is used inside the integrands below
library(pracma)

# Create the data: a uniform(1, 10) sample with its exact CDF and pdf
val <- runif(300, min = 1, max = 10)
CDF <- (val - 1)/(10 - 1)
pdf <- 1 / (10 - 1)
data <- data.frame(val = val, CDF = CDF, pdf = pdf)

c = 2
d = 1

# Inner integrand: (1 - F(y))^(c/d) * f(y), using the nearest data point
integrand1 <- function(x) {
  i <- which.min(abs(x - data$val))
  FF <- data$CDF[i]
  ff <- data$pdf[i]
  (1 - FF)^(c/d) * ff
}

# Vectorize the inner integrand (quadgk expects a vectorized function)
Integrand1 <- Vectorize(integrand1)

# Outer integrand: inner integral from x to 10, divided by F(x), times c * f(x)
integrand2 <- function(x){
  i <- which.min(abs(x - data$val))
  FF <- data$CDF[i]
  ff <- data$pdf[i]
  (quadgk(Integrand1, x, 10) / FF) * c * ff
}

# Vectorize the outer integrand
Integrand2 <- Vectorize(integrand2)

# Solve
quadgk(Integrand2, 1, 10)

The integration is extremely slow. Is there a better way to solve this? Thanks.
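One probable reason for the slowness: every evaluation of integrand1 and integrand2 scans all rows of data with which.min, the outer quadgk calls integrand2 at many nodes, and each of those calls runs a complete inner quadgk. A minimal sketch of a cheaper, vectorized nearest-row lookup using base R's findInterval(), assuming the grid is sorted in ascending order (sort data by val first if it comes from runif()); nearest_index is a hypothetical helper name:

```r
# Hypothetical helper: index of the nearest grid point for each element of x.
# Assumes `grid` is sorted in ascending order.
nearest_index <- function(x, grid) {
  # With all.inside = TRUE, i lies in 1..(n - 1) and grid[i] <= x < grid[i + 1]
  # for grid[1] <= x < grid[n]; x outside the range maps to the first/last cell
  i <- findInterval(x, grid, all.inside = TRUE)
  # Move to the right-hand point when it is strictly closer (ties go left, like which.min)
  i + (abs(grid[i + 1] - x) < abs(x - grid[i]))
}

# Example: the same indices as the which.min() scan, computed for all x at once
grid <- seq(1, 10, length.out = 1000)
set.seed(1)
x <- runif(500, min = 0, max = 11)
idx_fast <- nearest_index(x, grid)
idx_slow <- sapply(x, function(z) which.min(abs(z - grid)))
```

Inside the integrands, data$CDF[nearest_index(x, data$val)] would then replace the which.min scan, and since that lookup already works on vectors, integrand1 would no longer need the Vectorize() wrapper.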

------------EDIT---------

In my problem, the pdf and CDF are computed from a vector of values v as follows:

# Create the original data
v <- runif(300, min = 1, max = 10)
require(np)

# Compute the CDF and pdf
v.CDF.bw <- npudistbw(dat = v, bandwidth.compute = TRUE, ckertype = "gaussian")
v.pdf.bw <- npudensbw(dat = v, bandwidth.compute = TRUE, ckertype = "gaussian")

# Evaluate the estimates on a regular grid (I add this step because the v
# vector in my data is not very large; this way the estimated pdf and CDF
# are approximated on a fine grid)
val <- seq(from = min(v), to = max(v), length.out = 1000)
data <- data.frame(val)
CDF <- npudist(bws = v.CDF.bw, newdata = data$val, edat = data )
pdf <- npudens(bws = v.pdf.bw, newdata = data$val, edat = data )
data$CDF <- CDF$dist
data$pdf <- pdf$dens
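Because data now holds the estimated CDF and pdf on a fine grid, another option (a different technique from the nested quadgk above, swapped in here for illustration) is to compute all inner integrals at once with a reversed cumulative trapezoidal sum, then take one more trapezoidal sum for the outer integral. A minimal sketch, assuming the example integrand, a sorted grid spanning exactly [a, b], and F > 0 on the grid; nested_integral_grid, cc and dd are hypothetical names (cc and dd stand for c and d). The usage lines use an exact uniform(1, 10) CDF on [2, 10] so that the result can be checked:

```r
# Hypothetical helper: nested integral on a fixed, sorted grid using trapezoid sums.
# val: grid points (ascending); CDF, pdf: estimates at those points;
# cc, dd: the parameters c and d of the question.
nested_integral_grid <- function(val, CDF, pdf, cc, dd) {
  n <- length(val)
  h <- diff(val)
  # Inner integrand (1 - F(y))^(c/d) * f(y) at every grid point
  g <- (1 - CDF)^(cc / dd) * pdf
  # Trapezoid area of each cell [val[i], val[i + 1]]
  seg <- h * (g[-n] + g[-1]) / 2
  # inner[i] approximates the integral of g from val[i] to val[n] (0 at the last point)
  inner <- append(rev(cumsum(rev(seg))), 0)
  # Outer integrand c * f(x) / F(x) * inner(x), evaluated on the same grid
  outer_vals <- cc * pdf / CDF * inner
  # Trapezoid sum over the whole grid gives the outer integral
  sum(h * (outer_vals[-n] + outer_vals[-1]) / 2)
}

# Usage: exact uniform(1, 10) CDF and pdf on [2, 10], so that F > 0 everywhere
val <- seq(2, 10, length.out = 2001)
res_grid <- nested_integral_grid(val, CDF = (val - 1) / 9,
                                 pdf = rep(1 / 9, length(val)), cc = 2, dd = 1)
res_grid
```

With the grid from the edit this would be called as nested_integral_grid(data$val, data$CDF, data$pdf, 2, 1). A Gaussian-kernel CDF estimate is strictly positive at min(v), so the division is defined there, although the 1/F factor can still be large near the lower end.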

Have you considered using approxfun?

It takes vectors x and y and gives you a function that linearly interpolates between them. For example, try

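# Simulate a sample and estimate its density (density() uses a Gaussian kernel by default)
x <- runif(1000)+runif(1000)+2*(runif(1000)^2)
dx <- density(x)
# Linear interpolant through the (x, y) points returned by density()
fa <- approxfun(dx$x,dx$y)
# Plot the interpolant on [0, 2] and evaluate it at a single point
curve(fa,0,2)
fa(0.4)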

You should be able to call it on your gridded evaluations. It is likely to be faster (and more accurate) than what you are doing.
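A minimal sketch of that suggestion applied to the example integrand, with base R's stats::integrate swapped in for pracma::quadgk. The grid uses an exact uniform(1, 10) CDF purely for illustration, Fa, fa, inner_fn and outer_fn are hypothetical names, and the lower limit is 2 rather than 1 because the 1/F(x) factor is infinite where F = 0:

```r
# Illustrative grid of (value, CDF, pdf) triples: uniform on [1, 10]
val <- seq(1, 10, length.out = 1000)
grid_data <- data.frame(val = val, CDF = (val - 1) / 9, pdf = 1 / 9)

# Linear interpolants; rule = 2 holds the end values outside the grid
Fa <- approxfun(grid_data$val, grid_data$CDF, rule = 2)
fa <- approxfun(grid_data$val, grid_data$pdf, rule = 2)

cc <- 2; dd <- 1  # the c and d of the question
upper <- 10

# Inner integral from x to upper for a single x
inner_fn <- function(x) {
  integrate(function(y) (1 - Fa(y))^(cc / dd) * fa(y), x, upper)$value
}

# Outer integrand; integrate() passes a vector of x, so loop the inner integral with vapply
outer_fn <- function(x) cc * fa(x) / Fa(x) * vapply(x, inner_fn, numeric(1))

res_approx <- integrate(outer_fn, 2, upper)$value
res_approx
```

Both interpolants are vectorized, so no Vectorize() wrapper is needed for the inner integrand, and each lookup is a cheap interpolation instead of a scan over all rows.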

(Edit: yes, as you said, splinefun should also be fine if it is fast enough for your needs.)
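If splinefun is used for the CDF, a monotone method keeps the interpolated CDF non-decreasing, and the returned function's deriv argument gives a density that is consistent with that CDF. A small sketch on an exact uniform(1, 10) CDF (illustrative data only; whether this derivative is preferable to the separately estimated pdf is a judgment call):

```r
# Coarse grid with an exact uniform(1, 10) CDF, for illustration only
val <- seq(1, 10, length.out = 50)
CDF <- (val - 1) / 9

# Monotone Hermite interpolation (Fritsch-Carlson), so the interpolated CDF cannot decrease
Fs <- splinefun(val, CDF, method = "monoH.FC")

F_mid <- Fs(5.5)             # interpolated CDF at x = 5.5
f_mid <- Fs(5.5, deriv = 1)  # slope of the interpolant, i.e. an implied density
c(F_mid, f_mid)
```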