Draw random numbers from the mixture of uniform distributions
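Goal
I am trying to build a function that draws a specified number of random numbers from an "incomplete uniform distribution".
What do I call an incomplete uniform distribution?
I call an incomplete uniform distribution a probability distribution in which every X value within a series of boundaries has an equal probability of being drawn. In other words, it is a uniform distribution with holes (regions where the probability is zero), as shown below: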
x = list(12:25, 34:54, 67:90, 93:115)
y = 1/sum(25-12, 54-34, 90-67, 115-93)
plot(y=rep(y, length(unlist(x))), x=unlist(x), type="n", ylab="Probability", xlab="X")
for (xi in x)
{
points(xi,rep(y, length(xi)), type="l", lwd=4)
}
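To make the definition concrete, here is a small sketch of that density as an R function. The name dincunif is hypothetical (not from the question), and b is assumed to be a list of c(lower, upper) pairs describing disjoint segments. With the segments from the plot above, the density inside them is 1/78, the same y value the plot uses.

```r
# Hypothetical helper: density of the "incomplete uniform" distribution on a
# list of disjoint c(lower, upper) segments. It is 1 / (total segment length)
# inside any segment and 0 in the holes.
dincunif <- function(x, b) {
  lo <- vapply(b, `[`, numeric(1), 1)   # lower bounds
  hi <- vapply(b, `[`, numeric(1), 2)   # upper bounds
  total <- sum(hi - lo)                 # combined length of all segments
  # for each x, is it inside at least one segment?
  inside <- vapply(x, function(xi) any(xi >= lo & xi <= hi), logical(1))
  ifelse(inside, 1 / total, 0)
}

b <- list(c(12, 25), c(34, 54), c(67, 90), c(93, 115))
dincunif(c(20, 30, 100), b)  # 1/78, 0 (in a hole), 1/78
```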
Ugly solution
Here is a slow and ugly solution:
IncompleteUnif = function(n,b)
{
  #################
  # "n" is the desired number of random numbers
  # "b" is a list describing the boundaries within which a random number can possibly be drawn.
  #################
  r = c() # Series of random numbers to return
  ub = unlist(b)
  while (length(r) < n) # loop will continue until we have the "n" random numbers we need
  {
    x = runif(1, min(ub), max(ub)) # one random number taken over the whole range
    for (bi in b) # The following loop tests whether the random number is within the ranges specified by "b"
    {
      if (min(bi) < x && max(bi) > x) # if found in one range, add "x" to "r" and break
      {
        r = append(r, x)
        break
      }
    }
  }
  return(r)
}
b = list(c(5,94),c(100,198),c(220,292), c(300,350))
set.seed(12)
IncompleteUnif(10,b)
[1] 28.929516 287.132444 330.204498 63.425103 16.693990 66.680826 226.374551 12.892821 7.872065 140.480533
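For comparison, the same rejection idea can be vectorised by proposing a whole batch of candidates at once. This is only a sketch: rincunif_reject is a hypothetical name, and it produces different numbers from the loop above for the same seed. The fraction of proposals kept equals the covered length divided by the full range, so it becomes wasteful when the holes are large.

```r
# Hypothetical vectorised version of the rejection sampler above.
# Assumes `b` is a list of c(lower, upper) pairs.
rincunif_reject <- function(n, b) {
  lo <- vapply(b, `[`, numeric(1), 1)
  hi <- vapply(b, `[`, numeric(1), 2)
  out <- numeric(0)
  while (length(out) < n) {
    # propose a whole batch over the full range instead of one value at a time
    cand <- runif(n, min(lo), max(hi))
    # n x K logical matrix: is candidate i strictly inside segment k?
    inside <- outer(cand, lo, ">") & outer(cand, hi, "<")
    out <- c(out, cand[rowSums(inside) > 0])
  }
  out[seq_len(n)]  # drop the surplus accepted in the last batch
}

set.seed(12)
b <- list(c(5, 94), c(100, 198), c(220, 292), c(300, 350))
rincunif_reject(10, b)
```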
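Your incomplete uniform distribution can be represented as a mixture of four ordinary uniform distributions, where the mixing weight of each segment is proportional to its length (i.e., the longer the segment, the larger its weight).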
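Writing out the mixture density shows why weights proportional to length give a flat density. The notation here is chosen for illustration: disjoint segments [l_k, u_k], k = 1, ..., K, with total length L.

```latex
f(x) = \sum_{k=1}^{K} w_k \,\frac{\mathbf{1}\{l_k \le x \le u_k\}}{u_k - l_k},
\qquad
w_k = \frac{u_k - l_k}{L},
\qquad
L = \sum_{j=1}^{K} (u_j - l_j)

\Longrightarrow\quad
f(x) = \sum_{k=1}^{K} \frac{\mathbf{1}\{l_k \le x \le u_k\}}{L}
= \begin{cases} 1/L & \text{if } x \text{ lies in some segment} \\ 0 & \text{otherwise} \end{cases}
```

For b = list(c(5,94), c(100,198), c(220,292), c(300,350)) the lengths are 89, 98, 72 and 50, so L = 309, the weights are 89/309, 98/309, 72/309 and 50/309, and the density is 1/309 (about 0.0032) everywhere inside the segments.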
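To sample from such a distribution, first choose a segment (taking the weights into account). Then draw a value uniformly from the chosen segment.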
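A minimal sketch of these two steps, assuming b is a list of c(lower, upper) pairs. The name rmix_segments is hypothetical. It relies on runif() recycling its min and max arguments, so a single call covers every draw once each draw has been assigned to a segment.

```r
# Sketch of the two-step mixture sampler (hypothetical name rmix_segments).
rmix_segments <- function(n, b) {
  lo <- vapply(b, `[`, numeric(1), 1)
  hi <- vapply(b, `[`, numeric(1), 2)
  # Step 1: pick a segment for each draw, weighted by segment length
  k <- sample.int(length(b), n, replace = TRUE, prob = hi - lo)
  # Step 2: draw uniformly inside the chosen segment; runif() recycles
  # the per-draw min/max vectors, so one call handles all draws
  runif(n, min = lo[k], max = hi[k])
}

set.seed(1)
x <- rmix_segments(1e4, list(c(0, 1), c(2.5, 3), c(6, 10)))
```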
I believe this works, using the algorithm suggested by Robert Dodier:
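rmixunif = function(n, b) {
  # pick a segment for each draw, weighted by segment length
  subdists = sample(seq_along(b), size = n, replace = TRUE, prob = sapply(b, diff))
  subdists_n = tabulate(subdists, nbins = length(b))
  draw = numeric(n)
  for (i in unique(subdists)) {
    # fill all draws assigned to segment i with one runif() call
    draw[subdists == i] = runif(subdists_n[i], min = b[[i]][1], max = b[[i]][2])
  }
  return(draw)
}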
rmixunif(10, b = list(c(5,94),c(100,198),c(220,292), c(300,350)))
# [1] 64.85989 85.33292 235.39607 233.40133 240.28686 67.21626 237.60248 11.80377 151.65365 306.44473
I like Sam Dickson's visual check with a histogram; here is my version:
x <- rmixunif(10000,list(c(0,1),c(2.5,3),c(6,10)))
hist(x,breaks=20)
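A numerical complement to the histogram is to compare the share of draws that land in each segment with the share expected from the segment lengths. segment_shares is a hypothetical helper and assumes disjoint segments; the deterministic demo below is built so that the counts 2:1:8 match the lengths 1:0.5:4 exactly.

```r
# Hypothetical helper: observed vs. expected share of draws per segment.
segment_shares <- function(x, b) {
  lo <- vapply(b, `[`, numeric(1), 1)
  hi <- vapply(b, `[`, numeric(1), 2)
  observed <- vapply(seq_along(b),
                     function(k) mean(x >= lo[k] & x <= hi[k]),
                     numeric(1))
  data.frame(segment  = seq_along(b),
             expected = (hi - lo) / sum(hi - lo),
             observed = observed)
}

b <- list(c(0, 1), c(2.5, 3), c(6, 10))
# With rmixunif() from above: segment_shares(rmixunif(10000, b), b)
# Deterministic demo: 2 points in segment 1, 1 in segment 2, 8 in segment 3
segment_shares(c(0.2, 0.8, 2.7, 6:10, 6.5, 7.5, 8.5), b)
```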
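It could be improved with some input checking (and perhaps mapply, as suggested in the comments), but I'll leave that to others.
Thanks to alexis_iaz for the tabulate() suggestion!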
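One possible shape for that input checking, as a sketch only: check_segments is a hypothetical name, and it assumes b should be a non-empty list of finite c(lower, upper) pairs that do not overlap (touching endpoints are allowed).

```r
# Hypothetical input validation for the samplers in this thread.
check_segments <- function(b) {
  if (!is.list(b) || length(b) == 0)
    stop("`b` must be a non-empty list of c(lower, upper) pairs")
  ok_shape <- vapply(b, function(s) is.numeric(s) && length(s) == 2 && all(is.finite(s)),
                     logical(1))
  if (!all(ok_shape))
    stop("every element of `b` must be a finite numeric vector of length 2")
  lo <- vapply(b, `[`, numeric(1), 1)
  hi <- vapply(b, `[`, numeric(1), 2)
  if (any(hi <= lo))
    stop("each segment needs lower < upper")
  # sorted by lower bound, each segment must start at or after the previous one ends
  o <- order(lo)
  if (any(lo[o][-1] < hi[o][-length(b)]))
    stop("segments must not overlap")
  invisible(TRUE)
}
```

A sampler could then call check_segments(b) as its first line.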
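Another solution is to transform the output. The idea is to sample from a uniform distribution over the total length of the segments, then apply a conditional shift so that the numbers fall only within the selected ranges: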
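IncompleteUnif = function(n,b) {
  widths <- cumsum(sapply(b,diff))     # cumulative segment lengths
  x <- runif(n,0,tail(widths,1))       # uniform draws on [0, total length]
  out <- x
  # first segment: shift by its lower bound
  out[x<=widths[1]] <- x[x<=widths[1]] + b[[1]][1]
  # remaining segments (seq_along(b)[-1] is empty when there is only one)
  for(i in seq_along(b)[-1]) {
    sel <- widths[i-1]<x & x<=widths[i]
    out[sel] <- x[sel] - widths[i-1] + b[[i]][1]
  }
  return(out)
}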
x <- IncompleteUnif(10000,list(c(0,1),c(2.5,3),c(6,10)))
hist(x,breaks=20)
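The same transformation can also be written without a loop by letting findInterval() find the segment for each draw. This is a sketch under the same assumption of disjoint c(lower, upper) segments; rincunif_map is a hypothetical name.

```r
# Loop-free sketch of the transformation (hypothetical name rincunif_map).
rincunif_map <- function(n, b) {
  lo <- vapply(b, `[`, numeric(1), 1)
  hi <- vapply(b, `[`, numeric(1), 2)
  cum <- c(0, cumsum(hi - lo))          # where each segment starts on the "glued" range
  u <- runif(n, 0, cum[length(cum)])    # uniform draws on [0, total length]
  # index of the glued segment containing each u; rightmost.closed keeps
  # u == total in the last segment
  k <- findInterval(u, cum, rightmost.closed = TRUE)
  lo[k] + (u - cum[k])                  # shift back to the original position
}

set.seed(3)
x <- rincunif_map(1e4, list(c(0, 1), c(2.5, 3), c(6, 10)))
```

hist(x, breaks = 20) on this output should look like the histogram above.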
A slightly more elaborate version of @Gregor's solution:
mix_unif <- function(n, b){
x <- c()
ns <- rmultinom(1, n, sapply(b, diff))
for (i in seq_along(ns)) {
x <- c(x, runif(ns[i], b[[i]][1], b[[i]][2]))
}
x
}
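Because mix_unif() concatenates one block of draws per segment, its output is grouped segment by segment rather than in random order. If order matters (for example, when taking the first few values), a permutation restores it. The sketch below is a hypothetical variant, mix_unif_shuffled, that keeps the rmultinom() counting step but fills all draws with one runif() call before shuffling.

```r
# Hypothetical variant of mix_unif() that returns the draws in random order.
mix_unif_shuffled <- function(n, b) {
  lo <- vapply(b, `[`, numeric(1), 1)
  hi <- vapply(b, `[`, numeric(1), 2)
  # as in mix_unif(): multinomial counts per segment, weighted by length
  ns <- rmultinom(1, n, prob = hi - lo)[, 1]
  # repeat each segment's bounds ns[k] times; runif() is vectorised over them
  x <- runif(n, rep(lo, ns), rep(hi, ns))
  # without this step the output would be grouped by segment
  x[sample.int(n)]
}

set.seed(4)
b <- list(c(5, 94), c(100, 198), c(220, 292), c(300, 350))
x <- mix_unif_shuffled(1000, b)
```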
library(microbenchmark)
microbenchmark(mix_unif(1e5, b),
               rmixunif(1e5, b),
               IncompleteUnif(1e5, b),
               unit="relative")
Unit: relative
expr min lq mean median uq max neval
mix_unif(1e+05, b) 1.000000 1.000000 1.000000 1.000000 1.000000 1.00000 100
rmixunif(1e+05, b) 3.123515 3.235961 3.750369 3.496843 3.462529 15.73449 100
IncompleteUnif(1e+05, b) 6.806916 7.247425 6.926282 7.188556 7.093928 18.20041 100
I'm a few years late, but since none of the solutions avoids an explicit loop, here is one that does (following @RobertDodier's approach):
rmunif <- function(n, b) {
runifb <- function(n, b) runif(n, b[1], b[2])
ns <- rmultinom(1, n, vapply(b, diff, 1))
unlist(Map(runifb, ns, b), use.names = FALSE)
}
hist(rmunif(1e5, list(0:1, c(5, 8), 9:10)))
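Beyond eyeballing a histogram, one option is a Kolmogorov-Smirnov test against the exact CDF, which for disjoint segments is the covered length to the left of q divided by the total length. pincunif is a hypothetical name, and the CDF is passed to ks.test() as a closure.

```r
# Hypothetical CDF of the incomplete uniform distribution on disjoint
# c(lower, upper) segments.
pincunif <- function(q, b) {
  lo <- vapply(b, `[`, numeric(1), 1)
  hi <- vapply(b, `[`, numeric(1), 2)
  # length of each segment lying to the left of q, clamped to [0, segment length]
  covered <- vapply(q, function(qi) sum(pmin(pmax(qi - lo, 0), hi - lo)), numeric(1))
  covered / sum(hi - lo)
}

b <- list(c(0, 1), c(5, 8), c(9, 10))
pincunif(c(-1, 1, 6.5, 10), b)  # 0.0 0.2 0.5 1.0
# With rmunif() from above:
# ks.test(rmunif(1e5, b), function(q) pincunif(q, b))
```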
library(microbenchmark)
set.seed(2018)
n <- 1e5
microbenchmark(
rmunif(n, b),
mix_unif(n, b),
rmixunif(n, b),
IncompleteUnif(n, b),
unit = "relative"
) -> mb
print(mb, signif = 5)
#> Unit: relative
#> expr min lq mean median uq max neval
#> rmunif(n, b) 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 100
#> mix_unif(n, b) 1.1181 1.1256 1.1281 1.1728 1.1236 1.0476 100
#> rmixunif(n, b) 2.7822 2.8982 2.9899 2.7850 2.8345 1.3970 100
#> IncompleteUnif(n, b) 4.4922 4.7089 5.2732 4.5764 8.4317 2.4364 100
Created on 2018-03-11 by the reprex package (v0.2.0).