A grid over probability vectors
I am trying to obtain a "grid" of n-dimensional probability vectors, that is, vectors in which every entry is between 0 and 1 and all entries add up to 1. I would like to have every possible vector whose coordinates can take any of v evenly spaced values between 0 and 1.
To illustrate, here is a terribly inefficient implementation, for n = 3 and v = 3:
from itertools import product
grid_redundant = product([0, .5, 1], repeat=3)
grid = [point for point in grid_redundant if sum(point)==1]
Now grid contains [(0, 0, 1), (0, 0.5, 0.5), (0, 1, 0), (0.5, 0, 0.5), (0.5, 0.5, 0), (1, 0, 0)].
This "implementation" is terrible for higher dimensions and finer-grained grids. Is there a good way to do this, perhaps using numpy?
Perhaps I can add some motivation: I would be perfectly happy if sampling from a random distribution gave me sufficiently extreme points, but it does not. See this question. The "grid" I am after is not random, but systematically sweeps the simplex (the space of probability vectors).
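One systematic, non-random way to sweep the simplex (a sketch, independent of the answers below, assuming the v values are the evenly spaced multiples of 1/(v-1)) is to note that such vectors summing to 1 correspond to weak compositions of v-1 into n parts, which "stars and bars" enumerates directly:

```python
from itertools import combinations

def simplex_grid(n, v):
    """Yield all n-dim vectors whose entries are multiples of 1/(v-1) and sum to 1.

    Assumes v >= 2. Each vector corresponds to a weak composition of v-1
    into n nonnegative parts, enumerated via stars and bars: choose n-1
    "bar" positions among v-1 "stars" plus n-1 bars.
    """
    m = v - 1  # number of unit increments of size 1/(v-1) to distribute
    for bars in combinations(range(m + n - 1), n - 1):
        parts = []
        prev = -1
        for b in bars:
            parts.append(b - prev - 1)  # stars between consecutive bars
            prev = b
        parts.append(m + n - 1 - prev - 1)  # stars after the last bar
        yield tuple(p / m for p in parts)

print(sorted(simplex_grid(3, 3)))
```

This visits each grid point exactly once, so no filtering or deduplication is needed; the cost is proportional to the output size, C(v+n-2, n-1) points.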
Here is a recursive solution. It does not use NumPy and is not super efficient, although it should be faster than the posted snippet:
import math
from itertools import permutations

def probability_grid(values, n):
    values = set(values)
    # Check if we can extend the probability distribution with zeros
    with_zero = 0. in values
    values.discard(0.)
    if not values:
        return  # PEP 479: raising StopIteration inside a generator is an error
    values = list(values)
    for p in _probability_grid_rec(values, n, [], 0.):
        if with_zero:
            # Add necessary zeros
            p += (0.,) * (n - len(p))
        if len(p) == n:
            yield from set(permutations(p))  # faster: more_itertools.distinct_permutations(p)

def _probability_grid_rec(values, n, current, current_sum, eps=1e-10):
    if not values or n <= 0:
        if abs(current_sum - 1.) <= eps:
            yield tuple(current)
    else:
        value, *values = values
        # Skip this value
        yield from _probability_grid_rec(
            values, n, current, current_sum, eps)
        # Add copies of this value
        precision = round(-math.log10(eps))
        adds = int(round((1. - current_sum) / value, precision))
        for i in range(adds):
            current.append(value)
            current_sum += value
            n -= 1
            yield from _probability_grid_rec(
                values, n, current, current_sum, eps)
        # Remove copies of this value
        if adds > 0:
            del current[-adds:]

print(list(probability_grid([0, 0.5, 1.], 3)))
Output:
[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5)]
A quick comparison with the posted method:
from itertools import product

def probability_grid_basic(values, n):
    grid_redundant = product(values, repeat=n)
    return [point for point in grid_redundant if sum(point)==1]
values = [0, 0.25, 1./3., .5, 1]
n = 6
%timeit list(probability_grid(values, n))
1.61 ms ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit probability_grid_basic(values, n)
6.27 ms ± 186 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
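As a sanity check on either method (a sketch, under the assumption that the v values are the evenly spaced multiples of 1/(v-1), which does not hold for the mixed values list timed above), the number of grid points is the number of weak compositions of v-1 into n parts:

```python
from math import comb
from itertools import product

def grid_size(n, v):
    """Count n-dim vectors over v evenly spaced values in [0, 1] summing to 1.

    By stars and bars, this is C((v-1) + n - 1, n - 1).
    """
    return comb(v - 1 + n - 1, n - 1)

# Cross-check against the brute-force filter for a small case (n = 3, v = 3)
values = [k / 2 for k in range(3)]  # {0, 0.5, 1}
brute = [p for p in product(values, repeat=3) if abs(sum(p) - 1) < 1e-9]
assert grid_size(3, 3) == len(brute) == 6
```

This also gives a quick feel for how fast the grid grows: grid_size(6, 5) is already 126 points, and the count grows combinatorially in both n and v.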
For high-dimensional vectors, doing this in full generality is rather unmanageable, even with the clever solution in the accepted answer. In my own case it was worth computing only the relevant subset of all values. For example, the following function computes all dimension-dimensional probability vectors with exactly n nonzero, equiprobable entries:
import itertools as it
import numpy as np

def equip_n(dimension, n):
    """
    Calculate all possible <dimension>-dimensional probability vectors with n nonzero,
    equiprobable entries
    """
    combinations = np.array([comb for comb in it.combinations(range(dimension), n)])
    vectors = np.zeros((combinations.shape[0], dimension))
    for line, comb in zip(vectors, combinations):
        line[comb] = 1 / n
    return vectors
print(equip_n(6, 3))
This returns:
[[ 0.3333 0.3333 0.3333 0. 0. 0. ]
[ 0.3333 0.3333 0. 0.3333 0. 0. ]
[ 0.3333 0.3333 0. 0. 0.3333 0. ]
[ 0.3333 0.3333 0. 0. 0. 0.3333]
[ 0.3333 0. 0.3333 0.3333 0. 0. ]
[ 0.3333 0. 0.3333 0. 0.3333 0. ]
[ 0.3333 0. 0.3333 0. 0. 0.3333]
[ 0.3333 0. 0. 0.3333 0.3333 0. ]
[ 0.3333 0. 0. 0.3333 0. 0.3333]
[ 0.3333 0. 0. 0. 0.3333 0.3333]
[ 0. 0.3333 0.3333 0.3333 0. 0. ]
[ 0. 0.3333 0.3333 0. 0.3333 0. ]
[ 0. 0.3333 0.3333 0. 0. 0.3333]
[ 0. 0.3333 0. 0.3333 0.3333 0. ]
[ 0. 0.3333 0. 0.3333 0. 0.3333]
[ 0. 0.3333 0. 0. 0.3333 0.3333]
[ 0. 0. 0.3333 0.3333 0.3333 0. ]
[ 0. 0. 0.3333 0.3333 0. 0.3333]
[ 0. 0. 0.3333 0. 0.3333 0.3333]
[ 0. 0. 0. 0.3333 0.3333 0.3333]]
This is very fast. %timeit equip_n(6, 3) returns:
15.1 µs ± 74.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
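The Python loop over rows can also be replaced by a single fancy-indexing assignment; this is a sketch of a hypothetical fully vectorized variant (equip_n_vectorized is not from the answer above), which should help when the number of combinations is large:

```python
import itertools as it
import numpy as np

def equip_n_vectorized(dimension, n):
    """Same output as equip_n, but filled in one fancy-indexing step."""
    combos = np.array(list(it.combinations(range(dimension), n)))  # shape (C, n)
    vectors = np.zeros((combos.shape[0], dimension))
    # Row index repeated n times pairs with the n column indices of each combination
    rows = np.repeat(np.arange(combos.shape[0]), n)
    vectors[rows, combos.ravel()] = 1.0 / n
    return vectors
```

The index arrays rows and combos.ravel() have the same length, so each (row, column) pair receives 1/n exactly once, matching the loop version row for row.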