Randomly pick elements from a vector of counts

Question

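I am currently trying to optimize some MATLAB/Octave code by changing the algorithm, but I can't figure out how to handle the randomness involved. Suppose I have an integer vector V in which each element is a count of some things, in my case photons. I now want to randomly pick some of these "things" and create a new vector of the same size, but with adjusted counts.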

Currently I do it like this:

function W = photonfilter(V, eff)
% W = photonfilter(V, eff)
% Randomly takes photons from V according to the given efficiency.
%
% Args:
%  V: Input vector containing the number of emitted photons in each
%     timeslot (one element is one timeslot). The elements are rounded
%     to integers before processing.
%  eff: Filter efficiency. On average, one in every 1/eff photons is
%       taken. This value must be in the range 0 < eff <= 1.
%  W: Output row vector with the same length as V and containing the number
%     of received photons in each timeslot.
%
% WARNING: This function operates on a photon-by-photon basis in that it
% constructs a vector with one element per photon. The storage requirements
% therefore directly depend on sum(V), not only on the length of V.

% Round V and make it flat.
Ntot = length(V);
V = round(V);
V = V(:);

% Initialize the photon-based vector, so that each element contains
% the original index of the photon.
idxV = zeros(1, sum(V), 'uint32');
iout = 1;
for i = 1:Ntot
  N = V(i);
  idxV(iout:iout+N-1) = i;
  iout = iout + N;
end;

% Take random photons.
idxV = idxV(randperm(length(idxV)));
idxV = idxV(1:round(length(idxV)*eff));

% Generate the output vector by placing the remaining photons back
% into their timeslots.
[W, trash] = hist(idxV, 1:Ntot);

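This is a fairly straightforward implementation of the description above, but it has an obvious performance drawback: the function creates a vector (idxV) with one element per photon. So if my V has only 1000 elements but an average of 10000 photons per element, the internal vector will have 10 million elements, which makes the function slow and memory-hungry.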
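For comparison, here is a minimal Python sketch of the same photon-by-photon idea, assuming `random.sample` as the counterpart of `randperm` followed by truncation. The names `photonfilter_per_photon` and `round_half_up` are hypothetical. What it shows is that the list `idx` holds one entry per photon, so its size grows with `sum(V)` rather than with `length(V)`.

```python
import math
import random
from collections import Counter


def round_half_up(x):
    """Mirror Octave's round() for non-negative x (Python's round() rounds half to even)."""
    return int(math.floor(x + 0.5))


def photonfilter_per_photon(v, eff, rng=random):
    """Hypothetical port of photonfilter: memory grows with sum(v), not len(v)."""
    v = [round_half_up(x) for x in v]
    # One entry per photon, holding the index of the timeslot it came from.
    idx = [i for i, n in enumerate(v) for _ in range(n)]
    # Keep a uniformly random subset of round(len(idx) * eff) photons,
    # which has the same effect as randperm followed by truncation.
    kept = rng.sample(idx, round_half_up(len(idx) * eff))
    # Put the surviving photons back into their timeslots.
    counts = Counter(kept)
    return [counts[i] for i in range(len(v))]
```

Because exactly `round(sum(V)*eff)` photons are kept, `sum(W)` is fixed for a given input; only how they are distributed over the timeslots is random.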

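What I would like now is not to optimize this code directly, but to use a different kind of algorithm that computes the new counts right away, without giving each photon some kind of "identity". This must be possible, but I just can't figure out how.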

Requirements:

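The output vector W must have the same number of elements as the input vector V.
W(i) must be an integer bounded by 0 <= W(i) <= V(i).
The expected value of sum(W) must be sum(V)*eff.
The algorithm must somehow implement this "random picking" of photons, i.e. there must be no deterministic part such as "run through V dividing all counts by the stepsize and propagating the remainders", because the whole point of this function is to bring randomness into the system.
An explicit loop over V is allowed if unavoidable, but a vectorized approach is preferred.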
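The first three requirements can be written down as checks. This is a minimal Python sketch; `meets_hard_requirements` and `mean_total` are hypothetical helper names, and requirement 3 can only be tested statistically because it concerns an expected value.

```python
import random


def meets_hard_requirements(v, w):
    """Requirements 1 and 2: same length as v, integer counts, 0 <= w[i] <= v[i]."""
    if len(w) != len(v):
        return False
    return all(isinstance(wi, int) and 0 <= wi <= vi for wi, vi in zip(w, v))


def mean_total(filter_fn, v, eff, runs=2000, seed=0):
    """Requirement 3: estimate E[sum(W)] by averaging sum(filter_fn(v, eff, rng)) over many runs."""
    rng = random.Random(seed)
    return sum(sum(filter_fn(v, eff, rng)) for _ in range(runs)) / runs
```

A candidate filter passes requirement 3 if `mean_total(f, v, eff)` comes close to `sum(v) * eff`.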

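Any ideas how to implement something like this? A solution that uses only a random vector plus some probability and rounding tricks would be ideal, but so far I haven't had any success.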

Thanks! Best regards, Philipp

Answer 1

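The method you use to compute W is called the Monte Carlo method, and it can indeed be optimized. One way: instead of computing photon indices, imagine a set of bins. Each bin has a certain probability, and the probabilities of all bins add up to 1. We divide the segment [0, 1] into parts whose lengths are proportional to the bins' probabilities. Then, for every random number we generate in [0, 1), we can quickly find the bin it belongs to. Finally, we count how many numbers fell into each bin to get the result. The code below illustrates the idea.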

% Population size (number of photons).
N = 1000000;
% Sample size, size of V and W as well.
% For convenience of plotting, V and W are of the same size, but
% the algorithm doesn't enforce this constraint.
M = 10000;
% Number of Monte Carlo iterations, greater numbers give better quality.
K = 100000;

% Generate population of counts, use gaussian distribution to test the method.
% If implemented correctly histograms should have the same shape eventually.
V = hist(randn(1, N), M);
P = cumsum(V / sum(V));
% For every generated random value find its bin and then count the bins.
% Finally we normalize counts by the ratio N / K.
% lookup returns bin indices 0..M-1, so use bin centers 0:M-1 to keep bins aligned with V.
W = hist(lookup(P, rand(1, K)), 0:M-1) * N / K;
% Compare distribution plots, they should be the same.
hold on;
plot(W, '+r');
plot(V, '*b');
pause
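A minimal Python sketch of the same bin idea, assuming `bisect.bisect_right` as a stand-in for Octave's `lookup` (both return how many bin edges are less than or equal to the random value). The function name `sample_bins` is hypothetical. Each draw is independent, so photons are sampled with replacement.

```python
import bisect
import random
from itertools import accumulate


def sample_bins(v, k, rng=random):
    """Draw k photons with replacement; bin i is chosen with probability v[i] / sum(v)."""
    total = sum(v)
    # Right edges of the bins: [0, 1) is split into parts of length v[i] / total.
    edges = list(accumulate(x / total for x in v))
    counts = [0] * len(v)
    for _ in range(k):
        u = rng.random()  # uniform in [0, 1)
        # First bin whose right edge is > u; min() guards against edges[-1]
        # rounding to slightly below 1.
        i = min(bisect.bisect_right(edges, u), len(v) - 1)
        counts[i] += 1
    return counts
```

Bins with a zero count have zero width, so they are never selected.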

Answer 2

Based on Alexander Solovets' answer, the code now looks like this:

function W = photonfilter(V, eff, impl=1)

Ntot = length(V);
V = V(:);

if impl == 0
  % Original "straightforward" solution.
  V = round(V);
  idxV = zeros(1, sum(V), 'uint32');
  iout = 1;
  for i = 1:Ntot
    N = V(i);
    idxV(iout:iout+N-1) = i;
    iout = iout + N;
  end;
  idxV = idxV(randperm(length(idxV)));
  idxV = idxV(1:round(length(idxV)*eff));
  [W, trash] = hist(idxV, 1:Ntot);

else
  % Monte Carlo approach.
  Nphot = sum(V);
  P = cumsum(V / Nphot);
  W = hist(lookup(P, rand(1, round(Nphot * eff))), 0:Ntot-1);

end;

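This works as long as eff is not too close to 1. With eff=1 the original solution yields W=V, whereas the Monte Carlo approach still contains randomness and therefore violates the upper bound W(i) <= V(i): it draws photons with replacement, so a timeslot can receive more photons than it emitted.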
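The two behaviours can be compared in a Python sketch. `monte_carlo_filter` mimics the `impl == 1` branch (draws with replacement). `binomial_thin` is a swapped-in technique that appears in neither answer: each photon is kept independently with probability `eff`, so W(i) follows a binomial distribution with parameters V(i) and eff, never exceeds V(i), equals V(i) at eff = 1, and has E[sum(W)] = sum(V)*eff. Both function names are hypothetical. This pure-Python loop still costs O(sum(V)) time; a vectorized draw such as `binornd(V, eff)` (MATLAB Statistics and Machine Learning Toolbox, Octave statistics package) or NumPy's `Generator.binomial` would avoid that, assuming one of them is available.

```python
import bisect
import random
from itertools import accumulate


def monte_carlo_filter(v, eff, rng):
    """Like impl == 1: round(sum(v) * eff) draws with replacement; w[i] > v[i] is possible."""
    total = sum(v)
    edges = list(accumulate(x / total for x in v))
    w = [0] * len(v)
    for _ in range(int(total * eff + 0.5)):
        i = min(bisect.bisect_right(edges, rng.random()), len(v) - 1)
        w[i] += 1
    return w


def binomial_thin(v, eff, rng):
    """Keep each photon independently with probability eff, so w[i] ~ Binomial(v[i], eff)."""
    # Memory is O(len(v)); time is still O(sum(v)) in this pure-Python sketch.
    return [sum(1 for _ in range(n) if rng.random() < eff) for n in v]
```

Unlike both branches of `photonfilter`, binomial thinning does not fix `sum(W)` exactly; it only matches `sum(V)*eff` on average, which is what the third requirement asks for.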

Testing in an interactive Octave shell:

octave:1> T=linspace(0,10*pi,10000);
octave:2> V=100*(1+sin(T));
octave:3> W1=photonfilter(V, 0.1, 0);
octave:4> W2=photonfilter(V, 0.1, 1);
octave:5> plot(T,V,T,W1,T,W2);
octave:6> legend('V','Random picking','Monte Carlo')
octave:7> sum(W1)
ans =  100000
octave:8> sum(W2)
ans =  100000

Plot: [figure not available: V, W1 (Random picking) and W2 (Monte Carlo) plotted against T]
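Both sums are exactly 100000 because both branches keep exactly `round(Nphot * eff)` photons, so `sum(W)` depends only on V and eff, not on the random numbers. A minimal Python sketch that recomputes this target for the same test input, assuming Python's floating-point `sin` is close enough to Octave's for this purpose; `round_half_up` is a hypothetical helper mirroring Octave's `round`.

```python
import math


def round_half_up(x):
    """Mirror Octave's round() for non-negative x (Python's round() rounds half to even)."""
    return int(math.floor(x + 0.5))


n = 10000
eff = 0.1
# Same input as the session: T = linspace(0, 10*pi, 10000); V = 100*(1 + sin(T)).
t = [10 * math.pi * k / (n - 1) for k in range(n)]
v = [100 * (1 + math.sin(tk)) for tk in t]

# impl == 1 uses the unrounded V; the sin(T) terms cancel pairwise over 5 full
# periods because sin(10*pi - t) = -sin(t), so sum(V) is 1e6 up to rounding error.
n_phot = sum(v)
target_mc = round_half_up(n_phot * eff)

# impl == 0 rounds V first; the per-element rounding errors also nearly cancel.
n_phot_rounded = sum(round_half_up(x) for x in v)
target_pick = round_half_up(n_phot_rounded * eff)

print(target_pick, target_mc)
```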
