Simulating finding a randomly chosen number by asking binary questions

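As one of my homework problems, I was asked to write an Octave function that simulates 1000 experiments of finding the value of a random variable X with alphabet {0, 1, 2, 3} and pmf: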

Px(0) = 1/8

Px(1) = 1/4

Px(2) = 1/2

Px(3) = 1/8

by asking a sequence of binary ("yes" or "no") questions.

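I have determined that the optimal sequence of binary questions for finding the value of X is simply to ask "Is X = p?", where p runs through the possible values in order of decreasing probability.

So the optimal order is: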

  1. Is X = 2?

    If not:

  2. Is X = 1?

    If not:

  3. Is X = 0?

    If not, then X = 3
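
As a sanity check on this order, the expected number of questions can be computed directly from the pmf. The following is a minimal sketch (the variable names are illustrative and not part of the assignment). The entropy line relies on the standard fact that no yes/no questioning strategy can average fewer questions than the entropy of X in bits:

```octave
% Probabilities in the order the questions are asked: X = 2, X = 1, X = 0, X = 3
p  = [1/2, 1/4, 1/8, 1/8];
% Questions needed to identify each value; the third answer settles both 0 and 3
nq = [1, 2, 3, 3];

expected_Qs = sum(p .* nq)        % expected number of questions
entropy_X   = -sum(p .* log2(p))  % entropy of X in bits (lower bound on the average)
```

Both values come out to 1.75, so for this pmf the order above meets the bound.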

Here is the function I wrote:

function x = guessing_experiment(probabilities, n)
  % generates n simulations of finding a random number in an alphabet by asking binary questions,
  % where 'probabilities' is a list of the probabilities per number in the order the questions will be asked

  num_Qs = zeros(1,n);                            % allocate array of size n for number of questions asked per experiment
  [num_col, alphabet_size] = size(probabilities); % get size of alphabet

  for i = 1:n                                     % generate n experiments
    Qs = 0;                                       % number of questions asked in this experiment
    for j = 1:alphabet_size - 1                   % iterate through questions
      question = rand;                            % generate random number in range [0, 1]
      Qs++;                                       % increment number of questions asked
      if (question <= probabilities(j))           % if question produces a "yes" answer
        break;
      endif
    endfor
    num_Qs(i) = Qs;                               % store number of questions asked for this experiment
  endfor

  x = mean(num_Qs);                               % calculate mean number of questions asked over the n experiments 

 end

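It is called as guessing_experiment([1/2, 1/4, 1/8, 1/8], 1000), where the array holds the probability of each question producing a "yes" answer, in the order the questions are asked, and n is the number of experiments.

Asking these questions should take 1.75 questions on average, but my program always produces an average of ~1.87. Where is my script going wrong?

I assume it has something to do with generating a new random number to simulate each of the 3 questions asked.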

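I deleted my previous, incorrect answer, which claimed that your script was correct and your calculation was wrong. On second thought, your calculation is right. I tried it myself with the following MATLAB script: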

% probabilities for each number
p = [1/8,1/4,1/2,1/8];
% sort them from higher to lower
p = sort(p,'descend');
% number of questions per probability
nq = 1:length(p)-1;
% the last question can distinguish between two variables
nq(end+1) = nq(end);
% number of trials
n = 100000;
% random sample number of questions
q = randsample(nq,n,true,p);
% mean number of questions
avgQ = mean(q)

and the average I obtained is 1.75, just as you calculated. I will take another look at your code to see what is wrong.
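
Note that randsample belongs to MATLAB's Statistics and Machine Learning Toolbox (in Octave, the statistics package), so it may not be available everywhere. A rough equivalent that uses only rand is sketched below. It draws X once per trial through the inverse CDF and relies on automatic broadcasting, which assumes Octave or MATLAB R2016b and later. The variable names are illustrative:

```octave
p  = sort([1/8, 1/4, 1/2, 1/8], 'descend');   % probabilities in question order
n  = 100000;                                  % number of trials
nq = [1:numel(p)-1, numel(p)-1];              % questions needed per outcome: [1 2 3 3]

u   = rand(n, 1);                             % one uniform draw per trial
idx = sum(u > cumsum(p), 2) + 1;              % outcome index via inverse CDF (n-by-4 comparison)
idx = min(idx, numel(p));                     % guard against rounding in cumsum
avgQ = mean(nq(idx))                          % mean number of questions
```

Because each trial draws the outcome exactly once, the number of questions follows from the outcome and no per-question random numbers are needed.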

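Edit

The problem with your script is that you ignore conditional probability, i.e., when asking a question about the variable, you ignore what you already know about it. For example, when you ask the third question, the probability that the value is 0 is not p=1/8 but p=1/2, because you already know it is neither 1 nor 2. The fix is to divide each probability by the total probability of the events that are still possible, probabilities(j)/sum(probabilities(j:end)):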

n = 10000;
p = [1/8,1/4,1/2,1/8];
% sort them from higher to lower
probabilities = sort(p,'descend');
probabilities(end-1) = probabilities(end-1) + probabilities(end);
probabilities(end) = [];
alphabet_size = numel(probabilities);
num_Qs = zeros(1,n);                            % allocate array of size n for number of questions asked per experiment

for i = 1:n                                     % generate n experiments
    Qs = 0;                                       % number of questions asked in this experiment
    for j = 1:alphabet_size                   % iterate through questions
        question = rand;                            % generate random number in range [0, 1]
        Qs = Qs + 1;                                       % increment number of questions asked
        if question < probabilities(j)/sum(probabilities(j:end))           % if question produces a "yes" answer
            break;
        end
    end
    num_Qs(i) = Qs;                               % store number of questions asked for this experiment
end

x = mean(num_Qs)

x ~ 1.75

The vector of conditional probabilities in this case is:

p = [1/8,1/4,1/2,1/8];
p = sort(p,'descend');
cond_p = p./cumsum(p,'reverse')

cond_p =

    0.5000    0.5000    0.5000    1.0000
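
Both averages can also be reproduced without simulation, since the expected number of questions equals the sum, over the questions j, of the probability that question j is asked at all. The sketch below assumes that identity; expected_questions and the other names are illustrative. It uses fliplr(cumsum(fliplr(p))) in place of cumsum(p,'reverse') in case the 'reverse' option is not available in your Octave version:

```octave
p = [1/2, 1/4, 1/8, 1/8];                     % probabilities in question order

% q(j): chance that question j gets a "yes", given that it is asked.
% Question j is asked only if all earlier answers were "no".
expected_questions = @(q) sum([1, cumprod(1 - q(1:end-1))]);

tail = fliplr(cumsum(fliplr(p)));             % probability mass still possible at question j
q_unconditional = p(1:3);                     % what the original script compares against
q_conditional   = p(1:3) ./ tail(1:3);        % corrected "yes" probabilities: [0.5 0.5 0.5]

mean_original  = expected_questions(q_unconditional)   % 1.875
mean_corrected = expected_questions(q_conditional)     % 1.75
```

The first value matches the ~1.87 reported by the original script, and the second matches the expected 1.75.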