关于 Monte Carlo 概率语法

Question

让 20 个人，包括恰好 3 名女性，随机坐在 4 table 秒（表示为 (A,B,C,D)），每人 5 人，所有安排的可能性相同。令 X 为 table 没有女性坐下的数量。编写一个 numpy Monte Carlo 模拟来估计 X 的期望，并估计没有女性坐在 table A 的概率 p。运行模拟3 个案例 (100,1000,10000)

我想定义一个函数，它利用 numpy 的 random.permutation 函数来计算 X 的预期值，给定可变次数的试验我知道如何在笔和纸上执行此操作，遍历我的 collection 的概率并将它们相乘，这样我就可以计算出事件的总概率。这是我目前所拥有的

T = 4       # number of tables
N = 20      # number of persons. Assumption: N is a multiple of T.
K = 5       # capacity per table
W = 3       # number of women. Assumption: first W of N persons are women.
M =100      #number of trials

collection = []

for i in range(K):


    x = (((N-W)-i)/(N-i))

    collection.append(x)

如果我检查我的 collection，这是我的输出：[0.85, 0.8421052631578947, 0.8333333333333334, 0.8235294117647058, 0.8125]

Answer 1

你必须明确地模拟坐姿吗？如果没有，那么就简单地随机抽取3次，从1..4开始替换，模拟一次坐着，即：

def one_experiment():
    return set(random.randint(1, 4) for _ in range(3))  # Distinct tables with women.

然后按如下方式获得所需的值，其中 N 是任何情况下的实验次数。

expectation_of_X = sum(4 - len(one_experiment()) for _ in range(N)) / float(N)
probability_no_women_table_1 = sum(1 not in one_experiment() for _ in range(N)) / float(N)

对于大 N，您得到的值应该大约为 p = (3 / 4)^3 和 E[X] = (3^3) / (4^2)。

Answer 2

您可以使用 functools.reduce 在 Python 3.x.

中乘以集合中的项目

from functools import reduce
event_probability = reduce(lambda x, y: x*y, collection)

所以在你的代码中：

from functools import reduce

T = 4       # number of tables
N = 20      # number of persons. Assumption: N is a multiple of T.
K = 5       # capacity per table
W = 3       # number of women. Assumption: first W of N persons are women.
M = 100      #number of trials

collection = []

for i in range(K):
    x = (((N-W)-i)/(N-i))
    collection.append(x)

event_probability = reduce(lambda x, y: x*y, collection)

print(collection)
print(event_probability)

输出：

[0.85, 0.8421052631578947, 0.8333333333333334, 0.8235294117647058, 0.8125] # collection
0.3991228070175438 # event_probability

然后您可以使用结果来完成您的代码。

Answer 3

实施

这是蒙特卡洛模拟的简单实现。它不是为高性能而设计的，而是允许您交叉检查设置并查看详细信息：

import collections
import numpy as np

def runMonteCarlo(nw=3, nh=20, nt=4, N=20):
    """
    Run Monte Carlo Simulation
    """

    def countWomen(c, nt=4):
        """
        Count Number of Women per Table
        """
        x = np.array(c).reshape(nt, -1).T  # Split permutation into tables
        return np.sum(x, axis=0)           # Sum woman per table

    # Initialization:
    comp = np.array([1]*nw + [0]*(nh-nw)) # Composition: 1=woman, 0=man
    x = []                                # Counts of tables without any woman
    p = 0                                 # Probability of there is no woman at table A  

    for k in range(N):
        c = np.random.permutation(comp)   # Random permutation, table composition
        w = countWomen(c, nt=nt)          # Count Woman per table
        nc = np.sum(w!=0)                 # Count how many tables with women 
        x.append(nt - nc)                 # Store count of tables without any woman
        p += int(w[0]==0)                 # Is table A empty?
        #if k % 100 == 0:
            #print(c, w, nc, nt-nc, p)

    # Rationalize (count->frequency)
    r = collections.Counter(x)
    r = {k:r.get(k, 0)/N for k in range(nt+1)}
    p /= N
    return r, p

执行作业：

for n in [100, 1000, 10000]:
    s = runMonteCarlo(N=n)
    E = sum([k*v for k,v in s[0].items()])
    print('N=%d, P(X=k) = %s, p=%s, E[X]=%s' % (n, *s, E))

Returns:

N=100, P(X=k) = {0: 0.0, 1: 0.43, 2: 0.54, 3: 0.03, 4: 0.0}, p=0.38, E[X]=1.6
N=1000, P(X=k) = {0: 0.0, 1: 0.428, 2: 0.543, 3: 0.029, 4: 0.0}, p=0.376, E[X]=1.601
N=10000, P(X=k) = {0: 0.0, 1: 0.442, 2: 0.5235, 3: 0.0345, 4: 0.0}, p=0.4011, E[X]=1.5924999999999998

绘制分布图，得出：

import pandas as pd
axe = pd.DataFrame.from_dict(s[0], orient='index').plot(kind='bar')
axe.set_title("Monte Carlo Simulation")
axe.set_xlabel('Random Variable, $X$')
axe.set_ylabel('Frequency, $F(X=k)$')
axe.grid()

与替代版本的分歧

Caution: this method does not answer the stated problem!

如果我们实现另一个版本的模拟，我们改变随机实验的执行方式如下：

import random
import collections

def runMonteCarlo2(nw=3, nh=20, nt=4, N=20):
    """
    Run Monte Carlo Simulation
    """

    def one_experiment(nt, nw):
        """
        Table setup (suggested by @Inon Peled)
        """
        return set(random.randint(0, nt-1) for _ in range(nw)) # Sample nw times from 0 <= k <= nt-1

    c = collections.Counter()             # Empty Table counter
    p = 0                                 # Probability of there is no woman at table A  

    for k in range(N):
        exp = one_experiment(nt, nw)      # Select table with at least one woman
        c.update([nt - len(exp)])         # Update Counter X distribution
        p += int(0 not in exp)            # There is no woman at table A (table 0)

    # Rationalize:
    r = {k:c.get(k, 0)/N for k in range(nt+1)}
    p /= N

    return r, p

它returns:

N=100, P(X=k) = {0: 0.0, 1: 0.41, 2: 0.51, 3: 0.08, 4: 0.0}, p=0.4, E[X]=1.67
N=1000, P(X=k) = {0: 0.0, 1: 0.366, 2: 0.577, 3: 0.057, 4: 0.0}, p=0.426, E[X]=1.691
N=1000000, P(X=k) = {0: 0.0, 1: 0.37462, 2: 0.562787, 3: 0.062593, 4: 0.0}, p=0.42231, E[X]=1.687973

第二个版本收敛到另一个值，显然不等同于第一个版本，它没有回答同一个问题。

讨论

为了区分哪种实现是正确的，我对这两种实现都有 computed sampled spaces and probabilities。似乎第一个版本是正确的，因为它考虑到女性坐在 table 的概率取决于之前被选中的人。第二个版本没有考虑它，这就是为什么它不需要知道有多少人以及每个 table.

可以容纳多少人。

这是一个很好的问题，因为两个答案都提供了接近的结果。这项工作的一个重要部分是妥善设置 Monte Carlo 输入。

关于 Monte Carlo 概率语法

On Monte Carlo Probability syntax

python

probability

montecarlo

实施

与替代版本的分歧

讨论