关于 Monte Carlo 概率语法
On Monte Carlo Probability syntax
让 20 个人,包括恰好 3 名女性,随机坐在 4 table 秒(表示为 (A,B,C,D)),每人 5 人,所有安排的可能性相同。令 X 为 table 没有女性坐下的数量。编写一个 numpy Monte Carlo 模拟来估计 X 的期望,并估计没有女性坐在 table A 的概率 p。运行 模拟3 个案例 (100,1000,10000)
我想定义一个函数,它利用 numpy 的 random.permutation 函数来计算 X 的预期值,给定可变次数的试验我知道如何在笔和纸上执行此操作,遍历我的 collection 的概率并将它们相乘,这样我就可以计算出事件的总概率。这是我目前所拥有的
T = 4 # number of tables
N = 20 # number of persons. Assumption: N is a multiple of T.
K = 5 # capacity per table
W = 3 # number of women. Assumption: first W of N persons are women.
M =100 #number of trials
collection = []
for i in range(K):
x = (((N-W)-i)/(N-i))
collection.append(x)
如果我检查我的 collection,这是我的输出:[0.85, 0.8421052631578947, 0.8333333333333334, 0.8235294117647058, 0.8125]
你必须明确地模拟坐姿吗?如果没有,那么就简单地随机抽取3次,从1..4开始替换,模拟一次坐着,即:
def one_experiment():
return set(random.randint(1, 4) for _ in range(3)) # Distinct tables with women.
然后按如下方式获得所需的值,其中 N 是任何情况下的实验次数。
expectation_of_X = sum(4 - len(one_experiment()) for _ in range(N)) / float(N)
probability_no_women_table_1 = sum(1 not in one_experiment() for _ in range(N)) / float(N)
对于大 N,您得到的值应该大约为 p = (3 / 4)^3 和 E[X] = (3^3) / (4^2)。
您可以使用 functools.reduce
在 Python 3.x.
中乘以集合中的项目
from functools import reduce
event_probability = reduce(lambda x, y: x*y, collection)
所以在你的代码中:
from functools import reduce
T = 4 # number of tables
N = 20 # number of persons. Assumption: N is a multiple of T.
K = 5 # capacity per table
W = 3 # number of women. Assumption: first W of N persons are women.
M = 100 #number of trials
collection = []
for i in range(K):
x = (((N-W)-i)/(N-i))
collection.append(x)
event_probability = reduce(lambda x, y: x*y, collection)
print(collection)
print(event_probability)
输出:
[0.85, 0.8421052631578947, 0.8333333333333334, 0.8235294117647058, 0.8125] # collection
0.3991228070175438 # event_probability
然后您可以使用结果来完成您的代码。
实施
这是蒙特卡洛模拟的简单实现。它不是为高性能而设计的,而是允许您交叉检查设置并查看详细信息:
import collections
import numpy as np
def runMonteCarlo(nw=3, nh=20, nt=4, N=20):
"""
Run Monte Carlo Simulation
"""
def countWomen(c, nt=4):
"""
Count Number of Women per Table
"""
x = np.array(c).reshape(nt, -1).T # Split permutation into tables
return np.sum(x, axis=0) # Sum woman per table
# Initialization:
comp = np.array([1]*nw + [0]*(nh-nw)) # Composition: 1=woman, 0=man
x = [] # Counts of tables without any woman
p = 0 # Probability of there is no woman at table A
for k in range(N):
c = np.random.permutation(comp) # Random permutation, table composition
w = countWomen(c, nt=nt) # Count Woman per table
nc = np.sum(w!=0) # Count how many tables with women
x.append(nt - nc) # Store count of tables without any woman
p += int(w[0]==0) # Is table A empty?
#if k % 100 == 0:
#print(c, w, nc, nt-nc, p)
# Rationalize (count->frequency)
r = collections.Counter(x)
r = {k:r.get(k, 0)/N for k in range(nt+1)}
p /= N
return r, p
执行作业:
for n in [100, 1000, 10000]:
s = runMonteCarlo(N=n)
E = sum([k*v for k,v in s[0].items()])
print('N=%d, P(X=k) = %s, p=%s, E[X]=%s' % (n, *s, E))
Returns:
N=100, P(X=k) = {0: 0.0, 1: 0.43, 2: 0.54, 3: 0.03, 4: 0.0}, p=0.38, E[X]=1.6
N=1000, P(X=k) = {0: 0.0, 1: 0.428, 2: 0.543, 3: 0.029, 4: 0.0}, p=0.376, E[X]=1.601
N=10000, P(X=k) = {0: 0.0, 1: 0.442, 2: 0.5235, 3: 0.0345, 4: 0.0}, p=0.4011, E[X]=1.5924999999999998
绘制分布图,得出:
import pandas as pd
axe = pd.DataFrame.from_dict(s[0], orient='index').plot(kind='bar')
axe.set_title("Monte Carlo Simulation")
axe.set_xlabel('Random Variable, $X$')
axe.set_ylabel('Frequency, $F(X=k)$')
axe.grid()
与替代版本的分歧
Caution: this method does not answer the stated problem!
如果我们实现另一个版本的模拟,我们改变随机实验的执行方式如下:
import random
import collections
def runMonteCarlo2(nw=3, nh=20, nt=4, N=20):
"""
Run Monte Carlo Simulation
"""
def one_experiment(nt, nw):
"""
Table setup (suggested by @Inon Peled)
"""
return set(random.randint(0, nt-1) for _ in range(nw)) # Sample nw times from 0 <= k <= nt-1
c = collections.Counter() # Empty Table counter
p = 0 # Probability of there is no woman at table A
for k in range(N):
exp = one_experiment(nt, nw) # Select table with at least one woman
c.update([nt - len(exp)]) # Update Counter X distribution
p += int(0 not in exp) # There is no woman at table A (table 0)
# Rationalize:
r = {k:c.get(k, 0)/N for k in range(nt+1)}
p /= N
return r, p
它returns:
N=100, P(X=k) = {0: 0.0, 1: 0.41, 2: 0.51, 3: 0.08, 4: 0.0}, p=0.4, E[X]=1.67
N=1000, P(X=k) = {0: 0.0, 1: 0.366, 2: 0.577, 3: 0.057, 4: 0.0}, p=0.426, E[X]=1.691
N=1000000, P(X=k) = {0: 0.0, 1: 0.37462, 2: 0.562787, 3: 0.062593, 4: 0.0}, p=0.42231, E[X]=1.687973
第二个版本收敛到另一个值,显然不等同于第一个版本,它没有回答同一个问题。
讨论
为了区分哪种实现是正确的,我对这两种实现都有 computed sampled spaces and probabilities。似乎第一个版本是正确的,因为它考虑到女性坐在 table 的概率取决于之前被选中的人。第二个版本没有考虑它,这就是为什么它不需要知道有多少人以及每个 table.
可以容纳多少人。
这是一个很好的问题,因为两个答案都提供了接近的结果。这项工作的一个重要部分是妥善设置 Monte Carlo 输入。
让 20 个人,包括恰好 3 名女性,随机坐在 4 table 秒(表示为 (A,B,C,D)),每人 5 人,所有安排的可能性相同。令 X 为 table 没有女性坐下的数量。编写一个 numpy Monte Carlo 模拟来估计 X 的期望,并估计没有女性坐在 table A 的概率 p。运行 模拟3 个案例 (100,1000,10000)
我想定义一个函数,它利用 numpy 的 random.permutation 函数来计算 X 的预期值,给定可变次数的试验我知道如何在笔和纸上执行此操作,遍历我的 collection 的概率并将它们相乘,这样我就可以计算出事件的总概率。这是我目前所拥有的
T = 4 # number of tables
N = 20 # number of persons. Assumption: N is a multiple of T.
K = 5 # capacity per table
W = 3 # number of women. Assumption: first W of N persons are women.
M =100 #number of trials
collection = []
for i in range(K):
x = (((N-W)-i)/(N-i))
collection.append(x)
如果我检查我的 collection,这是我的输出:[0.85, 0.8421052631578947, 0.8333333333333334, 0.8235294117647058, 0.8125]
你必须明确地模拟坐姿吗?如果没有,那么就简单地随机抽取3次,从1..4开始替换,模拟一次坐着,即:
def one_experiment():
return set(random.randint(1, 4) for _ in range(3)) # Distinct tables with women.
然后按如下方式获得所需的值,其中 N 是任何情况下的实验次数。
expectation_of_X = sum(4 - len(one_experiment()) for _ in range(N)) / float(N)
probability_no_women_table_1 = sum(1 not in one_experiment() for _ in range(N)) / float(N)
对于大 N,您得到的值应该大约为 p = (3 / 4)^3 和 E[X] = (3^3) / (4^2)。
您可以使用 functools.reduce
在 Python 3.x.
from functools import reduce
event_probability = reduce(lambda x, y: x*y, collection)
所以在你的代码中:
from functools import reduce
T = 4 # number of tables
N = 20 # number of persons. Assumption: N is a multiple of T.
K = 5 # capacity per table
W = 3 # number of women. Assumption: first W of N persons are women.
M = 100 #number of trials
collection = []
for i in range(K):
x = (((N-W)-i)/(N-i))
collection.append(x)
event_probability = reduce(lambda x, y: x*y, collection)
print(collection)
print(event_probability)
输出:
[0.85, 0.8421052631578947, 0.8333333333333334, 0.8235294117647058, 0.8125] # collection
0.3991228070175438 # event_probability
然后您可以使用结果来完成您的代码。
实施
这是蒙特卡洛模拟的简单实现。它不是为高性能而设计的,而是允许您交叉检查设置并查看详细信息:
import collections
import numpy as np
def runMonteCarlo(nw=3, nh=20, nt=4, N=20):
"""
Run Monte Carlo Simulation
"""
def countWomen(c, nt=4):
"""
Count Number of Women per Table
"""
x = np.array(c).reshape(nt, -1).T # Split permutation into tables
return np.sum(x, axis=0) # Sum woman per table
# Initialization:
comp = np.array([1]*nw + [0]*(nh-nw)) # Composition: 1=woman, 0=man
x = [] # Counts of tables without any woman
p = 0 # Probability of there is no woman at table A
for k in range(N):
c = np.random.permutation(comp) # Random permutation, table composition
w = countWomen(c, nt=nt) # Count Woman per table
nc = np.sum(w!=0) # Count how many tables with women
x.append(nt - nc) # Store count of tables without any woman
p += int(w[0]==0) # Is table A empty?
#if k % 100 == 0:
#print(c, w, nc, nt-nc, p)
# Rationalize (count->frequency)
r = collections.Counter(x)
r = {k:r.get(k, 0)/N for k in range(nt+1)}
p /= N
return r, p
执行作业:
for n in [100, 1000, 10000]:
s = runMonteCarlo(N=n)
E = sum([k*v for k,v in s[0].items()])
print('N=%d, P(X=k) = %s, p=%s, E[X]=%s' % (n, *s, E))
Returns:
N=100, P(X=k) = {0: 0.0, 1: 0.43, 2: 0.54, 3: 0.03, 4: 0.0}, p=0.38, E[X]=1.6
N=1000, P(X=k) = {0: 0.0, 1: 0.428, 2: 0.543, 3: 0.029, 4: 0.0}, p=0.376, E[X]=1.601
N=10000, P(X=k) = {0: 0.0, 1: 0.442, 2: 0.5235, 3: 0.0345, 4: 0.0}, p=0.4011, E[X]=1.5924999999999998
绘制分布图,得出:
import pandas as pd
axe = pd.DataFrame.from_dict(s[0], orient='index').plot(kind='bar')
axe.set_title("Monte Carlo Simulation")
axe.set_xlabel('Random Variable, $X$')
axe.set_ylabel('Frequency, $F(X=k)$')
axe.grid()
与替代版本的分歧
Caution: this method does not answer the stated problem!
如果我们实现另一个版本的模拟,我们改变随机实验的执行方式如下:
import random
import collections
def runMonteCarlo2(nw=3, nh=20, nt=4, N=20):
"""
Run Monte Carlo Simulation
"""
def one_experiment(nt, nw):
"""
Table setup (suggested by @Inon Peled)
"""
return set(random.randint(0, nt-1) for _ in range(nw)) # Sample nw times from 0 <= k <= nt-1
c = collections.Counter() # Empty Table counter
p = 0 # Probability of there is no woman at table A
for k in range(N):
exp = one_experiment(nt, nw) # Select table with at least one woman
c.update([nt - len(exp)]) # Update Counter X distribution
p += int(0 not in exp) # There is no woman at table A (table 0)
# Rationalize:
r = {k:c.get(k, 0)/N for k in range(nt+1)}
p /= N
return r, p
它returns:
N=100, P(X=k) = {0: 0.0, 1: 0.41, 2: 0.51, 3: 0.08, 4: 0.0}, p=0.4, E[X]=1.67
N=1000, P(X=k) = {0: 0.0, 1: 0.366, 2: 0.577, 3: 0.057, 4: 0.0}, p=0.426, E[X]=1.691
N=1000000, P(X=k) = {0: 0.0, 1: 0.37462, 2: 0.562787, 3: 0.062593, 4: 0.0}, p=0.42231, E[X]=1.687973
第二个版本收敛到另一个值,显然不等同于第一个版本,它没有回答同一个问题。
讨论
为了区分哪种实现是正确的,我对这两种实现都有 computed sampled spaces and probabilities。似乎第一个版本是正确的,因为它考虑到女性坐在 table 的概率取决于之前被选中的人。第二个版本没有考虑它,这就是为什么它不需要知道有多少人以及每个 table.
可以容纳多少人。这是一个很好的问题,因为两个答案都提供了接近的结果。这项工作的一个重要部分是妥善设置 Monte Carlo 输入。