碰撞需要多少样本(生日悖论)
How many samples needed for a collision (birthday paradox)
只是想了解生日悖论。
使用以下代码,我发现我平均需要 12 个样本 才能获得生日碰撞。
无法理解为什么比正常23人得到1/2的生日碰撞几率低很多。
即使我使用 PyCrypto 中的 StrongRandom,结果也不会改变。
from random import randint
from Crypto.Random.random import StrongRandom
EXPERIMENTS_NUM = 10000
SET_SIZE = 365
SUBSET = 23
where_collision_found = list()
rnd = StrongRandom()
for experiment in range(EXPERIMENTS_NUM):
for i in range(1,SET_SIZE + 2):
collision_found = False
#Generate a subset
# subset = [rnd.randint(1, SET_SIZE) for x in range(i)]
subset = [randint(1, SET_SIZE) for x in range(i)]
# Check for collision
flags = [False for x in range(SET_SIZE + 1)]
for k in range(i):
if flags[subset[k]]: #Collision found
collision_found = True
else:
flags[subset[k]] = True
if collision_found:
# print 'Collision found in set:', subset
break
where_collision_found.append(i)
print 'average collision:', sum(where_collision_found) / float(len(where_collision_found)), 'in', EXPERIMENTS_NUM, 'experiments'
结果:
average collision: 12.1277 in 10000 experiments
我不太清楚你的代码在做什么。这是我刚才所做的:
from random import randrange
N = 365
ns = []
for _ in range(10000):
n = 0
seen = set()
while True:
b = randrange(N)
n += 1
if b in seen:
break
seen.add(b)
ns.append(n)
print(sum(ns) / float(len(ns)))
样本输出运行:
24.6577
这很好。您期望的“23”是分布的中位数;平均值(平均值)预计为 24.61659 ...请参阅此处:
https://en.wikipedia.org/wiki/Birthday_problem#Average_number_of_people
只是想了解生日悖论。
使用以下代码,我发现我平均需要 12 个样本 才能获得生日碰撞。
无法理解为什么比正常23人得到1/2的生日碰撞几率低很多。
即使我使用 PyCrypto 中的 StrongRandom,结果也不会改变。
from random import randint
from Crypto.Random.random import StrongRandom
EXPERIMENTS_NUM = 10000
SET_SIZE = 365
SUBSET = 23
where_collision_found = list()
rnd = StrongRandom()
for experiment in range(EXPERIMENTS_NUM):
for i in range(1,SET_SIZE + 2):
collision_found = False
#Generate a subset
# subset = [rnd.randint(1, SET_SIZE) for x in range(i)]
subset = [randint(1, SET_SIZE) for x in range(i)]
# Check for collision
flags = [False for x in range(SET_SIZE + 1)]
for k in range(i):
if flags[subset[k]]: #Collision found
collision_found = True
else:
flags[subset[k]] = True
if collision_found:
# print 'Collision found in set:', subset
break
where_collision_found.append(i)
print 'average collision:', sum(where_collision_found) / float(len(where_collision_found)), 'in', EXPERIMENTS_NUM, 'experiments'
结果:
average collision: 12.1277 in 10000 experiments
我不太清楚你的代码在做什么。这是我刚才所做的:
from random import randrange
N = 365
ns = []
for _ in range(10000):
n = 0
seen = set()
while True:
b = randrange(N)
n += 1
if b in seen:
break
seen.add(b)
ns.append(n)
print(sum(ns) / float(len(ns)))
样本输出运行:
24.6577
这很好。您期望的“23”是分布的中位数;平均值(平均值)预计为 24.61659 ...请参阅此处: https://en.wikipedia.org/wiki/Birthday_problem#Average_number_of_people