为什么我的重合指数之和的平均值在Python3 是错误的?
Why is my mean value of the sum of the index of coincidence wrong in Python3?
我编写了一个程序来尝试查找法语 vigenere ciphered text (should be a random string) by calculating the mean of all the index of coincidence of a text which is cut every x element, basically I'm trying to get the mean value of the IC closest to 0.06/0.07 according to this reference website 中键的长度 x
,但我的平均值看起来不对。
注意:我只处理大写字符串,完全没有标点符号,没有特殊字符,也没有空格。
我有 3 个函数,其中一个函数给出文本中出现字母的列表。
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def freq(txt):
"""
give back all occurences of letters in a list
"""
hist = [0.0] * len(alphabet)
for letter in txt:
hist[alphabet.index(letter)] += 1
return hist
def IC(occurences):
"""
this function takes a list of occurences of letters from a text and calculates the IC
and returns the IC
"""
sum_ = 0
n = sum(occurences)
if n == 1:
return 0
for occ_letter in occurences:
sum_ += occ_letter * (occ_letter - 1)
sum_ /= n * (n - 1)
return sum_
# These two functions work well (or so I believe), but when I wrote this one :
# This function could be wrong
def key_length(cipher):
"""
cipher is the ciphered text.
Find the length of the key by cutting the text and trying to find a mean value over 0.06
Returns the length of the key
"""
max_key_len = 20 # (the length of the key will never be over 20)
max_ = 0
index_max = 0
for x in range(1, max_key_len + 1): # Going from 1 to 20
pieces = [cipher[i : i + x] for i in range(0, len(cipher), x)]
# calculating the mean value of all the piece in text cut in x pieces
mean_IC = sum(IC(freq(piece)) for piece in pieces) / len(pieces)
print(mean_IC, x)
if mean_IC > max_:
max_ = mean_IC
index_max = x
if mean_IC > 0.06:
break
return index_max
追踪 IC 的平均值给了我这个 sample text :
0.043117408906882586 20
0.03727725599070627 19
0.03944106378183457 18
0.04047088532382649 17
0.04331597222222223 16
0.04154995331465919 15
0.037422037422037424 14
0.04287251210328133 13
0.037350246652572215 12
0.03882301273605619 11
0.04291938997821349 10
0.04191033138401558 9
0.04185267857142856 8
0.03522504892367906 7
0.03686274509803924 6
0.04554455445544554 5
0.04199475065616799 4
0.043392504930966455 3
0.03557312252964427 2
0.0 1
None 这些值超过 0.06(但有些应该超过 7,因为 7 是真正的密钥长度)。我在网上查看了一个 cryptography website 正是这样做的(点击按钮 CALCULATE PROBABLE KEY-LENGTHS
),我得到了一些与相同文本完全不同的东西,其中 L 是密钥长度,总之有一些东西错了,但我好像找不到它:
L=7 IC ≈ 0.07832 ± 0.006 --> This is the key
L=14 IC ≈ 0.07986 ± 0.013
L=21 IC ≈ 0.07504 ± 0.015
L=5 IC ≈ 0.04717 ± 0.028
L=10 IC ≈ 0.04667 ± 0.028
L=15 IC ≈ 0.04646 ± 0.029
L=19 IC ≈ 0.04616 ± 0.029
L=20 IC ≈ 0.04562 ± 0.029
L=22 IC ≈ 0.04523 ± 0.03
L=11 IC ≈ 0.04516 ± 0.03
L=4 IC ≈ 0.04515 ± 0.03
L=3 IC ≈ 0.04494 ± 0.03
L=1 IC ≈ 0.04487 ± 0.03
L=17 IC ≈ 0.04462 ± 0.03
L=2 IC ≈ 0.04453 ± 0.03
L=9 IC ≈ 0.04373 ± 0.031
L=23 IC ≈ 0.04352 ± 0.031
L=13 IC ≈ 0.04336 ± 0.032
L=8 IC ≈ 0.04319 ± 0.032
L=6 IC ≈ 0.04278 ± 0.032
L=26 IC ≈ 0.0426 ± 0.032
L=12 IC ≈ 0.04199 ± 0.033
L=16 IC ≈ 0.04128 ± 0.034
L=25 IC ≈ 0.04131 ± 0.034
L=18 IC ≈ 0.04006 ± 0.035
L=24 IC ≈ 0.03681 ± 0.038
在密钥长度猜测中,您不使用 pieces[i:i+x]
计算片段,这将无法提供执行 IC 评估所需的正确陪集。
相反,您可能想要定义一个列函数的剪切:
def cut_in_columns(text, k):
"""
Cut text in k columns
"""
columns = [[] for _ in range(k)]
for i, l in enumerate(text):
columns[i % k].append(l)
return columns
这将为您提供 pieces
的权利。
由于这是真的原因,我可能会将您重定向到此 post:https://pages.mtu.edu/~shene/NSF-4/Tutorial/VIG/Vig-IOC-Len.html 关于密钥长度猜测。
在您的代码中替换它会产生所需的结果。
根本属性在于这些密码方法的定义。
顺便说一句,你应该测试你的代码 post,你有一个未定义的变量(法语 moyenne
与 mean
混合)并且在你的最大计算中早期 return这甚至不是最大计算……您应该确保您的最小工作示例 确实 有效。
我编写了一个程序来尝试查找法语 vigenere ciphered text (should be a random string) by calculating the mean of all the index of coincidence of a text which is cut every x element, basically I'm trying to get the mean value of the IC closest to 0.06/0.07 according to this reference website 中键的长度 x
,但我的平均值看起来不对。
注意:我只处理大写字符串,完全没有标点符号,没有特殊字符,也没有空格。
我有 3 个函数,其中一个函数给出文本中出现字母的列表。
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def freq(txt):
"""
give back all occurences of letters in a list
"""
hist = [0.0] * len(alphabet)
for letter in txt:
hist[alphabet.index(letter)] += 1
return hist
def IC(occurences):
"""
this function takes a list of occurences of letters from a text and calculates the IC
and returns the IC
"""
sum_ = 0
n = sum(occurences)
if n == 1:
return 0
for occ_letter in occurences:
sum_ += occ_letter * (occ_letter - 1)
sum_ /= n * (n - 1)
return sum_
# These two functions work well (or so I believe), but when I wrote this one :
# This function could be wrong
def key_length(cipher):
"""
cipher is the ciphered text.
Find the length of the key by cutting the text and trying to find a mean value over 0.06
Returns the length of the key
"""
max_key_len = 20 # (the length of the key will never be over 20)
max_ = 0
index_max = 0
for x in range(1, max_key_len + 1): # Going from 1 to 20
pieces = [cipher[i : i + x] for i in range(0, len(cipher), x)]
# calculating the mean value of all the piece in text cut in x pieces
mean_IC = sum(IC(freq(piece)) for piece in pieces) / len(pieces)
print(mean_IC, x)
if mean_IC > max_:
max_ = mean_IC
index_max = x
if mean_IC > 0.06:
break
return index_max
追踪 IC 的平均值给了我这个 sample text :
0.043117408906882586 20
0.03727725599070627 19
0.03944106378183457 18
0.04047088532382649 17
0.04331597222222223 16
0.04154995331465919 15
0.037422037422037424 14
0.04287251210328133 13
0.037350246652572215 12
0.03882301273605619 11
0.04291938997821349 10
0.04191033138401558 9
0.04185267857142856 8
0.03522504892367906 7
0.03686274509803924 6
0.04554455445544554 5
0.04199475065616799 4
0.043392504930966455 3
0.03557312252964427 2
0.0 1
None 这些值超过 0.06(但有些应该超过 7,因为 7 是真正的密钥长度)。我在网上查看了一个 cryptography website 正是这样做的(点击按钮 CALCULATE PROBABLE KEY-LENGTHS
),我得到了一些与相同文本完全不同的东西,其中 L 是密钥长度,总之有一些东西错了,但我好像找不到它:
L=7 IC ≈ 0.07832 ± 0.006 --> This is the key
L=14 IC ≈ 0.07986 ± 0.013
L=21 IC ≈ 0.07504 ± 0.015
L=5 IC ≈ 0.04717 ± 0.028
L=10 IC ≈ 0.04667 ± 0.028
L=15 IC ≈ 0.04646 ± 0.029
L=19 IC ≈ 0.04616 ± 0.029
L=20 IC ≈ 0.04562 ± 0.029
L=22 IC ≈ 0.04523 ± 0.03
L=11 IC ≈ 0.04516 ± 0.03
L=4 IC ≈ 0.04515 ± 0.03
L=3 IC ≈ 0.04494 ± 0.03
L=1 IC ≈ 0.04487 ± 0.03
L=17 IC ≈ 0.04462 ± 0.03
L=2 IC ≈ 0.04453 ± 0.03
L=9 IC ≈ 0.04373 ± 0.031
L=23 IC ≈ 0.04352 ± 0.031
L=13 IC ≈ 0.04336 ± 0.032
L=8 IC ≈ 0.04319 ± 0.032
L=6 IC ≈ 0.04278 ± 0.032
L=26 IC ≈ 0.0426 ± 0.032
L=12 IC ≈ 0.04199 ± 0.033
L=16 IC ≈ 0.04128 ± 0.034
L=25 IC ≈ 0.04131 ± 0.034
L=18 IC ≈ 0.04006 ± 0.035
L=24 IC ≈ 0.03681 ± 0.038
在密钥长度猜测中,您不使用 pieces[i:i+x]
计算片段,这将无法提供执行 IC 评估所需的正确陪集。
相反,您可能想要定义一个列函数的剪切:
def cut_in_columns(text, k):
"""
Cut text in k columns
"""
columns = [[] for _ in range(k)]
for i, l in enumerate(text):
columns[i % k].append(l)
return columns
这将为您提供 pieces
的权利。
由于这是真的原因,我可能会将您重定向到此 post:https://pages.mtu.edu/~shene/NSF-4/Tutorial/VIG/Vig-IOC-Len.html 关于密钥长度猜测。
在您的代码中替换它会产生所需的结果。
根本属性在于这些密码方法的定义。
顺便说一句,你应该测试你的代码 post,你有一个未定义的变量(法语 moyenne
与 mean
混合)并且在你的最大计算中早期 return这甚至不是最大计算……您应该确保您的最小工作示例 确实 有效。