Why is my mean value of the sum of the index of coincidence wrong in Python3?

Question

I wrote a program that tries to find the key length x of a French Vigenère-ciphered text (which should look like a random string). It cuts the text every x elements, computes the index of coincidence (IC) of each piece and averages those values. Basically, I'm looking for the x whose mean IC is closest to 0.06/0.07, according to this reference website. But my mean values look wrong.

Note: I only deal with uppercase strings, with no punctuation, no special characters and no spaces at all.
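For reference, the IC function below computes the probability that two letters drawn without replacement from a text are the same. Here n_i is the number of occurrences of the i-th letter and N is the text length. For uniformly random letters the expected value is exactly 1/26, well under the 0.06/0.07 range quoted above:

```latex
\mathrm{IC} = \frac{\sum_{i=1}^{26} n_i\,(n_i - 1)}{N\,(N - 1)},
\qquad N = \sum_{i=1}^{26} n_i,
\qquad \mathbb{E}\left[\mathrm{IC}_{\text{uniform}}\right] = 26 \cdot \left(\tfrac{1}{26}\right)^2 = \tfrac{1}{26} \approx 0.0385
```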

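I have 3 functions. The first one returns the occurrences of each letter in a text as a list, and the second computes the IC from that list: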

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def freq(txt):
    """
    Return the number of occurrences of each letter of alphabet in txt, as a list
    """
    hist = [0.0] * len(alphabet)
    for letter in txt:
        hist[alphabet.index(letter)] += 1

    return hist

def IC(occurences):
    """
    Take the list of letter occurrences of a text and return its IC
    """
    sum_ = 0
    n = sum(occurences)
    if n <= 1:  # an empty or one-letter piece would divide by zero
        return 0
    for occ_letter in occurences:
        sum_ += occ_letter * (occ_letter - 1)
    sum_ /= n * (n - 1)

    return sum_

These two functions work well (or so I believe), but then I wrote this one:

# This function could be wrong
def key_length(cipher):
    """
    cipher is the ciphered text.
    Find the length of the key by cutting the text and trying to find a mean value over 0.06
    Returns the length of the key
    """
    max_key_len = 20 # (the length of the key will never be over 20)
 
    max_ = 0
    index_max = 0
    for x in range(1, max_key_len + 1): # Going from 1 to 20
        pieces = [cipher[i : i + x] for i in range(0, len(cipher), x)]
        # mean IC of all the pieces obtained by cutting the text every x letters
        mean_IC = sum(IC(freq(piece)) for piece in pieces) / len(pieces)
        print(mean_IC, x)
        if mean_IC > max_:
            max_ = mean_IC
            index_max = x
            if mean_IC > 0.06:
                break
    return index_max

Printing the mean IC for each x on this sample text gives me:

0.043117408906882586 20
0.03727725599070627 19
0.03944106378183457 18
0.04047088532382649 17
0.04331597222222223 16
0.04154995331465919 15
0.037422037422037424 14
0.04287251210328133 13
0.037350246652572215 12
0.03882301273605619 11
0.04291938997821349 10
0.04191033138401558 9
0.04185267857142856 8
0.03522504892367906 7
0.03686274509803924 6
0.04554455445544554 5
0.04199475065616799 4
0.043392504930966455 3
0.03557312252964427 2
0.0 1

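None of these values is above 0.06 (yet at least the value for 7 should be, since 7 is the real key length). I checked a cryptography website that does exactly this (click the CALCULATE PROBABLE KEY-LENGTHS button) and got something completely different for the same text, where L is the key length. So something is wrong, but I can't seem to find it: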

L=7 IC ≈ 0.07832 ± 0.006 --> This is the key
L=14 IC ≈ 0.07986 ± 0.013
L=21 IC ≈ 0.07504 ± 0.015
L=5 IC ≈ 0.04717 ± 0.028
L=10 IC ≈ 0.04667 ± 0.028
L=15 IC ≈ 0.04646 ± 0.029
L=19 IC ≈ 0.04616 ± 0.029
L=20 IC ≈ 0.04562 ± 0.029
L=22 IC ≈ 0.04523 ± 0.03
L=11 IC ≈ 0.04516 ± 0.03
L=4 IC ≈ 0.04515 ± 0.03
L=3 IC ≈ 0.04494 ± 0.03
L=1 IC ≈ 0.04487 ± 0.03
L=17 IC ≈ 0.04462 ± 0.03
L=2 IC ≈ 0.04453 ± 0.03
L=9 IC ≈ 0.04373 ± 0.031
L=23 IC ≈ 0.04352 ± 0.031
L=13 IC ≈ 0.04336 ± 0.032
L=8 IC ≈ 0.04319 ± 0.032
L=6 IC ≈ 0.04278 ± 0.032
L=26 IC ≈ 0.0426 ± 0.032
L=12 IC ≈ 0.04199 ± 0.033
L=16 IC ≈ 0.04128 ± 0.034
L=25 IC ≈ 0.04131 ± 0.034
L=18 IC ≈ 0.04006 ± 0.035
L=24 IC ≈ 0.03681 ± 0.038
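For comparison, every mean in the first output (roughly 0.035 to 0.045) sits near the value expected for uniformly random letters, 1/26 ≈ 0.0385. In other words, the pieces being measured look random. A minimal sketch to check that baseline follows. It assumes a synthetic random string, not the real sample text, and uses a hypothetical helper index_of_coincidence that applies the same formula as IC(freq(...)):

```python
import random
import string

ALPHABET = string.ascii_uppercase


def index_of_coincidence(text: str) -> float:
    """IC = sum n_i (n_i - 1) / (N (N - 1)), the same formula as IC() above."""
    n = len(text)
    if n <= 1:
        return 0.0
    counts = [text.count(letter) for letter in ALPHABET]
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Hypothetical sample: uniformly random uppercase letters, not a real ciphertext.
random.seed(0)
random_text = "".join(random.choice(ALPHABET) for _ in range(20000))
print(index_of_coincidence(random_text))  # close to 1 / 26 ≈ 0.0385
```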

Answer 1

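In your key-length guess, you compute the pieces with cipher[i : i + x], which does not give you the right cosets for the IC evaluation. A contiguous chunk mixes letters encrypted with different key letters. The IC test instead needs the letters at positions j, j + x, j + 2x, ..., which are all encrypted with the same key letter when x is the key length.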
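A minimal sketch of the difference labels each position with the index i mod k of the key letter that encrypts it. The helpers chunks (the question's slicing) and columns (a slicing equivalent of cutting into columns) are hypothetical names used only for illustration:

```python
def chunks(seq, k):
    """Contiguous slices of length k, like the question's cipher[i : i + x]."""
    return [seq[i : i + k] for i in range(0, len(seq), k)]


def columns(seq, k):
    """Every k-th element starting at offsets 0..k-1 (cut into k columns)."""
    return [seq[offset::k] for offset in range(k)]


key_len = 3  # hypothetical key length
# Position i of the ciphertext is encrypted with key letter number i % key_len.
key_index = [i % key_len for i in range(10)]

print(chunks(key_index, key_len))   # [[0, 1, 2], [0, 1, 2], [0, 1, 2], [0]]
print(columns(key_index, key_len))  # [[0, 0, 0, 0], [1, 1, 1], [2, 2, 2]]
```

Every chunk mixes all the key letters, so its letter distribution stays flat. Each column uses a single key letter, so it keeps the skewed distribution of the plaintext.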

You probably want to define a function that cuts the text into columns instead:

def cut_in_columns(text, k):
    """
    Cut text in k columns
    """

    columns = [[] for _ in range(k)]

    for i, l in enumerate(text):
        columns[i % k].append(l)

    return columns

This gives you the right pieces. As for why this is true, I'd point you to this post on key-length guessing: https://pages.mtu.edu/~shene/NSF-4/Tutorial/VIG/Vig-IOC-Len.html

Swapping it into your code produces the expected result.
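Putting it together, here is a minimal sketch of key_length with cut_in_columns swapped in. It makes a few assumptions beyond the original. First, max_key_len and threshold become parameters. Second, the function returns the smallest candidate above the threshold and otherwise falls back to the best mean. This matters because multiples of the true length score high too, as the L=14 and L=21 lines of the website output show. Third, IC returns 0 for n <= 1, so an empty column cannot divide by zero:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def freq(txt):
    """Occurrences of each letter of ALPHABET in txt."""
    hist = [0] * len(ALPHABET)
    for letter in txt:
        hist[ALPHABET.index(letter)] += 1
    return hist


def IC(occurrences):
    """Index of coincidence from a list of letter counts."""
    n = sum(occurrences)
    if n <= 1:  # also covers empty columns when k > len(text)
        return 0.0
    return sum(c * (c - 1) for c in occurrences) / (n * (n - 1))


def cut_in_columns(text, k):
    """Cut text in k columns: column j holds text[j], text[j + k], ..."""
    columns = [[] for _ in range(k)]
    for i, letter in enumerate(text):
        columns[i % k].append(letter)
    return columns


def key_length(cipher, max_key_len=20, threshold=0.06):
    """
    Return the smallest x whose mean column IC exceeds threshold,
    or the x with the highest mean column IC if none does.
    """
    best_x, best_ic = 1, -1.0
    for x in range(1, max_key_len + 1):
        pieces = cut_in_columns(cipher, x)
        mean_ic = sum(IC(freq(piece)) for piece in pieces) / len(pieces)
        if mean_ic > threshold:
            return x  # multiples of the key length also pass, so keep the smallest
        if mean_ic > best_ic:
            best_x, best_ic = x, mean_ic
    return best_x
```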

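The underlying property comes from the definition of this cipher: with a key of length k, every k-th letter is encrypted with the same key letter.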
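Concretely, each column of the ciphertext is a plain Caesar cipher of the matching plaintext column. A Caesar shift only relabels letters, so the column keeps the plaintext's letter-count profile and therefore its IC. A minimal sketch follows, with a hypothetical plaintext and key. vigenere_encrypt and caesar_shift are illustrative helpers that assume the usual c = (p + k) mod 26 convention:

```python
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def vigenere_encrypt(plain, key):
    """Shift plain[i] by the alphabet index of key[i % len(key)]."""
    return "".join(
        ALPHABET[(ALPHABET.index(p) + ALPHABET.index(key[i % len(key)])) % 26]
        for i, p in enumerate(plain)
    )


def caesar_shift(text, shift):
    """Shift every letter of text by the same amount."""
    return "".join(ALPHABET[(ALPHABET.index(c) + shift) % 26] for c in text)


plain = "ATTACKATDAWNATTACKATDUSK"  # hypothetical plaintext
key = "LEMON"                       # hypothetical key
cipher = vigenere_encrypt(plain, key)

for j, key_letter in enumerate(key):
    plain_col = plain[j :: len(key)]
    cipher_col = cipher[j :: len(key)]
    # Column j of the ciphertext is column j of the plaintext under one Caesar shift...
    assert cipher_col == caesar_shift(plain_col, ALPHABET.index(key_letter))
    # ...so its letter counts form the same multiset and its IC is unchanged.
    assert sorted(Counter(cipher_col).values()) == sorted(Counter(plain_col).values())
```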

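By the way, you should test the code you post. It had an undefined variable (the French moyenne mixed with mean) and an early return in your maximum computation, which then isn't even a maximum computation... Make sure your minimal working example actually works.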
