Can this be modified to run faster?

I am using Python to create a word list that hits every combination of characters, which makes it a monster of a calculation at over 94^4 combinations. Before you ask where the 94 comes from: 94 covers ASCII characters 32 to 127. Understandably this function runs extremely slowly, and I am curious whether there is a way to make it more efficient.

Here is the main part of my code.

import itertools

def CreateTable(name,ASCIIList,size):
    # Write every length-`size` combination of the characters in ASCIIList,
    # one candidate password per line.
    f = open(name + '.txt','w')
    combo = itertools.product(ASCIIList, repeat = size)
    for x in combo:
        passwords = ''.join(x)
        f.write(str(passwords) + '\n')
    f.close()

I use this to build lists for brute forcing when I don't know the length of a password or which characters it contains. With a list like this I get every possible combination of characters, so eventually I am guaranteed to hit the right word. As said above, this is a slow program, it is also slow to read back, and it is not my preferred brute-force approach; it is more or less a last resort.

To give you an idea of how long this code runs: I created all the combinations of size 5, it ran for 3 hours, and the file ended up at over 50 GB.
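
A rough size estimate shows why it gets that big (assuming Windows text mode, where each '\n' is written as '\r\n'):

# Back-of-the-envelope check for the size-5 list.
combos = 94 ** 5                      # 7,339,040,224 candidate passwords
bytes_per_line = 5 + 2                # 5 characters plus CRLF from text mode
print(combos * bytes_per_line / 1e9)  # roughly 51.4 GB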

Warning: I have not tested this code.

I would convert combo into a list, combo_list = list(combo), and then split it into chunks:

def get_chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

# Change 1000 to whatever works.
chunks = get_chunks(combo_list, 1000)
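
One caveat: combo_list = list(combo) has to hold every combination in memory at once, which stops being practical as size grows. If that is a concern, a minimal sketch of chunking the iterator lazily with itertools.islice (my variation, untested like the rest of this answer) would be:

import itertools

def get_chunks_lazy(iterable, n):
    """Yield successive n-sized chunks without building the full list."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, n))
        if not chunk:
            break
        yield chunk

# Usage: feed the product iterator in directly, e.g.
# chunks = get_chunks_lazy(itertools.product(ASCIIList, repeat=size), 1000)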

Next, I would use multithreading to process each chunk:

import threading

class myThread(threading.Thread):
    def __init__(self, chunk_id, chunk):
        threading.Thread.__init__(self)
        self.chunk_id = chunk_id
        self.chunk = chunk

    def run(self):
        print("Starting " + str(self.chunk_id))
        self.process_data()
        print("Exiting " + str(self.chunk_id))

    def process_data(self):
        # Each thread writes its own chunk to a separate output file.
        f = open(str(self.chunk_id) + '.txt', 'w')
        for item in self.chunk:
            passwords = ''.join(item)
            f.write(passwords + '\n')
        f.close()

Then I would do something like this:

threads = []
for i, chunk in enumerate(chunks):
    thread = myThread(i, chunk)
    thread.start()
    threads.append(thread)

# Wait for all threads to complete
for t in threads:
    t.join()

If needed, you could write another script to merge all the output files.
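
Something along these lines would do it (a minimal sketch; it assumes the per-chunk files are named 0.txt, 1.txt, ... as above, and that writing the result to a new file called combined.txt is acceptable):

import glob
import shutil

# Concatenate every per-chunk file (0.txt, 1.txt, ...) into one wordlist.
chunk_files = [p for p in glob.glob('*.txt') if p.rsplit('.', 1)[0].isdigit()]
chunk_files.sort(key=lambda p: int(p.rsplit('.', 1)[0]))

with open('combined.txt', 'wb') as out:
    for path in chunk_files:
        with open(path, 'rb') as part:
            shutil.copyfileobj(part, out)

For a brute-force list the order of the chunks does not really matter, but sorting numerically keeps the output deterministic.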

I did some testing on this, and I think the main problem is that you are writing in text mode.

Binary mode is faster, and since you are only dealing with ASCII you might as well write bytes instead of strings.

Here is my code:

import itertools
import time

# Original text-mode version from the question, used as the baseline.
def CreateTable(name,ASCIIList,size):
    f = open(name + '.txt','w')
    combo = itertools.product(ASCIIList, repeat = size)
    for x in combo:
        passwords = ''.join(x)
        f.write(str(passwords) + '\n')
    f.close()

# Same as CreateTable, but opens the file in binary mode and writes bytes.
def CreateTableBinary(name,ASCIIList,size):
    f = open(name + '.txt', 'wb')
    combo = itertools.product(ASCIIList, repeat = size)
    for x in combo:
        passwords = bytes(x)
        f.write(passwords)
        f.write(b'\n')
    f.close()

def CreateTableBinaryFast(name,first,last,size):
    # Manual replacement for itertools.product: treat the bytearray like an
    # odometer, incrementing the last byte and carrying over when it passes
    # `last`. `first` and `last` are inclusive ASCII codes.
    f = open(name + '.txt', 'wb')
    x = bytearray(chr(first) * size, 'ASCII')
    while True:
        f.write(x)
        f.write(b'\n')

        # Carry: reset any trailing positions that have reached `last`.
        i = size - 1
        while (x[i] == last) and (i > 0):
            x[i] = first
            i -= 1
        if i == 0 and x[i] == last:
            break
        x[i] += 1
    f.close()

def CreateTableTheoreticalMax(name,ASCIIList,size):
    # Upper bound: write a fixed dummy line len(ASCIIList)**size times to
    # measure raw file-write speed with no combination work at all.
    f = open(name + '.txt', 'wb')
    combo = range(0, len(ASCIIList)**size)
    passwords = b'A' * size
    for x in combo:
        f.write(passwords)
        f.write(b'\n')
    f.close()

print("writing real file in text mode")
start = time.time()
chars = [chr(x) for x in range(32, 126)]
CreateTable("c:/temp/output", chars, 4)
print("that took ", time.time() - start, "seconds.")

print("writing real file in binary mode")
start = time.time()
chars = bytes(range(32, 126))
CreateTableBinary("c:/temp/output", chars, 4)
print("that took ", time.time() - start, "seconds.")

print("writing real file in fast binary mode")
start = time.time()
CreateTableBinaryFast("c:/temp/output", 32, 125, 4)
print("that took ", time.time() - start, "seconds.")

print("writing fake file at max speed")
start = time.time()
chars = [chr(x) for x in range(32, 126)]
CreateTableTheoreticalMax("c:/temp/output", chars, 4)
print("that took ", time.time() - start, "seconds.")

Output:

writing real file in text mode
that took  101.5869083404541 seconds.
writing real file in binary mode
that took  40.960529804229736 seconds.
writing real file in fast binary mode
that took  35.54869604110718 seconds.
writing fake file at max speed
that took  26.43029284477234 seconds.

So you can see a sizable improvement just from switching to binary mode.

Also, there still seems to be some slack to pick up, since dropping itertools.product and writing hard-coded bytes is faster still. Maybe you could write your own version of product that outputs bytes-like objects directly. Not sure.

Edit: I tried a hand-rolled itertools.product that works directly on a bytearray. It is a bit faster - see "fast binary mode" in the code above.