Two lists, faster comparison in Python

Question

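I am writing a Python (2.7) script to compare two lists. The lists are created from files by reading their contents. The files are plain text, not binary. File 1 contains only hashes (MD5 sums of some plaintext words), and file 2 contains hash:plain pairs. The lists have different lengths (logically, I can have fewer 'cracked' entries than hashes), and neither can be sorted because I have to preserve the order, but that is the next step of what I am trying to achieve. So far, my simple code looks like this: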

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import os

def ifexists(fname):
    if not os.path.isfile(fname):
        print('[-] %s must exist' % fname)
        sys.exit(1)

if len(sys.argv) < 3:
    print('[-] please provide CRACKED and HASHES files')
    sys.exit(1)

CRACKED=sys.argv[1]
HASHES=sys.argv[2]

ifexists(CRACKED)
ifexists(HASHES)

with open(CRACKED) as cracked, open(HASHES) as hashes:
    hashdata=hashes.readlines()
    crackdata=cracked.readlines()
    for c in crackdata:
        for z in hashdata:
            if c.strip().split(':', 1)[0] in z:
                print('found: ', c.strip().split(':', 1))

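Basically, I have to replace each hash found in the HASHES list with the matching hash:plain line found in the CRACKED list. I am iterating over CRACKED because it will always be the shorter one. My problem is that the code above is very slow for longer lists. For example, processing two text files of 60k lines each takes up to 15 minutes. What would you suggest to speed this up?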

Answer 1

Store one of the files in a dictionary or set; that removes an entire loop, and lookups take O(1) constant time on average.

For example, it looks like the crackdata file can easily be converted to a dictionary:

with open(CRACKED) as crackedfile:
    cracked = dict(map(str.strip, line.split(':', 1)) for line in crackedfile if ':' in line)

Now you only have to loop over the other file once:

with open(HASHES) as hashes:
    for line in hashes:
        hash = line.strip()
        if hash in cracked:
            print('Found:', hash, 'which maps to', cracked[hash])
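
To connect the dictionary lookup to the goal stated in the question (replacing each hash with its hash:plain line while keeping the original order), here is a minimal sketch. The function name merge_cracked and the sample values are hypothetical and not part of the original answer; it assumes each cracked line has the form hash:plain and that the plaintext itself may contain colons.

```python
def merge_cracked(hash_lines, cracked_lines):
    """Return the hashes in their original order, with every cracked
    hash replaced by its 'hash:plain' form. (Hypothetical helper.)"""
    cracked = {}
    for line in cracked_lines:
        line = line.rstrip('\r\n')
        # Split on the first colon only, so plaintexts containing ':' survive.
        h, sep, plain = line.partition(':')
        if sep:  # skip lines that contain no colon at all
            cracked[h.strip()] = plain

    merged = []
    for line in hash_lines:
        h = line.strip()
        if h in cracked:  # average O(1) dictionary lookup
            merged.append(h + ':' + cracked[h])
        else:
            merged.append(h)  # not cracked yet, keep the bare hash
    return merged


# Made-up sample data; real input would come from the HASHES and CRACKED files.
sample_hashes = ['aaa\n', 'bbb\n', 'ccc\n']
sample_cracked = ['ccc:pass:word\n', 'aaa:secret\n']
for entry in merge_cracked(sample_hashes, sample_cracked):
    print(entry)
```

Because the output follows the order of the hash lines and the dictionary is only used for lookups, no sorting is needed. The work is one pass over each input instead of one pass over HASHES for every line of CRACKED.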
