Integers in defaultdict

Question

I want to go through a text file and build a dictionary containing keywords and the number of times each one shows up. I want it to look like this:

defaultdict(<type 'int'>, {'keyword1': 1, 'keyword2': 0, 'keyword3': 3, 'keyword4': 9})

Right now I am getting something like this:

defaultdict(<type 'int'>, {'keyword1': 1})

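I can print each keyword as the loop runs, so I know it is trying something. I also know that more of these keywords should turn up, because they do occur in the text file. My code: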

find_it=['keyword1', 'keyword2', 'keyword3', 'keyword4']

with open('inputfile.txt', 'r') as f:
    out = defaultdict(int)

    for key in find_it:
        counter=0
        for line in f:
            if key in line:
                out[key] += 1

my_keys=dict(**out)

What am I missing here?

Answer 1

from collections import Counter
my_current_count = Counter(open('inputfile.txt').read().split())

That should do it... and it is much simpler.

# find_it is the keyword list from the question
for shared_key in set(my_current_count).intersection(find_it):
    print(my_current_count[shared_key])

As it stands, too much would have to change to make your original approach work and still be recognizable.
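The output asked for in the question also lists keywords that never occur, with a count of 0. Since a Counter returns 0 for a key it has never seen, one possible sketch for filling those in is shown here. The sample string is made up and merely stands in for the contents of inputfile.txt:

```python
from collections import Counter

find_it = ['keyword1', 'keyword2', 'keyword3', 'keyword4']

# Hypothetical sample text standing in for the contents of inputfile.txt.
text = "keyword1 foo keyword3 bar keyword3 keyword4"

counts = Counter(text.split())

# Looking up a missing key in a Counter gives 0 instead of raising KeyError,
# so keywords that never occur still end up in the result with a count of 0.
my_keys = {key: counts[key] for key in find_it}
print(my_keys)
```

Here keyword2 does not occur in the sample text, so it appears in my_keys with the value 0.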

Answer 2

You have already read everything in the file during the first iteration of for key in find_it:, so for the next key there is nothing left to read.
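A small sketch of that behavior, using io.StringIO as a stand-in for the open file (an assumption made only so the snippet runs without inputfile.txt; an object returned by open() is consumed in the same way):

```python
import io

# io.StringIO stands in for the open file; like a file object,
# it hands out each line only once as you iterate over it.
f = io.StringIO("keyword1 here\nkeyword2 here\n")

first_pass = [line for line in f]   # consumes every line
second_pass = [line for line in f]  # nothing left to read

print(len(first_pass), len(second_pass))

f.seek(0)  # rewinding makes the lines available again
third_pass = [line for line in f]
print(len(third_pass))
```

Calling f.seek(0) before each keyword would also make the original loop work, but it re-reads the file once per keyword.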

I suggest you swap those for loops.

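from collections import defaultdict

find_it = ['keyword1', 'keyword2', 'keyword3', 'keyword4']
out = defaultdict(int)

with open('inputfile.txt', 'r') as f:
    # Read the file once and check every keyword against each line.
    for line in f:
        words = line.strip().split(' ')
        for key in find_it:
            if key in words:
                out[key] += 1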

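By the way, I strongly recommend the one-line solution, since it is much easier to read and understand for anyone who looks at your code in the future.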

Answer 3

Joran is right that Counter is a better fit for your job than defaultdict. Here is an alternative solution:

inputfile.txt

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

count.py

from collections import Counter

find_it = {"be", "do", "of", "the", "to"}

keys = Counter()

with open("inputfile.txt") as f:
    for line in f:
        matches = Counter(w for w in line.split() if w in find_it)
        keys += matches

print(keys)

$ python count.py
Counter({'the': 5, 'to': 5, 'be': 3, 'of': 3, 'do': 2})

This counts the words on each line that match find_it and adds them to the running counter keys as it goes.

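Edit: As Blckknght pointed out in the comments, the previous solution missed the case where a keyword appears more than once on a line. The edited version of the code uses a slightly different approach to get around that problem.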
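The earlier version of the code is not shown here, so the first counter below is only a guess at the kind of approach that has this problem: a membership test per keyword, which counts each keyword at most once per line. The second counter follows the pattern used in count.py. The sample line is made up for illustration:

```python
from collections import Counter

find_it = {"be", "do", "of", "the", "to"}
line = "to be or not to be"  # hypothetical line with repeated keywords

words = line.split()

# Membership test per keyword: each keyword counts at most once per line.
once_per_line = Counter(key for key in find_it if key in words)

# Counting every matching word: repeats within the line are all counted.
every_occurrence = Counter(w for w in words if w in find_it)

print(once_per_line["to"], every_occurrence["to"])
```

Note that both variants compare whole whitespace-separated tokens, so a word followed by punctuation, such as "explain," in inputfile.txt, would not match a keyword "explain".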

python

dictionary

defaultdict