如何在 python 中优化 word_count

Question

给我n个词(1≤n≤10^5)。有些话可能会重复。对于每个单词，我必须输出它的出现次数。但是输出顺序要对应单词第一次出现的顺序。

我有一个解决问题的工作程序，但对于大量输入，我会超时。这是我对问题的解决方案：

n=int(input())
l=[]
ll=[]

for x in range(n):
    l.append(raw_input())
    if l[x] not in ll:
        ll.append(l[x])

result = [ l.count(ll[x]) for x in range(len(ll)) ]

for x in range(len(result)):
    print result[x],

Answer 1

在 python 中计算项目的最简单方法是使用 Counter from the collections 模块。

假设您有一个按预期顺序排列的项目列表，将其传递给 Counter 就足够了：

c = collections.Counter(['foo', 'bar', 'bar'])
print(c['bar'])  # Will print 2

如果 words 是您从用户那里检索到的单词列表，您可以遍历它以打印值：

seen = set()
for elem in words:
    if elem not in seen:
        print(counter[elem])
        seen.add(elem)

Answer 2

看看collections.OrderedDict。它可以为您处理这个问题，并且它消除了使用 list 的线性成员资格测试费用：

import collections

n = int(input())
l = []
ll = collections.OrderedDict()

for x in range(n):
    v = raw_input()
    l.append(v)
    ll[v] = None  # If v already in OrderedDict, does nothing, otherwise, appends

ll = list(ll)  # Can convert back to list when you're done if you like

如果您需要计数，您可以基于 OrderedDict 进行自定义 class，它既可以处理计数又可以保持有序。

class OrderedCounter(collections.OrderedDict):
    def __missing__(self, key):
        return 0

然后将 ll 更改为 OrderedCounter，并将 ll[v] = None 更改为 ll[v] += 1。最后，ll 将得到有序单词及其计数； l 甚至不需要：

for word, count in ll.items():
    print(word, count)

最终代码将简化为（省略导入和 class 定义）：

n = int(input())
word_counts = OrderedCounter()

for x in range(n):
    word_counts[raw_input()] += 1

for cnt in word_counts.values():
    print cnt,

简单多了，对吧？

Answer 3

通过子类化 OrderedDict 和 Counter 使用有序计数器：

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

counts = OrderedCounter(['b', 'c', 'b', 'b', 'a', 'c'])
for k, c in counts.items():
    print(k, c)

打印：

b 3
c 2
a 1

请参阅 collections 模块的 documentation 以获得更完整的方法，其中包括 OrderedCounter 的 __repr__。

如何在 python 中优化 word_count

How to optimize word_count in python

python

optimization

word-count