Python - counting duplicate strings
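I'm trying to write a function that counts how many times each word is repeated in a string and then returns a word if it is repeated more than a certain number of times (n). Here is what I have so far: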
from collections import defaultdict

def repeat_word_count(text, n):
    words = text.split()
    tally = defaultdict(int)
    answer = []
    for i in words:
        if i in tally:
            tally[i] += 1
        else:
            tally[i] = 1
I don't know where to go from here when it comes to comparing the dictionary values to n.
How it should work:
repeat_word_count("one one was a racehorse two two was one too", 3) should return ['one']
Try
for i in words:
    tally[i] = tally.get(i, 0) + 1
instead of
for i in words:
    if i in tally:
        tally[words] += 1  # you are using words (the list) as the key; you should use i (the item)
    else:
        tally[words] = 1
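To connect the tallying loop above to the comparison with n that the question asks about, here is a minimal sketch. It assumes the threshold is inclusive (>=), because the question's example expects ['one'] for n = 3 when 'one' occurs exactly three times; if "more than n" is meant literally, change >= to >.

```python
def repeat_word_count(text, n):
    tally = {}
    for word in text.split():
        # get() returns 0 for a word not seen yet, so no if/else is needed
        tally[word] = tally.get(word, 0) + 1
    # Assumption: inclusive threshold, chosen to match the question's example
    return [word for word, count in tally.items() if count >= n]

print(repeat_word_count("one one was a racehorse two two was one too", 3))
```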
If you just want to count words, collections.Counter will do.
>>> import collections
>>> a = collections.Counter("one one was a racehorse two two was one too".split())
>>> a
Counter({'one': 3, 'two': 2, 'was': 2, 'a': 1, 'racehorse': 1, 'too': 1})
>>> a['one']
3
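Building on the Counter shown above, the words that reach a threshold can be picked out with a list comprehension. This is only a sketch: the helper name words_at_least is made up for illustration, and the inclusive comparison (>=) is an assumption about what the question wants.

```python
from collections import Counter

def words_at_least(text, n):
    # Hypothetical helper: words occurring at least n times, most frequent first
    counts = Counter(text.split())
    # most_common() with no argument returns every (word, count) pair,
    # ordered from the highest count to the lowest
    return [word for word, count in counts.most_common() if count >= n]

result = words_at_least("one one was a racehorse two two was one too", 2)
print(result)
```

Iterating over most_common() rather than the Counter itself is a design choice: the result comes back ordered by frequency, which is often convenient when reporting repeated words.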
If what you want is a dictionary counting the words in a string, you can try this:
string = 'hello world hello again now hi there hi world'.split()
d = {}
for word in string:
    d[word] = d.get(word, 0) + 1
print(d)
Output (key order may differ depending on the Python version):
{'again': 1, 'there': 1, 'hi': 2, 'world': 2, 'now': 1, 'hello': 2}
Here is one way:
from collections import defaultdict

tally = defaultdict(int)
text = "one two two three three three"
for i in text.split():
    tally[i] += 1  # a missing key starts at int() == 0
print(tally)  # defaultdict(<class 'int'>, {'one': 1, 'two': 2, 'three': 3})
Putting it into a function:
def repeat_word_count(text, n):
    output = []
    tally = defaultdict(int)
    for i in text.split():
        tally[i] += 1
    for k in tally:
        if tally[k] > n:
            output.append(k)
    return output
text = "one two two three three three four four four four"
repeat_word_count(text, 2)
Out[141]: ['four', 'three']
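As luoluo says, use collections.Counter.
To get the item with the highest tally, use the Counter.most_common method with argument 1. It returns a list holding a single (word, tally) pair whose tally is the maximum count. If the "sentence" contains at least one word, that list is non-empty. So the following function returns some word that occurs more than n times if there is one, and returns None otherwise: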
from collections import Counter

def repeat_word_count(text, n):
    if not text:
        return None  # guard against '' and None!
    counter = Counter(text.split())
    if not counter:
        return None  # text contained only whitespace
    max_pair = counter.most_common(1)[0]
    return max_pair[0] if max_pair[1] > n else None
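Because most_common(1) yields only one pair, a word tied for the highest count can go unreported. The sketch below shows one way to collect every word that shares the maximum tally; the function name most_repeated_words is hypothetical and not part of any answer here.

```python
from collections import Counter

def most_repeated_words(text):
    # Hypothetical helper: all words tied for the highest count
    counter = Counter(text.split())
    if not counter:
        return []  # empty or whitespace-only text has no words
    top_count = counter.most_common(1)[0][1]
    return [word for word, count in counter.items() if count == top_count]

tied = most_repeated_words("a b b c c")
print(tied)
```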
Why don't you use the Counter class for this case:
from collections import Counter
cnt = Counter(text.split())
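Elements are stored as dictionary keys and their counts as dictionary values. It is then easy to keep the words whose count exceeds your n by looping over the keys in a for loop, for example:
words = []
for k in cnt:  # on Python 2 you could also iterate with cnt.iterkeys()
    if cnt[k] > n:
        words.append(k)
The list words then holds your words.
**Edited: Sorry, that is for when you need many words; BrianO has the right answer for your case.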