Comparing sub-elements in a list with another
I have a list of sentences, listOfSentences, which looks like this:
listOfSentences = ['mary had a little lamb.',
'she also had a little pram.',
'bam bam bam she also loves ham.',
'she ate the lamb.']
I also have a dictionary of keywords, keyWords, which looks like this:
keyWords= {('bam', 3), ('lamb', 2), ('ate', 1)}
Words with higher frequency have lower keys in keyWords. The output I am after is:
>>> print(keySentences)
['bam bam bam she also loves ham.', 'she ate the lamb.']
My question is: how do I compare the elements of keyWords with the elements of listOfSentences so as to output the list keySentences?
Try this:
>>> [x for x in listOfSentences for i in keyWords if x.count(i[0])==i[1]]
['bam bam bam she also loves ham.', 'she ate the lamb.']
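One thing to keep in mind with this one-liner: str.count counts substring occurrences, and the test requires the count to equal the stored number exactly. Here is a minimal sketch of what that implies. The sample sentences are made up for illustration and are not all from the question.

```python
keyWords = {('bam', 3), ('lamb', 2), ('ate', 1)}

# Hypothetical sentences, chosen only to show the edge cases.
samples = ['she ate the lamb.',        # 'ate' appears once, so ('ate', 1) matches
           'mary had a little lamb.',  # 'lamb' appears once, but 2 is required
           'a plate of ham.']          # 'ate' is a substring of 'plate'

# Same comprehension as above, applied to the sample sentences.
matches = [x for x in samples for i in keyWords if x.count(i[0]) == i[1]]
print(matches)
```

The last sentence is selected even though it does not contain the word "ate", which is why the answers that follow split each sentence into words first.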
keyWords would be more useful as a dictionary; getting the score for each word is then a simple dictionary lookup. You can use split() to extract each word.
Here is some code that does this. It assumes that punctuation is part of the word (as your sample result list keySentences implies):
listOfSentences = ['mary had a little lamb.',
'she also had a little pram.',
'bam bam bam she also loves ham.',
'she ate the lamb.']
keyWords= [('bam', 3), ('lamb', 2), ('ate', 1)]
keyWords = dict(keyWords)
keySentences = []
for sentence in listOfSentences:
    score = sum(keyWords.get(word, 0) for word in sentence.split())
    if score > 0:
        keySentences.append((score, sentence))
keySentences = [sentence for score, sentence in sorted(keySentences, reverse=True)]
print(keySentences)
Output:
['bam bam bam she also loves ham.', 'she ate the lamb.']
If you want to ignore punctuation, you can remove it from each sentence before processing:
import string
# mapping to remove punctuation with str.translate()
remove_punctuation = {ord(c): None for c in string.punctuation}
listOfSentences = ['mary had a little lamb.',
'she also had a little pram.',
'bam bam bam she also loves ham.',
'she ate the lamb.']
keyWords= [('bam', 3), ('lamb', 2), ('ate', 1)]
keyWords = dict(keyWords)
keySentences = []
for sentence in listOfSentences:
    score = sum(keyWords.get(word, 0) for word in sentence.translate(remove_punctuation).split())
    if score > 0:
        keySentences.append((score, sentence))
keySentences = [sentence for score, sentence in sorted(keySentences, reverse=True)]
print(keySentences)
Output:
['bam bam bam she also loves ham.', 'she ate the lamb.', 'mary had a little lamb.']
The resulting list now also contains "mary had a little lamb.", because the period trailing "lamb" has been removed by str.translate().
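To see the effect on a single sentence, here is a small sketch using the same mapping as above. It assumes Python 3, where str.translate() accepts a dict keyed by code point; the variable names with_punct and without_punct are only for illustration.

```python
import string

# Same mapping as above: every punctuation code point maps to None (deleted).
remove_punctuation = {ord(c): None for c in string.punctuation}

sentence = 'mary had a little lamb.'
with_punct = sentence.split()
without_punct = sentence.translate(remove_punctuation).split()

print(with_punct[-1])     # last token keeps the period, so it is not a key in keyWords
print(without_punct[-1])  # last token is the bare word, so the lookup succeeds
```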
The following scores your sentences by the number of matching words:
import re
keyWords = [('bam', 3), ('lamb', 2), ('ate', 1)]
keyWords = [w for w, c in keyWords] # only need the words
listOfSentences = [
'mary had a little lamb.',
'she also had a little pram.',
'bam bam bam she also loves ham.',
'she ate the lamb.']
words = [re.findall(r'(\w+)', s) for s in listOfSentences]
keySentences = []
for word_list, sentence in zip(words, listOfSentences):
    # count how many words of the sentence are keywords
    keySentences.append((len([word for word in word_list if word in keyWords]), sentence))
for count, sentence in sorted(keySentences, reverse=True):
    print('{:2} {}'.format(count, sentence))
This gives you the following output:
 3 bam bam bam she also loves ham.
 2 she ate the lamb.
 1 mary had a little lamb.
 0 she also had a little pram.
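If the keyword weights should count as well, the regex tokenization can be combined with the dictionary lookup from the earlier answer. This is only a sketch under that assumption: score_sentences is a hypothetical helper name, not something from the question, and it assumes Python 3.

```python
import re

def score_sentences(sentences, weights):
    """Return (score, sentence) pairs, highest score first, skipping zero scores."""
    scored = []
    for sentence in sentences:
        words = re.findall(r'\w+', sentence)  # punctuation is not part of a word
        score = sum(weights.get(word, 0) for word in words)
        if score > 0:
            scored.append((score, sentence))
    return sorted(scored, reverse=True)

listOfSentences = ['mary had a little lamb.',
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.',
                   'she ate the lamb.']
keyWords = dict([('bam', 3), ('lamb', 2), ('ate', 1)])

ranked = score_sentences(listOfSentences, keyWords)
for score, sentence in ranked:
    print('{:2} {}'.format(score, sentence))
```

Here each sentence is ranked by the sum of its keyword weights rather than by the number of matches, so a sentence with one heavily weighted word can outrank one with several lightly weighted words.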