Python - loop through a list of keywords and loop through sentences to find the number of matches between keywords and the word "access"
I have a list of keywords, and I need to know whether each one appears within 4 words of the word "access" in a sentence from a list. At the end, I want to total the number of times a keyword was matched with the word "access".
Current output:
['Minority', 'patients', 'often', 'have', 'barrier', 'with', 'their', 'access', 'to', 'healthcare.'] 0
['Rural', 'patients', 'often', 'cite', 'distance', 'as', 'a', 'barrier', 'to', 'access', 'health', 'services.']
['Minority', 'patients', 'often', 'have', 'barriers', 'with', 'their', 'access', 'to', 'healthcare.'] 0
['Minority', 'patients', 'often', 'have', 'barriers', 'with', 'their', 'access', 'to', 'healthcare.'] 1
Desired output:
['Minority', 'patients', 'often', 'have', 'barriers', 'with', 'their', 'access', 'to', 'healthcare.'] 2
["I, am, an, avid, user, of, Microsoft, Access, databases"] 0
['Rural', 'patients', 'often', 'cite', 'distance', 'as', 'a', 'barrier', 'to', 'access', 'healthcare', 'services.'] 3
My code:

    accessdesc = ["care", "services", "healthcare", "barriers"]
    sentences = ["Minority patients often have barriers with their access to healthcare.",
                 "I am an avid user of Microsoft Access databases",
                 "Rural patients often cite distance as one of the barriers to access healthcare services."]
    for sentence in sentences:
        nummatches = 0
        for desc in accessdesc:
            sentence = sentence.replace(".","") if "." in sentence else ''
            sentence = sentence.replace(",","") if "," in sentence else ''
            if 'access' in sentence.lower() and desc in sentence.lower():
                sentence = sentence.lower().split()
                access_position = sentence.index('access') if "access" in sentence else 0
                desc_position = sentence.index(desc) if desc in sentence else 0
                if abs(access_position - desc_position) < 5 :
                    nummatches = nummatches + 1
                else:
                    nummatches = nummatches + 0
            print(sentence, nummatches)
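I think you need to switch the order of your loops, from:

    for desc in accessdesc:
        for sentence in sentences:

to:

    for sentence in sentences:
        nummatches = 0  # Resets the count to 0 for each sentence
        for desc in accessdesc:

This means you check every keyword against a sentence before moving on to the next sentence. Then just move the print(sentence, nummatches) statement outside the second loop, so that the result is printed once per sentence.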
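Another thing to look at is the line if 'access' and desc in sentence:. The and operator combines the expression on its left with the expression on its right and checks whether both evaluate as true. Here that means it checks whether the string 'access' is truthy (a non-empty string always is) and whether desc in sentence holds, so the condition reduces to desc in sentence. What you want is to check that both access and desc are in sentence. I'd also suggest ignoring case for this check, because 'access' is not equal to 'Access'. So you could rewrite it as:

    if 'access' in sentence.lower() and desc in sentence.lower():
        sentence = sentence.lower().split()

Because the if condition now checks that desc is in the sentence, you don't have to check it again, as you mentioned in the comments.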
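To make the difference between the two conditions concrete, here is a minimal sketch. The function names buggy_check and fixed_check and the sample sentence are made up for illustration and are not part of the original code.

```python
def buggy_check(desc, sentence):
    # 'access' is a non-empty string, so it is always truthy;
    # the whole expression reduces to `desc in sentence`.
    return bool('access' and desc in sentence)


def fixed_check(desc, sentence):
    # Test both substrings explicitly, ignoring case.
    lowered = sentence.lower()
    return 'access' in lowered and desc in lowered


# Hypothetical sentence that contains a keyword but not the word "access".
no_access = "Patients often report barriers to healthcare."

print(buggy_check("barriers", no_access))  # True
print(fixed_check("barriers", no_access))  # False
```

The buggy version reports a match even though "access" never appears, which is why the explicit 'access' in ... test is needed.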
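Note that your code will probably only work as expected if access and each keyword appear at most once in the sentence, because sentence.index() only finds the first occurrence. Handling multiple occurrences will need additional logic.

EDIT

Your lines that replace punctuation, for example sentence = sentence.replace(".","") if "." in sentence else '', set the sentence to '' whenever that punctuation mark isn't present in the sentence. You can do all the replacements in one line, and then check against the list of words rather than the sentence string. You also need to check whether the word exists in the split list rather than in the string, so that only whole words match: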
    >>> 'it' in 'bit'
    True
    >>> 'it' in ['bit']
    False
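The punctuation problem can be checked the same way. The short sketch below uses the second sentence from the question, which contains no period; the variable names wiped and kept are illustrative only.

```python
sentence = "I am an avid user of Microsoft Access databases"

# The original pattern: falls back to '' when no "." is present.
wiped = sentence.replace(".", "") if "." in sentence else ''

# str.replace already leaves the string unchanged when the
# substring is absent, so no guard is needed.
kept = sentence.replace(".", "")

print(repr(wiped))       # ''
print(kept == sentence)  # True
```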
So you could rewrite your code as:

    for sentence in sentences:
        nummatches = 0
        words = sentence.replace(".","").replace(",","").lower().split()
        # Moved outside the second loop, as the sentence doesn't change between iterations.
        # The sentence variable itself is not changed, so it can be printed in its original form.
        if 'access' not in words:
            continue  # No need to proceed if access is not in the sentence
        for desc in accessdesc:
            if desc not in words:
                continue  # Can use continue to go to the next iteration of the loop
            access_position = words.index('access')
            desc_position = words.index(desc)
            if abs(access_position - desc_position) < 5:
                nummatches += 1
            # else statement not required
        print(sentence, nummatches)  # Moved outside the second loop so it prints after checking all the keywords
As mentioned, this only works if 'access' and each keyword appear at most once in the sentence. If they appear more than once, index() will only find the first occurrence.
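One possible way to handle repeated words is to collect every position instead of only the first. The sketch below is an assumption about what the extra logic could look like, not a tested drop-in: the helper names positions and count_matches, the anchor and window parameters, and the choice to count each keyword at most once per sentence are all mine.

```python
def positions(words, target):
    """Return every index at which target occurs in words."""
    return [i for i, word in enumerate(words) if word == target]


def count_matches(sentence, keywords, anchor="access", window=4):
    """Count keywords that occur within `window` words of any `anchor`.

    Each keyword is counted at most once per sentence (an assumption).
    """
    words = sentence.replace(".", "").replace(",", "").lower().split()
    anchor_positions = positions(words, anchor)
    count = 0
    for keyword in keywords:
        keyword_positions = positions(words, keyword)
        # True if any occurrence of the keyword is close to any anchor.
        if any(abs(a - k) <= window
               for a in anchor_positions
               for k in keyword_positions):
            count += 1
    return count


accessdesc = ["care", "services", "healthcare", "barriers"]
sentences = [
    "Minority patients often have barriers with their access to healthcare.",
    "I am an avid user of Microsoft Access databases",
    "Rural patients often cite distance as one of the barriers to access healthcare services.",
]
for sentence in sentences:
    print(sentence, count_matches(sentence, accessdesc))
```

For the three sentences in the question this prints counts of 2, 0 and 3. Because the function returns 0 rather than skipping, a sentence with no keyword near "access" is still printed.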
Have a look at this answer to see whether you can use its solution in your code.
Also see this answer for how to strip punctuation from a string.
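For the punctuation step, one common standard-library approach is str.translate with a table built from string.punctuation. This is a sketch of that idea, not necessarily what the linked answer recommends; the names _STRIP_PUNCTUATION and tokenize are hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def tokenize(sentence):
    """Remove punctuation, lowercase, and split into words."""
    return sentence.translate(_STRIP_PUNCTUATION).lower().split()


print(tokenize("Rural patients, often, cite distance."))
# ['rural', 'patients', 'often', 'cite', 'distance']
```

Note that string.punctuation also contains apostrophes and hyphens, so a word such as "don't" becomes "dont"; chained replace() calls may be preferable if only a few characters should be removed.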