Python - split and enumerate a string, check if 2 words are within a certain distance of each other
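I am developing a program that checks the titles of research reports for certain patterns to decide whether a title is relevant. Typically, a title is relevant if the words "access" and "care" are within 4 words of each other. There may be phrases like "access to care", "patient access", or "access to diabetes care".
So far I have split and enumerated each string and filtered for the rows that contain "access" and "care", along with a number, but I have been struggling to create a binary "yes/no" variable that says whether the two words are within 4 words of each other. For example:
string = "Ensuring access to care is important."
relevant = 'yes'
string = "Ensuring access to baseball tickets is important, but honestly I don't really care."
relevant = 'no'
Any ideas on how to approach this would be greatly appreciated. Here is what I have so far: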
sentence = 'A priority area for this company is access to medical care and how we address it.'
sentence = sentence.lower()
sentence = sentence.split()
for i, j in enumerate(sentence):
    if 'access' in j:
        x = 'yes'
    else:
        x = 'no'
    if 'care' in j:
        y = 'yes'
    else:
        y = 'no'
    if x == 'yes' or y == 'yes':
        print(i, j, x, y)
You can easily avoid all of those loops:
sentence = 'A priority area for this company is access to medical care and how we address it.'
sentence = sentence.lower().split()
# if both words are in the list
if 'access' in sentence and 'care' in sentence:
    # take the index of the first occurrence of each word
    access_position = sentence.index('access')
    care_position = sentence.index('care')
    # check the distance between the indexes
    if abs(access_position - care_position) < 4:
        print("found access and care in less than 4 words")
# result:
# found access and care in less than 4 words
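One caveat with the exact-match test above: `split()` only splits on whitespace, so a word followed by punctuation (such as "care." at the end of the second title in the question) is not equal to 'care'. The sketch below shows one possible way to tokenize so that punctuation is dropped. The helper name `tokenize` and the regular expression are illustrative assumptions, not part of the answer.

```python
import re

def tokenize(text):
    """Lowercase the text and return its words without surrounding punctuation."""
    # Hypothetical helper: keeps letters, digits and apostrophes,
    # so "don't" stays a single token and "care." becomes "care".
    return re.findall(r"[a-z0-9']+", text.lower())

title = "Ensuring access to baseball tickets is important, but honestly I don't really care."

print(title.lower().split()[-1])         # care.
print(tokenize(title)[-1])               # care
print('care' in title.lower().split())   # False
print('care' in tokenize(title))         # True
```

With this tokenization the rest of the answer works unchanged if `sentence.lower().split()` is replaced by `tokenize(sentence)`.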
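You can record the indexes and then use them for the check.
Modify your code to:
sentence = 'A priority area for this company is access to medical care and how we address it.'
sentence = sentence.lower()
sentence = sentence.split()
access_index = None  # None means the word has not been found
care_index = None
for i, j in enumerate(sentence):
    if 'access' in j:
        access_index = i
    if 'care' in j:
        care_index = i
# Compare only when both words were found; abs() makes the word order irrelevant
if access_index is not None and care_index is not None and abs(access_index - care_index) < 4:
    print("Less than 4 words")
else:
    print("4 or more words apart, or a word is missing")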
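Note that the test `'access' in j` is a substring test on each word, so it also matches longer words that merely contain the text. Whether that is wanted depends on the titles being checked. A small sketch of the difference, using a made-up example phrase:

```python
# Made-up phrase for illustration; neither "access" nor "care" appears as a whole word
words = "accessible parking and careful planning".split()

# Substring test: matches any token that contains the text
substring_hits = [w for w in words if 'access' in w or 'care' in w]

# Equality test: matches only the exact words
exact_hits = [w for w in words if w in ('access', 'care')]

print(substring_hits)  # ['accessible', 'careful']
print(exact_hits)      # []
```

The substring test is more forgiving of trailing punctuation ("care."), while the equality test avoids false positives such as "careful".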
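You can do it like this:
# `sentence` is the lowercased, split list of words from your code
# note: list.index() raises ValueError if the word is not in the list
access = sentence.index("access")
care = sentence.index("care")
if abs(care - access) <= 4:
    print("Less than or equal to 4")
else:
    print("More than 4")
Of course, adapt the code above to your specific case.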
If "care" or "access" occurs more than once in the sentence, all of the current answers consider only one of the occurrences and will sometimes fail to detect a match. Instead, you need to consider every occurrence of each word:
sentence = "Access to tickets and access to care"
sentence = sentence.lower().split()
access_positions = [i for (i, word) in enumerate(sentence) if word == 'access']
care_positions = [i for (i, word) in enumerate(sentence) if word == 'care']
sentence_is_relevant = any(
    abs(access_i - care_i) <= 4
    for access_i in access_positions
    for care_i in care_positions
)
print("sentence_is_relevant =", sentence_is_relevant)
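To get the 'yes'/'no' variable the question asks for, the all-occurrences check can be wrapped in a function. This is a minimal sketch, not a definitive implementation: the function name `is_relevant`, its parameters, and the regex tokenization are assumptions added for illustration, and "within 4 words" is interpreted here as an index difference of at most 4.

```python
import re

def is_relevant(title, first="access", second="care", max_distance=4):
    """Return 'yes' if any `first` and any `second` are at most max_distance words apart."""
    # Assumed tokenization: letters, digits and apostrophes, so "care." becomes "care"
    words = re.findall(r"[a-z0-9']+", title.lower())
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    # any() over an empty product is False, so a missing word yields 'no'
    close_enough = any(
        abs(a - b) <= max_distance
        for a in first_positions
        for b in second_positions
    )
    return 'yes' if close_enough else 'no'

print(is_relevant("Ensuring access to care is important."))  # yes
print(is_relevant("Ensuring access to baseball tickets is important, "
                  "but honestly I don't really care."))      # no
print(is_relevant("Access to tickets and access to care"))   # yes
```

Applied to a column of titles, the function can be called once per row to fill the binary variable.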