Python - split and enumerate a string, check if 2 words are within a certain distance within string

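I'm working on a program that checks the titles of research reports for certain patterns to determine whether a title is relevant. Typically, a title is relevant if the words "access" and "care" are within 4 words of each other. There may be phrases such as "access to care," "patient access," or "access to diabetes care."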

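Right now, I have split and enumerated each string and filtered for the rows that contain "access" and "care", along with their index, but I'm struggling to create a binary "yes/no" variable that says whether the two words are within 4 words of each other. For example: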

string = "Ensuring access to care is important."
relevant = 'yes'

string = "Ensuring access to baseball tickets is important, but honestly I don't really care."
relevant = 'no'

Any ideas on how to solve this would be greatly appreciated. Here is what I have so far:

  sentence = 'A priority area for this company is access to medical care and how we address it.'
  sentence = sentence.lower()
  sentence = sentence.split()
  for i, j in enumerate(sentence):

      if 'access' in j:
          x = 'yes'
      else:
          x = 'no'

      if 'care' in j:
          y = 'yes'
      else:
          y = 'no'   

      if x == 'yes' or y == 'yes':
          print(i, j, x, y)

You can easily avoid all of these loops:

sentence = 'A priority area for this company is access to medical care and how we address it.'
sentence = sentence.lower().split()

### if both in list
if 'access' in sentence and 'care' in sentence :
    ### take indexes
    access_position = sentence.index('access')
    care_position = sentence.index('care')
    ### check the distance between indexes
    if abs( access_position - care_position ) < 4  :
        print("found access and care in less than 4 words")

### result:
found access and care in less than 4 words 
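
One caveat: `split()` leaves punctuation attached, so a title that ends in "care." produces the token "care." and the `in` test above misses it. A minimal sketch of one way around this, assuming it is acceptable to treat any run of letters or apostrophes as a word (`tokenize` is a hypothetical helper name, not part of the answer above):

```python
import re

def tokenize(text):
    # Lowercase, then keep runs of letters/apostrophes so "care." becomes "care"
    return re.findall(r"[a-z']+", text.lower())

words = tokenize("Ensuring access to baseball tickets is important, but honestly I don't really care.")
print(words.index('access'), words.index('care'))  # 1 12
```

With the punctuation stripped, the same index-distance check can be applied to `words` unchanged.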

You already have access to the indices, so you can use them for the check. Modify your code to:

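sentence = 'A priority area for this company is access to medical care and how we address it.'

sentence = sentence.lower()
sentence = sentence.split()
# None means "not found yet"; a default of 0 would look like a real position
access_index = None
care_index = None
for i, j in enumerate(sentence):
    # Substring match, so tokens such as "care." also count;
    # the last occurrence of each word wins
    if 'access' in j:
        access_index = i
    if 'care' in j:
        care_index = i

# Both words must be present; abs() makes the check independent of word order
if access_index is not None and care_index is not None and abs(access_index - care_index) <= 4:
    print("Within 4 words")
else:
    print("More than 4 words apart, or a word is missing")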

You can do it like this:

access = sentence.index("access")
care = sentence.index("care")

if abs(care - access) <= 4:
    print("Less than or equal to 4")
else:
    print("More than 4")

Of course, adapt the code above to your specific situation.
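
Note that `list.index` raises `ValueError` when the word is not in the list, so the snippet above assumes both words are present. A hedged sketch of a guard around it, where `within_distance` and its `max_distance` parameter are hypothetical names introduced only for illustration:

```python
def within_distance(words, first, second, max_distance=4):
    # list.index raises ValueError if either word is missing,
    # in which case the words cannot be near each other
    try:
        return abs(words.index(first) - words.index(second)) <= max_distance
    except ValueError:
        return False

words = 'a priority area for this company is access to medical care'.split()
print(within_distance(words, 'access', 'care'))     # True (positions 7 and 10)
print(within_distance(words, 'access', 'tickets'))  # False ('tickets' is absent)
```

Like the snippet above, this only looks at the first occurrence of each word.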

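If "care" or "access" appears more than once in the sentence, all of the current answers consider only one occurrence of each word and will sometimes fail to detect a match. Instead, you need to consider every occurrence of each word: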

sentence = "Access to tickets and access to care"
sentence = sentence.lower().split()

access_positions = [i for (i, word) in enumerate(sentence) if word == 'access']
care_positions = [i for (i, word) in enumerate(sentence) if word == 'care']

sentence_is_relevant = any(
    abs(access_i - care_i) <= 4
    for access_i in access_positions
    for care_i in care_positions
)
print("sentence_is_relevant =", sentence_is_relevant)
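
To get the binary "yes/no" variable the question asks for, the same all-occurrences check can be wrapped in a function. This is only a sketch under a few assumptions: `is_relevant` and `max_distance` are hypothetical names, "within 4 words" is read as an index difference of at most 4, and punctuation is stripped with a regular expression so that "care." still counts as "care":

```python
import re

def is_relevant(title, max_distance=4):
    # Tokenize on runs of letters/apostrophes so trailing punctuation is dropped
    words = re.findall(r"[a-z']+", title.lower())
    access_positions = [i for i, word in enumerate(words) if word == 'access']
    care_positions = [i for i, word in enumerate(words) if word == 'care']
    # Compare every "access" position with every "care" position
    found = any(
        abs(access_i - care_i) <= max_distance
        for access_i in access_positions
        for care_i in care_positions
    )
    return 'yes' if found else 'no'

print(is_relevant("Ensuring access to care is important."))  # yes
print(is_relevant("Ensuring access to baseball tickets is important, "
                  "but honestly I don't really care."))      # no
```

If either word is missing, one of the position lists is empty, `any()` returns `False`, and the function returns 'no'.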