How to extract longer strings and ignore sub-strings from a given sentence?

Question

I have a list of strings and a sentence, as follows:

list_of_strings=["skin allergy","hair loss","allergy","hair", "skin"]

sentence="She experienced skin allergy and hair loss after using it for 2-3 weeks"

I want to match list_of_strings against sentence and print only the longer phrases, ignoring any phrase that is a substring of a longer match:

skin allergy
hair loss

My own attempt extracts everything that matches, including the shorter substrings.

Answer 1

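Use a regular expression. Alternation tries the alternatives from left to right, so the longer phrases must come before the shorter ones they contain.

For example: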

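import re

list_of_strings = ["skin allergy", "hair loss", "allergy", "hair", "skin"]
sentence = "She experienced skin allergy and hair loss after using it for 2-3 weeks"

# Try longer phrases first so "skin allergy" wins over "skin" and "allergy".
alternatives = sorted(list_of_strings, key=len, reverse=True)
# Escape each phrase and put the word boundaries around the whole group,
# so that \b applies to every alternative and not only the first one.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")

m = pattern.findall(sentence)
print(m)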

Output:

['skin allergy', 'hair loss']
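
The result depends on the order of the alternatives, which is easy to miss when the input list happens to be ordered already. The sketch below wraps the idea in a helper and compares it with an unsorted pattern. The name `longest_phrases` and the reordered `phrases` list are hypothetical, introduced only for illustration.

```python
import re

def longest_phrases(phrases, text):
    """Return the phrases found in text, preferring longer phrases."""
    # Longest first: alternation picks the first alternative that matches.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b")
    return pattern.findall(text)

# Same phrases as in the question, but with the short ones listed first.
phrases = ["skin", "hair", "allergy", "skin allergy", "hair loss"]
sentence = "She experienced skin allergy and hair loss after using it for 2-3 weeks"

# Without sorting, the shorter alternatives are tried first and win.
unsorted_pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, phrases)) + r")\b")
print(unsorted_pattern.findall(sentence))  # ['skin', 'allergy', 'hair']

# With sorting, the longer phrases are matched and their substrings are skipped.
print(longest_phrases(phrases, sentence))  # ['skin allergy', 'hair loss']
```

Because `findall` returns non-overlapping matches, once "skin allergy" has been consumed the shorter "skin" and "allergy" can no longer match at that position.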

string-matching

python-3.x