Search for multiple words in a list using Python

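I'm currently working on my first Python project. The goal is to summarise a web page by searching for, and printing, the sentences that contain specific words from a word list I have generated. For example, the following (large) list contains 'business key terms' that I generated by running cewl on business websites: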

business_list = ['business', 'marketing', 'market', 'price', 'management', 'terms', 'product', 'research', 'organisation', 'external', 'operations', 'organisations', 'tools', 'people', 'sales', 'growth', 'quality', 'resources', 'revenue', 'account', 'value', 'process', 'level', 'stakeholders', 'structure', 'company', 'accounts', 'development', 'personal', 'corporate', 'functions', 'products', 'activity', 'demand', 'share', 'services', 'communication', 'period', 'example', 'total', 'decision', 'companies', 'service', 'working', 'businesses', 'amount', 'number', 'scale', 'means', 'needs', 'customers', 'competition', 'brand', 'image', 'strategies', 'consumer', 'based', 'policy', 'increase', 'could', 'industry', 'manufacture', 'assets', 'social', 'sector', 'strategy', 'markets', 'information', 'benefits', 'selling', 'decisions', 'performance', 'training', 'customer', 'purchase', 'person', 'rates', 'examples', 'strategic', 'determine', 'matrix', 'focus', 'goals', 'individual', 'potential', 'managers', 'important', 'achieve', 'influence', 'impact', 'definition', 'employees', 'knowledge', 'economies', 'skills', 'buying', 'competitive', 'specific', 'ability', 'provide', 'activities', 'improve', 'productivity', 'action', 'power', 'capital', 'related', 'target', 'critical', 'stage', 'opportunities', 'section', 'system', 'review', 'effective', 'stock', 'technology', 'relationship', 'plans', 'opportunity', 'leader', 'niche', 'success', 'stages', 'manager', 'venture', 'trends', 'media', 'state', 'negotiation', 'network', 'successful', 'teams', 'offer', 'generate', 'contract', 'systems', 'manage', 'relevant', 'published', 'criteria', 'sellers', 'offers', 'seller', 'campaigns', 'economy', 'buyers', 'everyone', 'medium', 'valuable', 'model', 'enterprise', 'partnerships', 'buyer', 'compensation', 'partners', 'leaders', 'build', 'commission', 'engage', 'clients', 'partner', 'quota', 'focused', 'modern', 'career', 'executive', 'qualified', 'tactics', 'supplier', 'investors', 
'entrepreneurs', 'financing', 'commercial', 'finances', 'entrepreneurial', 'entrepreneur', 'reports', 'interview', 'ansoff']

The program below copies all the text from a URL I specify and organises it into a list in which each element is a sentence:

from bs4 import BeautifulSoup
import urllib.request as ul

url = input("Enter URL: ")
html = ul.urlopen(url).read()

soup = BeautifulSoup(html, 'lxml')
for script in soup(["script", "style"]):
    script.decompose()
strips = list(soup.stripped_strings)
# Joining list to form single text
text = " ".join(strips)
text = text.lower()
# Replacing substitutes of '.'
for mark in "?!:;":
    text = text.replace(mark, ".")
# Splitting text by sentences
sentences = text.split(".")
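
The replace-then-split steps above could also be collapsed into a single regular-expression split. The following is only a minimal sketch: `split_sentences` is a hypothetical helper name, and unlike the code above it also strips surrounding whitespace and discards empty fragments.

```python
import re

def split_sentences(text):
    """Lower-case the text and split it on sentence-ending punctuation."""
    # One or more of . ? ! : ; in a row counts as a single sentence boundary.
    parts = re.split(r"[.?!:;]+", text.lower())
    # Drop surrounding whitespace and discard empty fragments.
    return [p.strip() for p in parts if p.strip()]

print(split_sentences("Marketing matters! Why? Price; product: place."))
```

Treating colons and semicolons as sentence boundaries follows the original program; whether that is desirable depends on the pages being summarised.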

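My current objective is to have the program print every sentence that contains one (or more) of the key terms above, but so far I have only managed to do this for one word at a time: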

# Word to search for in the text
word_search = input("Enter word: ")
word_search = word_search.lower()
sentences_with_word = []
for x in sentences:
    if word_search in x:
        sentences_with_word.append(x)
# Separating sentences into separate lines
sentence_text = "\n\n".join(sentences_with_word)
print(sentence_text)

Could someone demonstrate how to do this for the whole list at once? Thank you.

Edit

As suggested by MachineLearner, here is an example of the output for a single word. If I use Wikipedia's page on marketing as the URL and enter "marketing" as the input for 'word_search', this is a snippet of the generated output (the entire output is almost 600 lines long):

marketing mix the marketing mix is a foundational tool used to guide decision making in marketing

 the marketing mix represents the basic tools which marketers can use to bring their products or services to market

 they are the foundation of managerial marketing and the marketing plan typically devotes a section to the marketing mix

 the 4ps [ edit ] the traditional marketing mix refers to four broad levels of marketing decision

Use a nested loop to check each sentence against every word in the list:

words = business_list  # the key terms from the question
output = []
for sentence in sentences:
    for word in words:
        if word in sentence:
            output.append(sentence)
            # Do not forget to break out of the inner loop; otherwise
            # a sentence that contains several of the words is added
            # to the output list once per matching word
            break
print("\n\n".join(output))
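
One caveat with the loop above is that it matches substrings, so a term such as 'market' also matches 'supermarkets'. If whole-word matching is wanted, a single compiled regular expression with word boundaries is one option. This is a minimal sketch rather than a definitive implementation: `sentences_containing` is a hypothetical helper name and the sample data is made up for illustration.

```python
import re

def sentences_containing(sentences, words):
    """Return the sentences that contain at least one of the words as a whole word."""
    if not words:
        # An empty alternation would match at every word boundary, so bail out early.
        return []
    # One alternation pattern for all terms; re.escape guards against special characters.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")
    # pattern.search stops at the first hit, so each sentence is kept at most once.
    return [s for s in sentences if pattern.search(s)]

# Illustrative data only (not taken from a real page).
sample_sentences = [
    "the marketing mix is a foundational tool",
    " supermarkets are large shops",
    " the price of a product affects demand",
]
sample_words = ["market", "marketing", "price"]

matches = sentences_containing(sample_sentences, sample_words)
print("\n\n".join(s.strip() for s in matches))
```

With the question's data, the call would be `sentences_containing(sentences, business_list)`. Because the pattern is compiled once, each sentence is scanned a single time instead of once per term.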