Trying to search case-insensitive keywords from a log text (.txt) file

Question

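I have a log file of a conversation. I want to search the file for certain keywords that I specify, but the log file may contain those keywords in uppercase, lowercase, or title case.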

I can match the keywords when they appear in lowercase, but not their uppercase or title-case versions. How can I fix this?

I've tried using

if (words.title() and words.lower()) in line:
     print (searchInLines[i])

but it doesn't seem to work.

keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']


with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()
    f.close()

for words in keywords:
    for i, line in enumerate(searchInLines):
        if (words.title() and words.lower()) in line:
            print (searchInLines[i])

For example, the log file contains the following sentence:

"Manchester United played Barcelona yesterday, however, the manchester club lost"

One of my keywords is "manchester", so it picks up the second occurrence but not the first.

How can I match both?

Thanks in advance!

Answer 1

Use a regular expression.

For example:

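import re

keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

# Simpler variant without word boundaries (would also match "excel" inside "excellent"):
# pattern = re.compile("(" + "|".join(keywords) + ")", flags=re.IGNORECASE)

# \b...\b restricts matches to whole words, re.IGNORECASE ignores case,
# and re.escape() makes each keyword match literally.
pattern = re.compile("(" + "|".join(r"\b{}\b".format(re.escape(i)) for i in keywords) + ")", flags=re.IGNORECASE)
for line in searchInLines:
    if pattern.search(line):
        print(line)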
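To see what the word boundaries add, here is a small self-contained check against the sample sentence from the question plus one extra line (the `sample_lines` list is made up purely for illustration). It uses a non-capturing group so that `findall` returns the whole matched word, with its original casing.

```python
import re

keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

# Escaped keywords joined into one alternation, wrapped in word boundaries,
# matched case-insensitively.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
    flags=re.IGNORECASE,
)

# Hypothetical log lines used only for this demonstration.
sample_lines = [
    "Manchester United played Barcelona yesterday, however, the manchester club lost",
    "The results were excellent",  # 'excel' appears only inside a longer word
]

for line in sample_lines:
    # findall returns every whole-word match as written in the line
    print(pattern.findall(line))
```

The first line yields both `Manchester` and `manchester`; the second yields nothing, because `\b` rejects the `excel` inside `excellent`.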

Answer 2

You can convert both the line and the keyword to uppercase (or both to lowercase) and then compare them.

keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

with open("test.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

for words in keywords:
    for line in searchInLines:
        # Compare in a common case so "Manchester", "MANCHESTER" and "manchester" all match
        if words.upper() in line.upper():
            print(line)
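Converting both sides with `upper()` or `lower()` is enough for the plain English keywords in this question. For arbitrary Unicode text, the Python documentation recommends `str.casefold()` for caseless matching; it is more aggressive than `lower()`, for example folding German `ß` to `ss`. A small illustration with made-up strings:

```python
keyword = "straße"
line = "Turn left on STRASSE 5"

# lower() leaves 'ß' unchanged, while 'STRASSE'.lower() is 'strasse', so this misses
print(keyword.lower() in line.lower())        # False
# casefold() maps 'ß' to 'ss', so both sides become 'strasse' and this matches
print(keyword.casefold() in line.casefold())  # True
```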

Answer 3

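I'm not entirely sure what you're trying to do, but I think you want to filter the messages (lines) down to those that contain one of the words in keywords. Here's a simple way: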

keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

for line in searchInLines:
    for keyword in keywords:
        # The keywords are already lowercase, so only the line needs lowering
        if keyword in line.lower():
            print(line)
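With the nested loop, a line that contains two different keywords is printed once per keyword. Below is a minimal sketch of a variant that reports each matching line only once, together with its line number and the keywords found in it. It accepts any iterable of lines, so an open file object can be passed directly instead of calling `readlines()`. The function name `matching_lines` and the `demo` lines are hypothetical.

```python
def matching_lines(lines, keywords):
    """Yield (line_number, line, found_keywords) for each line containing a keyword.

    `lines` can be any iterable of strings, e.g. an open file object.
    Matching is plain case-insensitive substring search, as in the loop above.
    """
    lowered_keywords = [k.lower() for k in keywords]
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        found = [k for k in lowered_keywords if k in lowered]
        if found:
            yield number, line.rstrip("\n"), found


keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

# Usage with the real log file:
# with open("recognition_log.txt", "r", encoding="utf8") as f:
#     for number, line, found in matching_lines(f, keywords):
#         print(number, found, line)

# Made-up lines standing in for the log file
demo = [
    "Manchester United played Barcelona yesterday\n",
    "nothing to see here\n",
    "Excel and Alteryx both qualified\n",
]
results = list(matching_lines(demo, keywords))
print(results)
```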

Answer 4

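(1) Well, your keywords are already lowercase, so "words.lower()" has no effect. (2) "(words.title() and words.lower())" does not check for both forms: a non-empty string is truthy, so the "and" expression simply evaluates to "words.lower()", and only the lowercase keyword is ever searched for. (3) What you want, I believe, is: "if words in line.lower():"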
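What the question's `(words.title() and words.lower())` expression actually evaluates to is easy to check in an interpreter. The test line below is made up and contains only the title-case form.

```python
words = "manchester"
line = "Manchester United played Barcelona yesterday"  # only the title-case form

# `and` returns its second operand when the first one is truthy
condition_operand = words.title() and words.lower()
print(condition_operand)          # manchester
print(condition_operand in line)  # False: only 'manchester' was searched for

# Testing each form separately needs `or` between two membership tests
print(words.title() in line or words.lower() in line)  # True
# Lowercasing the line covers every casing at once
print(words in line.lower())      # True
```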

Answer 5

First, you don't need f.close() when you use a context manager: the with block closes the file for you.
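A quick way to confirm this is the file object's `closed` attribute, which becomes `True` as soon as the `with` block ends. The temporary directory and file below exist only for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "recognition_log.txt")
    # Create a one-line stand-in for the log file
    with open(path, "w", encoding="utf8") as out:
        out.write("Manchester United played Barcelona yesterday\n")

    with open(path, "r", encoding="utf8") as f:
        searchInLines = f.readlines()
        print(f.closed)  # False: still inside the with block

    print(f.closed)      # True: closed automatically on leaving the block
```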

As for the solution, I'd suggest using a regular expression in this case.

import re
keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']
# Compile a regex pattern from the keyword list (case-sensitive, so each line is lowercased below)
pattern = re.compile('|'.join(keywords))

with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

for line in searchInLines:
    # if we get a match
    if re.search(pattern, line.lower()):
        print(line)
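Because this approach lowercases the line before searching, any match it finds is the lowercased text. If you also want to see how the keyword was actually written in the log, one alternative is to compile the pattern with `re.IGNORECASE` and search the original line. Note that neither pattern below uses `\b` word boundaries, so `excel` would also match inside `excellent`.

```python
import re

keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']
line = "Manchester United played Barcelona yesterday, however, the manchester club lost"

# Case-sensitive pattern applied to a lowercased copy of the line (as above)
lower_pattern = re.compile("|".join(map(re.escape, keywords)))
print(lower_pattern.findall(line.lower()))  # ['manchester', 'manchester']

# Case-insensitive pattern applied to the original line keeps the original casing
ci_pattern = re.compile("|".join(map(re.escape, keywords)), flags=re.IGNORECASE)
print(ci_pattern.findall(line))             # ['Manchester', 'manchester']
```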


python

keyword

keyword-search