python grep multiple words in logcat
Shell script:
logcat | grep -E "one|two|three"
Python code:
import os

key_words = ["one", "two", "three"]
log_lines = os.popen("logcat")
for log_line in log_lines:
    for keyword in key_words:
        if keyword in log_line:
            print(log_line)
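Is there any way to optimize the Python code above?
Your proposed solution prints a line that contains several keywords once for every keyword it contains, which is probably something you want to avoid. Also, a keyword matches even when it appears as part of another word (although this matches grep's behavior).
Some solutions: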
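To make both issues concrete, here is a small sketch that runs the question's loop over a few made-up sample lines and collects the matches in a list instead of printing them:

```python
# The loop from the question, appending matches to a list instead of printing
key_words = ["one", "two", "three"]
log_lines = ["This has a one and a two", "Some ones", "This does not"]  # sample data

matches = []
for log_line in log_lines:
    for keyword in key_words:
        if keyword in log_line:
            matches.append(log_line)

print(matches)
# ['This has a one and a two', 'This has a one and a two', 'Some ones']
```

The first line appears twice because it contains two keywords, and "Some ones" is reported only because "one" is a substring of "ones".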
import os

key_words = {"one", "two", "three"}
# sample lines; in practice these would come from os.popen("logcat")
log_lines = ['This has a one and a two', 'Some ones', 'This one has neither, oh it does', 'This does not', 'A three']

# fixing the repetition
for log_line in log_lines:
    for keyword in key_words:
        if keyword in log_line:
            print(log_line)
            break

# fixing the repetition and partial matches
for log_line in log_lines:
    for word in log_line.split():
        if word in key_words:
            print(log_line)
            break

# single line solution
print([log_line for log_line in log_lines if key_words & set(log_line.split()) != set()])

# single line solution with partial matches
print([log_line for log_line in log_lines if any(key_word in log_line for key_word in key_words)])
Hi, you can try this scenario with a regular expression. You can also adapt the regex to your requirements; see the following example:
import re

key_words = ["one", "two", "three"]
regex = "|".join(key_words)
with open("logcat", "r") as log_file:  # reads a file named "logcat"
    lines = log_file.readlines()
print(list(filter(lambda x: re.search(regex, x), lines)))
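The example above calls readlines(), which loads the whole file into memory before filtering. A minimal sketch of a streaming alternative, assuming the log has been saved to a text file; grep_file and the sample file contents are hypothetical, and this version also escapes the keywords so they are matched literally:

```python
import os
import re
import tempfile

def grep_file(path, key_words):
    """Yield the lines of path that contain any of key_words, like grep -E."""
    # Escape each keyword so characters such as '.' or '(' are matched literally
    pattern = re.compile("|".join(re.escape(word) for word in key_words))
    # Iterating over the file object reads one line at a time
    with open(path, "r") as log_file:
        for line in log_file:
            if pattern.search(line):
                yield line.rstrip("\n")

# Usage: a temporary file stands in for a saved logcat dump
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("boot one\nnothing here\nthree done\n")
    sample_path = tmp.name

result = list(grep_file(sample_path, ["one", "two", "three"]))
print(result)  # ['boot one', 'three done']
os.remove(sample_path)
```

Because grep_file is a generator, lines are filtered as they are read, so memory use stays flat even for a large log.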
To emulate the exact pattern of your grep command, do:
import re

pattern = re.compile('|'.join(key_words))  # one|two|three, as in grep -E
for log_line in log_lines:
    if pattern.search(log_line):
        print(log_line)
If you want to allow special characters in the keywords, you have to escape them:
pattern = re.compile('|'.join(re.escape(word) for word in key_words))
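A short sketch of why the escaping matters, using made-up keywords ("main(" and "a.b" are hypothetical) that happen to contain regex metacharacters:

```python
import re

# Hypothetical keywords that contain regex metacharacters
key_words = ["main(", "a.b"]

# Unescaped, "main(" opens a group that is never closed, so compiling fails
try:
    re.compile("|".join(key_words))
    unescaped_ok = True
except re.error:
    unescaped_ok = False

# Escaped, every character is matched literally
escaped = re.compile("|".join(re.escape(word) for word in key_words))

print(unescaped_ok)                               # False
print(bool(escaped.search("called main(argv)")))  # True
print(bool(escaped.search("axb")))                # False: '.' no longer matches any character
```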
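As you might expect, a regular expression is a bit overkill in this case. Instead, you can search for the keywords directly. You can use any to help with the search, since it short-circuits: it stops at the first keyword found.
for log_line in log_lines:
    if any(word in log_line for word in key_words):
        print(log_line)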
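This performs a linear search over the whole line for each keyword. If the keywords are whole words, you can do better, especially if you keep the keywords in a set:
keywords = set(key_words)
for log_line in log_lines:
    if keywords.intersection(log_line.split()):
        print(log_line)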
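One caveat with the split() approach: split() only breaks on whitespace, so a keyword followed by punctuation is missed. A small sketch (the sample line is made up) comparing it with tokenizing via re.findall and with a \b word-boundary pattern, which behaves roughly like grep -w:

```python
import re

key_words = {"one", "two", "three"}
log_line = "Got two, expected three."  # sample line

# str.split() keeps punctuation attached to words: 'two,' and 'three.'
split_hits = key_words.intersection(log_line.split())

# Tokenizing on runs of word characters drops the punctuation first
regex_hits = key_words.intersection(re.findall(r"\w+", log_line))

# A word-boundary pattern: keywords only match as whole words
whole_word = re.compile(r"\b(?:" + "|".join(map(re.escape, key_words)) + r")\b")

print(split_hits)                            # set()
print(sorted(regex_hits))                    # ['three', 'two']
print(bool(whole_word.search(log_line)))     # True
print(bool(whole_word.search("Some ones")))  # False
```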
The first optimization is simply to break as soon as you find a match:
import os

key_words = ["one", "two", "three"]
log_lines = os.popen("logcat")
for log_line in log_lines:
    for keyword in key_words:
        if keyword in log_line:
            print(log_line)
            break  # stop looking for keywords if you already found one
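A more readable solution is to replace the keyword loop with a regular expression. If it matches, print the line:
import os
import re

key_words = ["one", "two", "three"]
regex = re.compile('|'.join(key_words))  # one|two|three
log_lines = os.popen("logcat")
for log_line in log_lines:
    # search() scans the whole line (match() would only check its start);
    # it returns None if there is no match, a match object if there is one
    if regex.search(log_line):
        print(log_line)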
Performance-wise, I'm not sure which one is faster, but one is more readable. There are some caveats in the results, though.
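Which version is faster is easy to check empirically. A minimal timeit sketch using made-up sample lines (with_any and with_regex are illustrative names); it also shows one caveat of the regex approach: re.match only matches at the beginning of a line, while re.search, like grep, looks anywhere in it:

```python
import re
import timeit

key_words = ["one", "two", "three"]
regex = re.compile("|".join(key_words))
# Made-up sample data; use real logcat lines for meaningful numbers
log_lines = ["I/Tag: step one done", "D/Tag: nothing interesting"] * 1000

# match() only tries at the start of the line; search() scans all of it
print(bool(regex.match(log_lines[0])))   # False
print(bool(regex.search(log_lines[0])))  # True

def with_any():
    return [line for line in log_lines if any(word in line for word in key_words)]

def with_regex():
    return [line for line in log_lines if regex.search(line)]

# Both approaches must select the same lines before comparing speed
assert with_any() == with_regex()

for func in (with_any, with_regex):
    seconds = timeit.timeit(func, number=20)
    print(f"{func.__name__}: {seconds:.4f} s")
```

Timings depend heavily on the number of keywords and the length of the lines, so measure on your own logcat output rather than relying on the sample.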