While reading multiple files with Python, how can I search for the recurrence of an error string?

Question

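I've just started playing with Python and I'm trying to run some tests against my environment... my idea is to create a simple script that finds the recurrence of errors within a given period of time.

Basically I want to count the number of times a server fails in my daily logs. If the failure happens more than a given number of times (say 10) within a given period (say 30 days), I should be able to raise an alert in a log. However, I'm not trying to just count how often the error is repeated in the 30-day interval... What I actually want is to count the number of times the error happened, recovered, and then happened again, so that I avoid reporting it more than once when a problem persists for several days.

For instance, let's say: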

file_2016_Oct_01.txt@hostname@YES
file_2016_Oct_02.txt@hostname@YES
file_2016_Oct_03.txt@hostname@NO
file_2016_Oct_04.txt@hostname@NO
file_2016_Oct_05.txt@hostname@YES
file_2016_Oct_06.txt@hostname@NO
file_2016_Oct_07.txt@hostname@NO

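Given the scenario above, I want the script to interpret it as 2 failures rather than 4, because a server may show the same status for several days before it recovers. I want to be able to identify the recurrence of the problem instead of just counting the total number of failures.

For the record, this is how I'm going through the files: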
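To make the expected count concrete, here is a minimal sketch of the rule applied to the sample lines above. It assumes the lines are in chronological order, that the third field is the status, and that "NO" marks a failed day (the total of 4 matches the four NO lines). `count_episodes` and `failure_flag` are illustrative names, not part of my script.

```python
from itertools import groupby

SAMPLE = [
    "file_2016_Oct_01.txt@hostname@YES",
    "file_2016_Oct_02.txt@hostname@YES",
    "file_2016_Oct_03.txt@hostname@NO",
    "file_2016_Oct_04.txt@hostname@NO",
    "file_2016_Oct_05.txt@hostname@YES",
    "file_2016_Oct_06.txt@hostname@NO",
    "file_2016_Oct_07.txt@hostname@NO",
]


def count_episodes(lines, failure_flag="NO"):
    """Count runs of consecutive failed days; each run is one episode.

    Assumption: `failure_flag` is the status value that means "failed".
    """
    statuses = [line.strip().split("@")[2] for line in lines]
    # groupby collapses adjacent equal values into a single group
    return sum(1 for status, _ in groupby(statuses) if status == failure_flag)


failed_days = sum(line.endswith("@NO") for line in SAMPLE)
print(count_episodes(SAMPLE), "episodes across", failed_days, "failed days")
```

If "YES" were the failure flag instead, the sample would still collapse to 2 episodes, so the desired result is the same either way.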

import datetime
import glob
import os

# Creates an empty list
history_list = []

# Function to find the files from the last 30 days

def f_findfiles():
    # First define the cut-off day, which means the last number
    # of days which the script will consider for the analysis
    cut_off_day = datetime.datetime.now() - datetime.timedelta(days=30)

    # We'll now loop through all history files from the last 30 days
    for file in glob.iglob("/opt/hc/*.txt"):
        filetime = datetime.datetime.fromtimestamp(os.path.getmtime(file))
        if filetime > cut_off_day:
            history_list.append(file)

# Just included the function below to show how I'm going 
# through the files, this is where I got stuck...

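def f_openfiles(arg):
    for path in arg:
        # Use a separate name for the handle so it does not shadow the path
        with open(path, "r") as handle:
            for line in handle:
                # e.g. ["file_2016_Oct_01.txt", "hostname", "YES"]
                clean_line = line.strip().split("@")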

# Main function
def main():
    f_findfiles()
    f_openfiles(history_list)

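I'm using 'with' to open the files and a 'for' loop to read every line of every file, but I'm not sure how to navigate the data so that I can compare the value from one file with the value from the older files.

I've tried putting all the data into a dictionary, into a list, or just enumerating and comparing, but I've failed with every one of these approaches :-(

Any hints on the best approach here? Thank you!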

Answer 1

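I'd rather handle this kind of problem with shell utilities (i.e. uniq), but as long as you prefer to use python:

With minimal effort you can handle it by creating an appropriate dict object in which the strings (like 'file_2016_Oct_01.txt@hostname@YES') are the keys. While iterating over the log, you check whether the corresponding key already exists in the dictionary (if 'file_2016_Oct_01.txt@hostname@YES' in my_log_dict), then assign or increment the dictionary value accordingly.

A short example: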

data_log = {}

lookup_string = 'foobar'
if lookup_string in data_log:
    data_log[lookup_string] += 1
else:
    data_log[lookup_string] = 1

Or (a one-liner, but in python that mostly looks ugly, so I've edited it to use line breaks for display):

data_log[lookup_string] = data_log[lookup_string] + 1 \
    if lookup_string in data_log \
    else 1
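Note that a plain tally like this counts every failed day. To count recurrences instead (failed, recovered, failed again), the dict can also remember the previous status per host, and the counter is incremented only when a host was not already failing. A minimal sketch, assuming the records arrive oldest first, each one has the form file@host@status, and "NO" means failure. The names `count_recurrences` and `failure_flag` and the hosts web01/web02 are made up for illustration; the threshold of 10 is the one from the question.

```python
def count_recurrences(records, failure_flag="NO"):
    """Return {host: number of separate failure episodes}.

    records: iterable of "file@host@status" strings, oldest first.
    Assumption: `failure_flag` is the status value that means "failed".
    """
    previous = {}   # host -> last status seen
    episodes = {}   # host -> episode count
    for record in records:
        _name, host, status = record.strip().split("@")
        episodes.setdefault(host, 0)
        # A new episode starts only if the host was not already failing
        if status == failure_flag and previous.get(host) != failure_flag:
            episodes[host] += 1
        previous[host] = status
    return episodes


# Hypothetical data: two hosts over four days
records = [
    "file_2016_Oct_01.txt@web01@NO",
    "file_2016_Oct_01.txt@web02@YES",
    "file_2016_Oct_02.txt@web01@NO",
    "file_2016_Oct_02.txt@web02@NO",
    "file_2016_Oct_03.txt@web01@YES",
    "file_2016_Oct_03.txt@web02@YES",
    "file_2016_Oct_04.txt@web01@NO",
    "file_2016_Oct_04.txt@web02@YES",
]

THRESHOLD = 10
counts = count_recurrences(records)
for host, n in counts.items():
    if n > THRESHOLD:
        print(f"ALERT: {host} failed {n} separate times")
```

Since glob does not guarantee any ordering, the file list would need sorting first, for example with sorted(history_list, key=os.path.getmtime), so that the lines really are fed in oldest first.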

python

scripting

string-iteration