Looping through 100 text files in Python

My Python code is as follows:

#Loading libraries
import re
import pandas as pd
import numpy as np
import datetime

#Creating an empty dataframe
columns = ['A']
df_ = pd.DataFrame(columns=columns)
df_ = df_.fillna(0)

#Reading the data line by line
with open('serverLogs.log-2020-04-30-01') as f:
    lines = f.readlines()
    #print(lines)
    for line in lines:
        parts  = line.split('OD_MAKER_DATE=') 
        df_ = df_.append(parts)

I have many text files whose names differ only in the trailing number, which ranges from 01 to 100, i.e. 'serverLogs.log-2020-04-30-01', 'serverLogs.log-2020-04-30-02', ..., 'serverLogs.log-2020-04-30-100'.

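How can I create a for loop at the start of my existing code that loops through the 100 files and appends each line to the dataframe df_, instead of loading one file at a time? I am not very familiar with Python.

for idx in range(1, 101):
  # Files are numbered 01 to 100; %02d pads single digits with a leading zero
  fname = "serverLogs.log-2020-04-30-%02d" % idx
  with open(fname) as f:
    ...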

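I'm not sure this is the most efficient way to read files in a loop. As I understand it, though, the first 9 file names need a leading 0. This code may solve the problem of generating the names you need: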

file_count = 100 # can change it to any value
base_name = 'serverLogs.log-2020-04-30-{}'

for i in range(file_count):
    file_name = base_name.format("%.2d" % (i+1))

Then, inside that for loop (indented under it), you can read each file's data and append it the same way you do now:

#Reading the data line by line
with open(file_name) as f:
    lines = f.readlines()

    for line in lines:
        parts  = line.split('OD_MAKER_DATE=') 
        df_ = df_.append(parts)
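
To check the generated names before opening any files, it can help to build the whole list first. The following is a minimal sketch that uses an f-string format spec (`:02d`) in place of `"%.2d" % ...`; the helper name `log_file_names` is hypothetical, and the base name is taken from the question.

```python
def log_file_names(base_name="serverLogs.log-2020-04-30-", file_count=100):
    """Return the file names numbered 1..file_count, zero-padded to two digits."""
    # :02d pads single digits with a leading zero; 100 keeps its three digits.
    return [f"{base_name}{i:02d}" for i in range(1, file_count + 1)]


names = log_file_names()
print(names[0])    # serverLogs.log-2020-04-30-01
print(names[-1])   # serverLogs.log-2020-04-30-100
```

Looping over `names` with `for file_name in names:` then gives the same iteration as the counter-based loop.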

You can use string formatting and loop over the numbers 1-100 to read all 100 files:

import re
import pandas as pd
import numpy as np
import datetime


columns = ['A']
df_ = pd.DataFrame(columns=columns)
df_ = df_.fillna(0)

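# Files are numbered 01 to 100, so start at 1 (range(101) would also try file 00)
for i in range(1, 101):
    with open('serverLogs.log-2020-04-30-{}'.format("%.2d" % i)) as f:
        lines = f.readlines()
        #print(lines)

    # Process the lines inside the file loop so every file is handled, not only the last one
    for line in lines:
        parts = line.split('OD_MAKER_DATE=')
        df_ = df_.append(parts)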

#Loading libraries
import re
import pandas as pd
import numpy as np
import datetime

#Creating an empty dataframe
columns = ['A']
df_ = pd.DataFrame(columns=columns)
df_ = df_.fillna(0)

#Reading the data line by line
base_name = 'serverLogs.log-2020-04-30-{}'
# i + 1 runs from 1 to 100, matching files 01 to 100
for i in range(100):
    # Format into a separate variable so the template keeps its {} placeholder
    file_name = base_name.format("%.2d" % (i+1))
    with open(file_name) as f:
        lines = f.readlines()
        #print(lines)
        for line in lines:
            parts = line.split('OD_MAKER_DATE=')
            df_ = df_.append(parts)
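
One thing to be aware of: `DataFrame.append` was removed in pandas 2.0, so the repeated `df_ = df_.append(parts)` calls only work on older versions. A possible alternative, sketched here with the standard library only, is to collect all the split pieces in a plain list and build the dataframe once at the end. The helper name `collect_parts` and the choice to skip missing files are assumptions, not part of the original code.

```python
from pathlib import Path


def collect_parts(file_names, marker="OD_MAKER_DATE="):
    """Split every line of every file on `marker` and return all pieces in order."""
    rows = []
    for file_name in file_names:
        path = Path(file_name)
        if not path.exists():
            # Assumption: a missing file is skipped rather than treated as an error.
            continue
        with path.open() as f:
            for line in f:
                # Same split as in the question; extend() adds each piece as its own row.
                rows.extend(line.split(marker))
    return rows
```

With pandas installed, something like `df_ = pd.DataFrame(collect_parts(names), columns=['A'])` would then create the dataframe in a single step, where `names` is the list of 100 file names. Note that this puts every piece in column 'A', which may differ from the column layout the original append calls produced.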