Looping through 100 text files in Python
My Python code is as follows:
#Loading libraries
import re
import pandas as pd
import numpy as np
import datetime
#Creating an empty dataframe
columns = ['A']
df_ = pd.DataFrame(columns=columns)
df_ = df_.fillna(0)
#Reading the data line by line
with open('serverLogs.log-2020-04-30-01') as f:
    lines = f.readlines()
    #print(lines)
    for line in lines:
        parts = line.split('OD_MAKER_DATE=')
        df_ = df_.append(parts)
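I have many text files whose names differ only in the trailing number, which runs from 01 to 100, i.e. 'serverLogs.log-2020-04-30-01', 'serverLogs.log-2020-04-30-02', ..., 'serverLogs.log-2020-04-30-100'.
How can I add a for loop at the start of my existing code that loops through all 100 files and appends their lines to the dataframe df_, instead of loading one file at a time? I am not very familiar with Python.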
for idx in range(1, 101):  # files are numbered 01 to 100
    fname = "serverLogs.log-2020-04-30-%02d" % idx  # zero-pad to two digits
    with open(fname) as f:
        ...
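I'm not sure this is the most efficient way to read the files in a loop. As I understand it, though, the first 9 file numbers need a leading 0. This code should generate the names you need: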
file_count = 100  # can change it to any value
base_name = 'serverLogs.log-2020-04-30-{}'
for i in range(file_count):
    file_name = base_name.format("%.2d" % (i+1))
Then, inside the loop, you can read each file's data and append it the same way you do now:
    #Reading the data line by line (this goes inside the for loop)
    with open(file_name) as f:
        lines = f.readlines()
        for line in lines:
            parts = line.split('OD_MAKER_DATE=')
            df_ = df_.append(parts)
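The zero-padding step can also be written with an f-string format spec. The following is a minimal sketch rather than part of the answer above: the helper name `log_file_names` is hypothetical, and the prefix is the one from the question. It only builds the names, so the list can be checked before any file is opened.

```python
def log_file_names(prefix="serverLogs.log-2020-04-30-", count=100):
    """Return the names prefix01 ... prefix<count> (hypothetical helper)."""
    # {i:02d} pads to at least two digits, so 1 -> "01" while 100 stays "100".
    return [f"{prefix}{i:02d}" for i in range(1, count + 1)]


names = log_file_names()
print(names[0])   # serverLogs.log-2020-04-30-01
print(names[-1])  # serverLogs.log-2020-04-30-100
```

Each entry of `names` can then be passed to `open()` in place of the hard-coded file name.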
You can use string formatting and loop over the numbers 1-100 to read all 100 files:
import re
import pandas as pd
import numpy as np
import datetime
columns = ['A']
df_ = pd.DataFrame(columns=columns)
df_ = df_.fillna(0)
for i in range(1, 101):  # files are numbered 01 to 100
    with open('serverLogs.log-2020-04-30-{}'.format("%.2d" % i)) as f:
        lines = f.readlines()
        #print(lines)
        for line in lines:
            parts = line.split('OD_MAKER_DATE=')
            df_ = df_.append(parts)
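Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so calling it in a loop only works on older pandas versions. A minimal sketch of one alternative, assuming the goal is one row in column 'A' per piece of each split line: collect the pieces in a plain list and build the dataframe once at the end. The function names `collect_parts` and `to_dataframe` are hypothetical, not from the thread.

```python
def collect_parts(paths, marker="OD_MAKER_DATE="):
    """Split every line of every file on `marker`; return all pieces in order."""
    rows = []
    for path in paths:
        with open(path) as f:
            for line in f:  # iterate over the file directly instead of readlines()
                rows.extend(line.split(marker))
    return rows


def to_dataframe(rows):
    """Build the one-column dataframe in a single step (assumed layout)."""
    import pandas as pd  # local import: only this step needs pandas

    return pd.DataFrame({"A": rows})
```

Usage would look like `df_ = to_dataframe(collect_parts(paths))`, where `paths` is the list of 100 file names. Building the frame once also avoids copying the whole dataframe on every appended line.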
#Loading libraries
import re
import pandas as pd
import numpy as np
import datetime
#Creating an empty dataframe
columns = ['A']
df_ = pd.DataFrame(columns=columns)
df_ = df_.fillna(0)
#Reading the data line by line
base_name = 'serverLogs.log-2020-04-30-{}'  # keep the template separate so the loop does not overwrite it
for i in range(100):
    file_name = base_name.format("%.2d" % (i+1))  # 01, 02, ..., 100
    with open(file_name) as f:
        lines = f.readlines()
        #print(lines)
        for line in lines:
            parts = line.split('OD_MAKER_DATE=')
            df_ = df_.append(parts)
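If the number of files is not fixed, an alternative to generating the names is to discover them on disk. This is a sketch under assumptions, not something taken from the answers: it assumes all logs sit in one directory and that each name ends in the prefix followed by digits only, and `find_logs` is a hypothetical helper. Sorting by the numeric suffix matters because a plain string sort would place `-100` before `-11`.

```python
from pathlib import Path


def find_logs(directory=".", prefix="serverLogs.log-2020-04-30-"):
    """Return matching log paths ordered by their numeric suffix."""
    candidates = Path(directory).glob(prefix + "*")
    # Keep only names whose part after the prefix is purely digits.
    numbered = [p for p in candidates if p.name[len(prefix):].isdigit()]
    return sorted(numbered, key=lambda p: int(p.name[len(prefix):]))
```

The returned paths can be passed straight to `open()`, so the reading loop stays the same as in the code above.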