Reading data from a text file and storing it in an array in Python
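I am trying to read data from a text file line by line and store it in a 2D array so that I can process it further at a later stage.
Every time the string 'EOE' is found, I want to move to a new row and continue reading entries line by line from the text file.
I can't seem to declare a 2D array of strings or read the values in successfully. I'm new to Python, coming from C, so my syntax and general understanding of Python are not great.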
rf = open('data_small.txt', 'r')
lines = rf.readlines()
rf.close()
i = 0
j = 0
line_array = np.array((200, 200))
for line in lines:
    line=line.strip()
    print(line)
    line_array[i][j] = line
    if line == 'EOE':
        i+=1
    j+=1
rf.close()
line_array
The text file looks like this:
-----
Entry1=50
Entry2=SomeText
Entry3=Instance.Test.ID=67
EOE
-----
Entry1=Processing
Entry2=50.87.78
Entry3=Instance.Test.ID=91
EOE
-----
Entry1=50
Entry2=SomeText
Entry3=Instance.Test.ID=67
EOE
-----
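I would like the array of strings to look like this. Rows and columns can be swapped, but the general idea is that one row or column represents one EOE entry: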
array = [
['-----', 'Entry1=50', 'Entry2=SomeText', 'Entry3=Instance.Test.ID=67', 'EOE'],
['-----', 'Entry1=Processing', 'Entry2=50.87.78', 'Entry3=Instance.Test.ID=91', 'EOE'],
['-----', 'Entry1=50', 'Entry2=SomeText', 'Entry3=Instance.Test.ID=67', 'EOE']
]
Here is one approach.
For example:
filename = 'data_small.txt'
res = [[]]
with open(filename) as infile:
    for line in infile:               # iterate over each line
        line = line.strip()           # strip the trailing newline
        if line == 'EOE':             # check for `EOE`
            res.append([])            # start a new sub-list
        else:
            res[-1].append(line)      # append content to the current sub-list
print(res)
Output:
[['-----', 'Entry1=50', 'Entry2=SomeText', 'Entry3=Instance.Test.ID=67'],
['-----',
'Entry1=Processing',
'Entry2=50.87.78',
'Entry3=Instance.Test.ID=91'],
['-----', 'Entry1=50', 'Entry2=SomeText', 'Entry3=Instance.Test.ID=67'],
['-----']]
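The output above leaves out the 'EOE' marker and ends with a leftover ['-----'] row, while the question asks for rows that end in 'EOE'. A possible variation that keeps the marker in each row is sketched below. The helper name group_entries and the in-memory SAMPLE text (standing in for data_small.txt) are assumptions made for illustration only.

```python
import io

# Stand-in for the contents of data_small.txt, so the sketch runs without a file.
SAMPLE = """-----
Entry1=50
Entry2=SomeText
Entry3=Instance.Test.ID=67
EOE
-----
Entry1=Processing
Entry2=50.87.78
Entry3=Instance.Test.ID=91
EOE
-----
Entry1=50
Entry2=SomeText
Entry3=Instance.Test.ID=67
EOE
-----
"""


def group_entries(lines, marker='EOE'):
    """Group lines into rows, closing a row each time `marker` is seen."""
    rows = []
    current = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        current.append(line)
        if line == marker:
            rows.append(current)  # the marker stays as the last item of the row
            current = []
    # Lines after the last marker (here the trailing '-----') are dropped.
    return rows


rows = group_entries(io.StringIO(SAMPLE))
print(rows)
```

With a real file, the same helper could be called as group_entries(infile) inside a with open(...) block, since a file object also yields its lines when iterated.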
Here is a "pythonic" approach:
>>> with open('data_small.txt') as input_file:
...     contents = input_file.read()
>>> contents
'-----\nEntry1=50\nEntry2=SomeText\nEntry3=Instance.Test.ID=67\nEOE\n-----\nEntry1=Processing\nEntry2=50.87.78\nEntry3=Instance.Test.ID=91\nEOE\n-----\nEntry1=50\nEntry2=SomeText\nEntry3=Instance.Test.ID=67\nEOE\n-----'
The first step is to split on '\nEOE\n':
>>> contents = contents.split('\nEOE\n')
>>> contents
['-----\nEntry1=50\nEntry2=SomeText\nEntry3=Instance.Test.ID=67',
'-----\nEntry1=Processing\nEntry2=50.87.78\nEntry3=Instance.Test.ID=91',
'-----\nEntry1=50\nEntry2=SomeText\nEntry3=Instance.Test.ID=67',
'-----']
Next, split each element of the list on '\n':
>>> contents = [content.split('\n') for content in contents]
>>> contents
[['-----', 'Entry1=50', 'Entry2=SomeText', 'Entry3=Instance.Test.ID=67'],
['-----',
'Entry1=Processing',
'Entry2=50.87.78',
'Entry3=Instance.Test.ID=91'],
['-----', 'Entry1=50', 'Entry2=SomeText', 'Entry3=Instance.Test.ID=67'],
['-----']]
This gives you the desired output. If you don't want the last element, do this:
>>> contents = contents[:-1]
>>> contents
[['-----', 'Entry1=50', 'Entry2=SomeText', 'Entry3=Instance.Test.ID=67'],
['-----',
'Entry1=Processing',
'Entry2=50.87.78',
'Entry3=Instance.Test.ID=91'],
['-----', 'Entry1=50', 'Entry2=SomeText', 'Entry3=Instance.Test.ID=67']]
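The three steps (split on the marker, split each piece on newlines, drop the last piece) could be folded into one small helper, roughly as follows. The function name split_entries and the shortened sample text are assumptions for illustration. Because the result is a list of lists, an element can be reached with rows[i][j], much like a 2D array in C.

```python
def split_entries(text, marker='EOE'):
    """Split text into one list of lines per entry, dropping the marker lines."""
    chunks = text.split('\n' + marker + '\n')
    rows = [chunk.split('\n') for chunk in chunks]
    # Drop the trailing piece after the last marker (the lone '-----' in the sample).
    return rows[:-1]


# Shortened sample with two complete entries and a trailing '-----'.
text = ('-----\nEntry1=50\nEntry2=SomeText\nEntry3=Instance.Test.ID=67\nEOE\n'
        '-----\nEntry1=Processing\nEntry2=50.87.78\nEntry3=Instance.Test.ID=91\nEOE\n'
        '-----')
rows = split_entries(text)
print(rows[1][2])  # second entry, third line
```

Note that this sketch always discards the final piece, so it assumes the text ends with something after the last marker line, as the sample file does.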
PS: Make sure to use the with statement only to open and read the file, and do the processing outside the with statement.
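A minimal sketch of that pattern is shown here. It writes a tiny temporary file first so that it can run on its own; the temporary file and the variable names are assumptions for illustration, not part of the original answer.

```python
import os
import tempfile

# Write a small sample file so the sketch is runnable; the path is temporary.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('-----\nEntry1=50\nEOE\n-----\n')
    path = tmp.name

# Only the read happens inside the with block...
with open(path) as input_file:
    contents = input_file.read()

# ...so the file is already closed when the processing starts.
closed_after_with = input_file.closed
rows = [chunk.split('\n') for chunk in contents.split('\nEOE\n')]

os.remove(path)  # clean up the temporary sample file
print(closed_after_with, rows[0])
```

Keeping the with block short means the file handle is released as soon as the data is in memory, even if the later processing raises an exception.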