Python: text file to rows and columns
I've been at this for a while, seem to have hit a wall, and need help.
I have several text files. Rather than write them all out, here is an example:
2020
Grum Grum
Stamina: 20
Agility: 23
Strength: 20.5%
Resistances: 20-21-30

2020
Mondo Silo
Stamina: 23
Agility: 13
Strength: 10.5%
Resistances: 20-21-20
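And so on. Some files are like this, with a new stat sheet starting every 6 lines; in other text files a new stat sheet starts every 10 lines.
My goal is that each time a stat sheet ends, it gets placed into a single row, with one column per line. I think this is called transposing in spreadsheet terms, but I know I'm doing something wrong, and I'm not even sure that is the right term.
For example, I would like the file to look like this when it's done.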
Year | Name | Stamina | Agility | Str | Res
2020 | Grum Grum | Stamina: 20 | Agility: 23 | Strength: 20.5% | Resistances: 20-21-30
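I've tried NumPy and pandas. I know I'm doing something wrong, and honestly I don't know what to search for to find the right answer.
I'd appreciate any help. The files are very large, and I'd like to be able to specify the number of columns each stat sheet should fill.
Thanks in advance if you can help.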
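Try this.
You can read the txt file as a CSV:
import pandas as pd
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
file = pd.read_csv('filename.txt', sep=" ", header=None, on_bad_lines='skip')
or
file = pd.read_fwf('filename.txt')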
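You can read the file line by line, appending each line to an output row, and write that row out whenever you hit a blank line. Then write one last time at the end, in case the file does not end with a blank line.
I wrote a small program that takes your input as test.txt and writes this to test_out.txt: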
2020 | Grum Grum | Stamina: 20 | Agility: 23 | Strength: 20.5% | Resistances: 20-21-30
2020 | Mondo Silo | Stamina: 23 | Agility: 13 | Strength: 10.5% | Resistances: 20-21-20
The code is as follows:
with open("test.txt", "r") as infile:
    with open("test_out.txt", "w") as outfile:
        columns = ""
        for line in infile:
            line = line.rstrip("\n")  # remove newline from end of line
            if line == "":
                if len(columns) > 0:  # blank line and we have columns to write: emit the row
                    outfile.write(columns + "\n")
                    columns = ""  # reset row
            else:
                if len(columns) > 0:  # put a separator before every column except the first
                    columns += " | "
                columns += line
        if len(columns) > 0:  # write final row
            outfile.write(columns + "\n")
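The question also mentions files where a stat sheet takes 10 lines instead of 6, and asks for a way to specify the number of columns. One possible variation on the streaming idea above is to group by a fixed record size rather than by blank lines. This is only a sketch: the names `rows_from_lines` and `convert` and the parameter `n` are made up for illustration, and it assumes every record has exactly `n` non-blank lines.

```python
import io
from itertools import islice


def rows_from_lines(lines, n):
    """Yield lists of n consecutive non-blank lines (one list per record)."""
    stripped = (line.strip() for line in lines)
    non_blank = (line for line in stripped if line)
    while True:
        row = list(islice(non_blank, n))
        if not row:
            return
        if len(row) != n:
            raise ValueError(f"incomplete record: {row}")
        yield row


def convert(infile, outfile, n, header=None):
    """Write one ' | '-separated output line per n-line record."""
    if header:
        outfile.write(" | ".join(header) + "\n")
    for row in rows_from_lines(infile, n):
        outfile.write(" | ".join(row) + "\n")


# Small in-memory demonstration; real use would pass open file objects.
sample = io.StringIO(
    "2020\nGrum Grum\nStamina: 20\nAgility: 23\nStrength: 20.5%\n"
    "Resistances: 20-21-30\n\n"
    "2020\nMondo Silo\nStamina: 23\nAgility: 13\nStrength: 10.5%\n"
    "Resistances: 20-21-20\n"
)
out = io.StringIO()
convert(sample, out, n=6, header=["Year", "Name", "Stamina", "Agility", "Str", "Res"])
result = out.getvalue()
print(result)
```

Because the input is consumed one line at a time, only one record is held in memory at once, which may matter for the very large files mentioned in the question.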
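You can try this to get the desired dataframe:
import pandas as pd

with open(r'test1.txt', 'r') as file:
    data = file.read().strip().split('\n\n')  # one chunk per record; strip() avoids a ragged last row
data = [i.split('\n') for i in data]
df = pd.DataFrame(data, columns=['Year', 'Name', 'Stamina', 'Agility', 'Str', 'Res'])
print(df)
Output: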
Year Name ... Str Res
0 2020 Grum Grum ... Strength: 20.5% Resistances: 20-21-30
1 2020 Mondo Silo ... Strength: 10.5% Resistances: 20-21-20
2 2020 Grum Grum ... Strength: 20.5% Resistances: 20-21-30
3 2020 Mondo Silo ... Strength: 10.5% Resistances: 20-21-20
To build one dataframe from a list of .txt files that have different numbers of rows but the same structure, and write it out, you can try:
Option 1
import pandas as pd

files = ['test1.txt', 'test2.txt']  # list of files
columns = ['Year', 'Name', 'Stamina', 'Agility', 'Str', 'Res']
frames = []
for file in files:  # open each file
    with open(r'path_of_files' + file, 'r') as file_r:
        data = file_r.read().strip().split('\n\n')
    data = [i.split('\n') for i in data if i != '']  # get the rows
    frames.append(pd.DataFrame(data, columns=columns))
df = pd.concat(frames, ignore_index=True)  # combine the rows from all files
print(df)
df.to_csv(r'test3.txt', sep='|', index=False)  # write the final dataframe to 'test3.txt' with '|' as separator
Option 2
import pandas as pd

files = ['test1.txt', 'test2.txt']  # list of files
for n, file in enumerate(files):  # open each file
    with open(r'path_of_files' + file, 'r') as file_r, open(r'test3.txt', 'a') as fout:
        data = file_r.read().strip().split('\n\n')
        data = [i.split('\n') for i in data if i != '']
        # create a dataframe with the data of the current file
        df = pd.DataFrame(data, columns=['Year', 'Name', 'Stamina', 'Agility', 'Str', 'Res'])
        if n == 0:
            fout.write(df.to_string(index=False))  # first file: write the column names and the data
        else:
            fout.write(df.to_string(header=False, index=False))  # later files: data only, no index or column names
        fout.write('\n')  # newline so the next rows start on their own line
Example
Using the dummy files below (test1.txt, test2.txt), you can see the results of both options (test3.txt):
test1.txt
2020
Grum Grum
Stamina: 20
Agility: 23
Strength: 20.5%
Resistances: 20-21-30

2020
Mondo Silo
Stamina: 23
Agility: 13
Strength: 10.5%
Resistances: 20-21-20
test2.txt
2020
Grum Grum
Stamina: 20
Agility: 23
Strength: 20.5%
Resistances: 20-21-30

2020
Mondo Silo
Stamina: 23
Agility: 13
Strength: 10.5%
Resistances: 20-21-20

2020
Mondo Silo
Stamina: 23
Agility: 13
Strength: 10.5%
Resistances: 20-21-20

2020
Mondo Silo
Stamina: 23
Agility: 13
Strength: 10.5%
Resistances: 20-21-20
test3.txt (output file) with Option 1
Year|Name|Stamina|Agility|Str|Res
2020|Grum Grum|Stamina: 20|Agility: 23|Strength: 20.5%|Resistances: 20-21-30
2020|Mondo Silo|Stamina: 23|Agility: 13|Strength: 10.5%|Resistances: 20-21-20
2020|Grum Grum|Stamina: 20|Agility: 23|Strength: 20.5%|Resistances: 20-21-30
2020|Mondo Silo|Stamina: 23|Agility: 13|Strength: 10.5%|Resistances: 20-21-20
2020|Mondo Silo|Stamina: 23|Agility: 13|Strength: 10.5%|Resistances: 20-21-20
2020|Mondo Silo|Stamina: 23|Agility: 13|Strength: 10.5%|Resistances: 20-21-20
test3.txt (output file) with Option 2
Year Name Stamina Agility Str Res
2020 Grum Grum Stamina: 20 Agility: 23 Strength: 20.5% Resistances: 20-21-30
2020 Mondo Silo Stamina: 23 Agility: 13 Strength: 10.5% Resistances: 20-21-20
2020 Grum Grum Stamina: 20 Agility: 23 Strength: 20.5% Resistances: 20-21-30
2020 Mondo Silo Stamina: 23 Agility: 13 Strength: 10.5% Resistances: 20-21-20
2020 Mondo Silo Stamina: 23 Agility: 13 Strength: 10.5% Resistances: 20-21-20
2020 Mondo Silo Stamina: 23 Agility: 13 Strength: 10.5% Resistances: 20-21-20
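import pandas as pd

with open('filepath', 'r') as f:  # close the file when done
    t = f.read()
# one inner list per blank-line-separated record, skipping empty chunks and empty lines
data = [[a for a in x.split('\n') if a] for x in t.split('\n\n') if x.strip()]
datadf = pd.DataFrame(data)
print(datadf)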
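- This option fixes the data format before loading the data into a dataframe.
- It presents the data in a standard tabular format as an alternative, since other good answers already convert the data into the requested format.
- Headers are at the top of each column, and the data is in the rows below the header.
- From an information storage and retrieval standpoint, this is the standard way to present and store data.
- Storing data in a standard way makes it easier to retrieve and to visualize with other tools.
- [0::6]: a list slice that takes every 6th value in the list, starting at index 0
- [1::6]: a list slice that takes every 6th value in the list, starting at index 1
- Use collections.defaultdict to collect the list elements into a dictionary.
- Save the dataframe to csv with sep=',' or sep='|'
- Read the file back with df = pd.read_csv('characters.csv', sep='|')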
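To make the slicing steps concrete, here is a small sketch on an in-memory list. The values come from the question; the variable names are only for illustration. Extended slices can be assigned to as well as read, which is how a prefix can be added to every 6th item in place.

```python
lines = [
    "2020", "Grum Grum", "Stamina: 20", "Agility: 23",
    "Strength: 20.5%", "Resistances: 20-21-30",
    "2020", "Mondo Silo", "Stamina: 23", "Agility: 13",
    "Strength: 10.5%", "Resistances: 20-21-20",
]

years = lines[0::6]  # every 6th item starting at index 0
names = lines[1::6]  # every 6th item starting at index 1

# Slice assignment overwrites exactly those positions;
# the replacement list must have the same length as the slice.
lines[0::6] = [f"Year: {y}" for y in years]
lines[1::6] = [f"Name: {n}" for n in names]

print(years)
print(names)
print(lines[:2])
```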
import pandas as pd
from collections import defaultdict as dd

# read the file
with open('test.txt', 'r') as f:
    # read the text in; results in a list of strings
    text_list = [r.strip() for r in f.readlines() if r.strip()]  # remove all newlines and empty rows

# add Year: in front of each year number
years = text_list[0::6]  # create a list of each year
text_list[0::6] = [f'Year: {y}' for y in years]

# add Name: in front of each name
names = text_list[1::6]  # create a list of each name
text_list[1::6] = [f'Name: {n}' for n in names]

# split each string at the first ': '
text_list = [x.split(': ', 1) for x in text_list]

# create a dict with one list of values per label
data = dd(list)
for text in text_list:
    data[text[0]].append(text[1])

# load data into a dataframe
df = pd.DataFrame(data)

# display df
#    Year        Name Stamina Agility Strength Resistances
# 0  2020   Grum Grum      20      23    20.5%    20-21-30
# 1  2020  Mondo Silo      23      13    10.5%    20-21-20

# save
df.to_csv('characters.csv', sep='|', index=False)

# file output
# Year|Name|Stamina|Agility|Strength|Resistances
# 2020|Grum Grum|20|23|20.5%|20-21-30
# 2020|Mondo Silo|23|13|10.5%|20-21-20
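If pandas is not available wherever the file is later consumed, the pipe-separated output can also be read with the standard library's csv module. A minimal sketch, using an in-memory string as a stand-in for the contents of characters.csv:

```python
import csv
import io

# Stand-in for the contents of characters.csv.
text = (
    "Year|Name|Stamina|Agility|Strength|Resistances\n"
    "2020|Grum Grum|20|23|20.5%|20-21-30\n"
    "2020|Mondo Silo|23|13|10.5%|20-21-20\n"
)

# DictReader takes the first line as the field names.
reader = csv.DictReader(io.StringIO(text), delimiter="|")
records = list(reader)

for rec in records:
    print(rec["Name"], rec["Stamina"], rec["Resistances"])
```

All values come back as strings, so numeric columns would still need converting before any arithmetic.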
If you keep the text file in the same format, with a blank line between each group, this should work for you:
import xlsxwriter

items = []
# parse through .txt file
with open('file.txt', 'r') as r:
    text = list(r.read().splitlines())
    while text.count('') != 0:  # drop the blank separator lines
        text.remove('')
# group the remaining lines into records of 6
x = 0
while True:
    items.append([])
    for num in range(0, 6):
        items[x].append(text[0])
        text.remove(text[0])
    x += 1
    if len(text) == 0:
        break
print(items)
# Starting worksheet
workbook = xlsxwriter.Workbook('example.xlsx')
worksheet = workbook.add_worksheet()
row = 0
# Writing column titles
titles = ['Year', 'Name', 'Stamina', 'Agility', 'Str', 'Res']
for i in range(0, 6):
    worksheet.write(row, i, titles[i])
# fills in data from parsed .txt file
x, row = 0, 1
while True:
    for i in range(0, 6):
        cur = items[x][0]
        worksheet.write(row, i, cur)
        items[x].remove(cur)
    row += 1
    x += 1
    if len(items) == x:
        break
# Closes workbook
workbook.close()
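The parsing loop above assumes the number of non-blank lines is an exact multiple of 6; otherwise `text[0]` raises an IndexError part-way through a record. Below is a sketch of the same grouping step with an explicit check. The helper name `chunk` and its `size` parameter are hypothetical and not part of xlsxwriter; the sample list stands in for the cleaned lines of file.txt.

```python
def chunk(values, size):
    """Split a flat list into consecutive sublists of `size` items each."""
    if len(values) % size != 0:
        raise ValueError(
            f"expected a multiple of {size} lines, got {len(values)}"
        )
    return [values[i:i + size] for i in range(0, len(values), size)]


text = [
    "2020", "Grum Grum", "Stamina: 20", "Agility: 23",
    "Strength: 20.5%", "Resistances: 20-21-30",
    "2020", "Mondo Silo", "Stamina: 23", "Agility: 13",
    "Strength: 10.5%", "Resistances: 20-21-20",
]
items = chunk(text, 6)

# Each (row, col, value) triple is what worksheet.write(row, col, value)
# would receive; row 0 is left free for the column titles.
cells = [
    (row, col, value)
    for row, record in enumerate(items, start=1)
    for col, value in enumerate(record)
]
print(cells[:3])
```

Passing a different `size` (for example 10) would cover the files where a stat sheet has more lines, provided the titles list is extended to match.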