Python: encode a text file, open it, replace multiple sections and output the text without empty lines in .csv style
I have a file "test.xls" that is basically an old xls (XML format) and looks like this in Notepad:
<table cellspacing="1" rules="all" border="1">
<tr>
<td>Row A</td><td>Row B</td><td>Row C</td>
</tr>
<tr>
<td>New York</td><td>23</td><td>warm</td>
</tr>
<tr>
<td>San Francisco</td><td>40</td><td>hot</td>
</tr>
</table>
Now I am using Python to convert it into a .txt (flat file) that I can later import into my MSSQL database.
What I have so far:
import codecs
import os
# read the file with a specific encoding
with codecs.open('test.xls', 'r', encoding = 'ansi') as file_in, codecs.open('test_out.txt', 'w') as file_out:
    lines = file_in.read()
    lines = lines.replace('<tr>', '')
    # save the manipulated data into a new file with new encoding
    file_out.write(lines)
This approach produces a .txt file that looks like this:
Row A;Row B;Row C
New York;23;warm
San Francisco;40;hot
I have tried several ways to get rid of the empty lines; the latest was:
for lines in file_in:
    if line != '\n':
        file_out.write(lines)
But the file either looks the same or is completely empty.
Removing the empty lines:
list.txt:
Row A;Row B;Row C
New York;23;warm
San Francisco;40;hot
Then:
logFile = "list.txt"
with open(logFile) as f:
    content = f.readlines()
# to remove empty lines
content = [l.strip() for l in content if l.strip()]
for line in content:
    print(line)
Output:
Row A;Row B;Row C
New York;23;warm
San Francisco;40;hot
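The same filter also works on text that is already in memory, for example the string returned by `file_in.read()` in the question. A minimal sketch, assuming the whole file fits in memory; `drop_empty_lines` is a hypothetical helper name, not something from the question:

```python
def drop_empty_lines(text):
    """Return text with blank or whitespace-only lines removed."""
    # splitlines() handles \n and \r\n alike; strip() also trims
    # leading/trailing whitespace from the lines that are kept.
    kept = [line.strip() for line in text.splitlines() if line.strip()]
    return "\n".join(kept)


# Sample data modelled on the question's output (assumed to contain blank lines).
sample = "Row A;Row B;Row C\n\nNew York;23;warm\n\n\nSan Francisco;40;hot\n"
print(drop_empty_lines(sample))
```

Once `read()` has consumed the file, iterating over `file_in` again yields nothing because the file position is already at the end. That is one likely reason an attempt of that shape produces an empty file, and filtering the string itself avoids it.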
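EDIT:
Perhaps read from the file and then overwrite it, using a list that stores the results so they can be written back to the file afterwards.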
logFile = "list.txt" # your file name
results = [] # an empty list to store the lines
with open(logFile) as f: # open the file
    content = f.readlines() # read the lines
# you may also want to remove empty lines
content = [l.strip() for l in content if l.strip()] # removing the empty lines
for line in content:
    results.append(line) # appending each line to the list
print(results) # printing the list
with open(logFile, "w") as f: # open the file in write mode
    for elem in results: # for each line stored in the results list
        f.write(str(elem) + '\n') # write the line to the file
print("Thank you, your data was overwritten") # Tadaa-h!
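As an alternative to chained `replace()` calls, the table could be parsed with the standard library's `html.parser`, which sidesteps the empty lines altogether because only cell contents are collected. This is a sketch under assumptions, not the approach used above: `TableToRows` and `table_to_csv_text` are hypothetical names, the input is assumed to contain only simple `<tr>`/`<td>` markup like the sample, and cell values are assumed not to contain the separator.

```python
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a <td> (such as the newlines between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without any cells
                self.rows.append(self._row)
            self._row = None


def table_to_csv_text(markup, sep=";"):
    """Return one sep-joined line per table row, with no empty lines."""
    parser = TableToRows()
    parser.feed(markup)
    parser.close()
    return "\n".join(sep.join(row) for row in parser.rows)


sample = """<table cellspacing="1" rules="all" border="1">
<tr>
<td>Row A</td><td>Row B</td><td>Row C</td>
</tr>
<tr>
<td>New York</td><td>23</td><td>warm</td>
</tr>
</table>"""
print(table_to_csv_text(sample))
```

For a real file, the markup would be read first with the appropriate encoding and the result written to the output file. If cell values may contain the separator or quotes, the `csv` module's writer is the safer way to produce the lines.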