Removing page numbers from a .txt file in Python
I'm trying to load the .txt file of an e-book and remove the lines that contain page numbers. The book looks like:
2
Words
More words.
More words.
3
More words.
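Here is what I have so far:
x = 1
with open("first.txt","r") as input:
    with open("last.txt","wb") as output:
        for line in input:
            if line != str(x) + "\n":
                output.write(line + "\n")
                x + x + 1
My output file has all of the whitespace (newlines) removed, which I don't want, and it doesn't even remove the numbers. Does anyone have any ideas? Thanks!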
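1) You don't have to open the output file in binary mode:
open("last.txt","wb") -> open("last.txt","w")
2) x + x + 1 computes a value and throws it away, so x never changes; increment it instead:
x + x + 1 -> x += 1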
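For reference, here is a minimal sketch of the counter-based approach with both fixes applied. It goes a little beyond the two points above: it assumes the page numbers are consecutive and start at 2, as in the sample, and it writes each line back unchanged because lines read from a file already end with a newline. The name strip_sequential_pages and the first_page parameter are made up for illustration.

```python
import io

def strip_sequential_pages(lines, first_page=2):
    """Drop lines equal to the next expected page number (hypothetical helper)."""
    expected = first_page  # assumption: numbering starts at 2, as in the sample
    kept = []
    for line in lines:
        if line == str(expected) + "\n":
            expected += 1  # the corrected form of `x + x + 1`
        else:
            kept.append(line)  # the line already carries its own "\n"
    return kept

# In-memory stand-in for first.txt, using the sample from the question
sample = io.StringIO("2\nWords\nMore words.\nMore words.\n3\nMore words.\n")
result = strip_sequential_pages(sample)
print("".join(result), end="")
```

This only works if no page number is missing or out of order, which is one reason the approaches that test each line on its own are more robust.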
However, you can do it more simply:
with open("first.txt","r") as input:
    with open("last.txt","w") as output:
        for line in input:
            line = line.strip()  # clear white space
            try:
                int(line)  # is this a number?
            except ValueError:
                output.write(line + "\n")
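Check whether the line can be converted to an integer, and skip it if the conversion succeeds. Not the fastest solution, but it should work.
for line in input:  # input and output are the open files from the question
    try:
        int(line)
        # skip storing that line
        continue
    except ValueError:
        # save the line to output
        output.write(line)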
Use a regular expression to ignore lines that contain only digits.
import sys
import re
pattern = re.compile(r"^\d+$")
for line in sys.stdin:
    if not pattern.match(line):
        sys.stdout.write(line)
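The same filter can be sketched as a function over any iterable of lines, so it is not tied to stdin. The helper name drop_digit_lines and the sample list are made up for illustration; the pattern is the one from the snippet, written as a raw string.

```python
import re

PAGE_NUMBER = re.compile(r"^\d+$")  # digits only, from start to end of line

def drop_digit_lines(lines):
    """Yield only the lines that are not made up entirely of digits."""
    for line in lines:
        # `$` also matches just before a trailing newline, so "3\n" is dropped
        if not PAGE_NUMBER.match(line):
            yield line

book = ["2\n", "Words\n", "More words.\n", "3\n", "Chapter 3\n", "\n"]
cleaned = list(drop_digit_lines(book))
print(cleaned)
```

With the files from the question, something like output_file.writelines(drop_digit_lines(input_file)) inside a with block should do the job, since a file object is itself an iterable of lines.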
An improved solution: one less level of indentation, no unnecessary strip or string concatenation, and an explicit exception is caught.
with open("first.txt","r") as input_file, open("last.txt","w") as output_file:
    for line in input_file:
        try:
            int(line)
        except ValueError:
            output_file.write(line)
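One caveat worth checking: int() accepts more than bare digits, for example a leading sign or surrounding whitespace, so a line such as "-5" would be dropped as well. If that matters for a particular book, a stricter test is one option. The sketch below uses str.isdigit() after stripping the line ending; the helper name is_page_number and the sample list are made up for illustration and are not part of the answer.

```python
def is_page_number(line):
    """Return True when the line holds nothing but digits (hypothetical helper)."""
    # strip() removes the trailing newline and any surrounding spaces
    return line.strip().isdigit()

lines = ["2\n", "Words\n", "-5\n", "+3\n", "\n", "10\n"]
kept = [line for line in lines if not is_page_number(line)]
print(kept)
```

Here "-5" and "+3" are kept because they are not pure digit strings, whereas the int() check would have treated them as numbers and skipped them.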