Removing page numbers from a .txt file in Python

Question

I'm trying to load an e-book's .txt file and remove the lines that contain page numbers. The book looks like this:

2
Words
More words.

More words.

3
More words.

Here's what I have so far:

x = 1

with open("first.txt","r") as input:
    with open("last.txt","wb") as output: 
        for line in input:
            if line != str(x) + "\n":
                output.write(line + "\n")
                x + x + 1

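All the whitespace (newlines) is removed from my output file, which I don't want, and the numbers aren't even removed. Does anyone have any ideas? Thanks!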

Answer 1

1) You don't need to open the file in binary mode: open("last.txt","wb") -> open("last.txt","w")
2) x + x + 1 -> x += 1
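If you want to keep the question's counter idea (remove only the *expected* next page number, so a number that appears out of sequence in the text survives), three more things have to change beyond the two fixes above: the counter should advance when a page number is found rather than on every kept line, it has to start at the book's first printed page number (the sample begins at 2, so a counter starting at 1 never matches anything), and each line should be written as-is because it already ends in "\n". A minimal sketch under those assumptions; the function name and the `first_page` parameter are hypothetical:

```python
def remove_sequential_page_numbers(lines, first_page=1):
    """Drop lines equal to the next expected page number (hypothetical helper)."""
    expected = first_page
    kept = []
    for line in lines:
        if line.strip() == str(expected):
            expected += 1  # advance only after a page number has been found
            continue
        kept.append(line)  # keep the line unchanged; it already ends in "\n"
    return kept
```

To use it on the real files: `with open("first.txt") as src, open("last.txt", "w") as dst: dst.writelines(remove_sequential_page_numbers(src, first_page=2))`.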

However, you can do it more simply:

with open("first.txt", "r") as input_file:
    with open("last.txt", "w") as output_file:
        for line in input_file:
            line = line.strip()  # remove surrounding whitespace, including the newline
            try:
                int(line)  # is this a number?
            except ValueError:
                output_file.write(line + "\n")

Answer 2

Check whether the line can be converted to an integer, and skip the line if it can. It's not the fastest solution, but it should work.

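with open("first.txt") as input_file, open("last.txt", "w") as output_file:
    for line in input_file:
        try:
            int(line)
            # the line is just a number (a page number): skip storing it
            continue
        except ValueError:
            # not a number: save the line to the output
            output_file.write(line)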
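Answers 2 and 4 both let `int()` decide what counts as a page number. `int()` ignores surrounding whitespace, which is why the trailing "\n" needs no strip, but it also accepts a sign and (since Python 3.6) underscores between digits, so a text line such as "-3" or "1_000" would be dropped as well. The sketch below contrasts that test with a stricter alternative that accepts only plain ASCII digits; both function names are illustrative, not taken from the answers, and the strict version assumes Python 3.7+ for `str.isascii()`.

```python
def is_page_number_int(line):
    """The int() test used in Answers 2 and 4 (illustrative name)."""
    try:
        int(line)  # int() strips surrounding whitespace, including "\n"
    except ValueError:
        return False
    return True


def is_page_number_strict(line):
    """Hypothetical stricter test: the stripped line is only ASCII digits."""
    s = line.strip()
    return s.isascii() and s.isdigit()  # "".isdigit() is False, so blank lines fail
```

Which one fits depends on the book: the `int()` test is fine for typical prose, while the strict test avoids dropping lines that merely look like signed or underscored numbers.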

Answer 3

Use a regular expression to ignore lines that contain only digits.

import sys
import re

pattern = re.compile(r"^\d+$")  # raw string, so "\d" reaches the regex engine unchanged

for line in sys.stdin:
    if not pattern.match(line):
        sys.stdout.write(line)
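Answer 3 reads standard input and writes standard output, so it is meant to be run as a filter, e.g. `python remove_pages.py < first.txt > last.txt` (the script name is hypothetical). The pattern works on raw lines because `$` also matches just before a trailing "\n"; a line with leading spaces such as " 3" is not matched, though. A minimal, testable sketch of the same filter as a function over a list of lines (`drop_numeric_lines` is an illustrative name):

```python
import re

# Same pattern as Answer 3, written as a raw string
PAGE_NUMBER = re.compile(r"^\d+$")


def drop_numeric_lines(lines):
    # "$" matches just before a trailing "\n", so "3\n" counts as digits-only
    return [line for line in lines if not PAGE_NUMBER.match(line)]
```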

Answer 4

Improved solution: one fewer level of indentation, no unnecessary strip or string concatenation, and it catches an explicit exception.

with open("first.txt","r") as input_file, open("last.txt","w") as output_file:
    for line in input_file:
        try: 
            int(line)
        except ValueError:
            output_file.write(line)
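To check Answer 4's loop against the sample from the question without touching the disk, the same logic can run over `io.StringIO` objects. This is a sketch; `remove_page_numbers` is an illustrative name, and it takes file-like objects rather than file names.

```python
import io


def remove_page_numbers(src, dst):
    """Answer 4's loop, taking any file-like objects (illustrative name)."""
    for line in src:
        try:
            int(line)  # succeeds for lines like "3\n": a page number, so skip it
        except ValueError:
            dst.write(line)  # keep the line exactly as read, newline included


# The sample text from the question, as it would appear in first.txt
sample = "2\nWords\nMore words.\n\nMore words.\n\n3\nMore words.\n"
result = io.StringIO()
remove_page_numbers(io.StringIO(sample), result)
print(result.getvalue(), end="")
```

The blank lines survive (`int("\n")` raises `ValueError`) and only "2" and "3" disappear. For the real files, pass open file objects: `with open("first.txt") as src, open("last.txt", "w") as dst: remove_page_numbers(src, dst)`.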

Tags: python, string, file-io, file, text-files