Removing page numbers from a .txt file in Python

I'm trying to load an e-book's .txt file and remove the lines that contain page numbers. The book looks like:

2
Words
More words.

More words.

3
More words.

Here's what I have so far:

x = 1

with open("first.txt","r") as input:
    with open("last.txt","wb") as output: 
        for line in input:
            if line != str(x) + "\n":
                output.write(line + "\n")
                x + x + 1

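My output file has all the white space (new lines) removed, which I don't want, and it doesn't even remove the numbers. Does anyone have any ideas? Thanks!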

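1) You don't need to open the output file in binary mode: open("last.txt","wb") -> open("last.txt","w"). In Python 3, writing a str to a file opened with "wb" raises a TypeError.
2) x + x + 1 -> x += 1. The original expression computes a value and throws it away, so x never changes.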
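For reference, here is a sketch of the counter-based approach from the question with those fixes applied, plus one more: the counter should only advance after a page number has actually been removed, not after every kept line. The function name remove_sequential_page_numbers and its start parameter are hypothetical. It assumes page numbers appear in strict sequence, so a missing or out-of-order page number stops all further matching. Because the sample book starts at page 2, a counter starting at 1 would never match anything.

```python
def remove_sequential_page_numbers(lines, start=1):
    """Drop lines that consist only of the next expected page number.

    Hypothetical helper: assumes page numbers increase by exactly 1.
    """
    expected = start
    kept = []
    for line in lines:
        if line.strip() == str(expected):
            # This is the page number we were waiting for: skip it
            # and start looking for the following page.
            expected += 1
        else:
            # Ordinary text or a blank line: keep it unchanged,
            # including its trailing newline (so no extra "\n" is added).
            kept.append(line)
    return kept


book = ["2\n", "Words\n", "More words.\n", "\n", "More words.\n", "\n", "3\n", "More words.\n"]
print("".join(remove_sequential_page_numbers(book, start=2)), end="")
```

With real files, the open input file can be passed directly as lines, and the result written with output.writelines(...).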

However, you can do this more simply:

with open("first.txt","r") as input:
    with open("last.txt","w") as output:
        for line in input:
            line = line.strip() # remove surrounding white space, including the newline
            try:
                int(line) # is this a number?
            except ValueError:
                # not a number: write the line back with its newline
                output.write(line + "\n")

Check whether the line can be converted to an integer, and skip the line if it can. Not the fastest solution, but it should work.

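with open("first.txt", "r") as input_file, open("last.txt", "w") as output_file:
    for line in input_file:
        try:
            int(line)
            # the line is a bare number (a page number): skip storing it
            continue
        except ValueError:
            # not a number: save the line to output
            output_file.write(line)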
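The same try/except check can be wrapped in a small generator that works on any iterable of lines, which makes it easy to try on the sample text without touching files. The name drop_numeric_lines is hypothetical. Blank lines survive because int("\n") raises ValueError just as int("Words\n") does, so they are passed through unchanged.

```python
def drop_numeric_lines(lines):
    """Yield only the lines that int() cannot parse (hypothetical helper)."""
    for line in lines:
        try:
            # int() ignores surrounding whitespace, so "3\n" parses as 3.
            int(line)
        except ValueError:
            # Text and blank lines raise ValueError: keep them as they are.
            yield line
        # If int() succeeded, the line is treated as a page number and skipped.


sample = "2\nWords\nMore words.\n\nMore words.\n\n3\nMore words.\n"
print("".join(drop_numeric_lines(sample.splitlines(keepends=True))), end="")
```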

Use a regular expression to ignore lines that contain only digits.

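import sys
import re

# a line consisting only of digits; $ also matches just before the trailing "\n"
pattern = re.compile(r"^\d+$")

# usage: python script.py < first.txt > last.txt
for line in sys.stdin:
    if not pattern.match(line):
        sys.stdout.write(line)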
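To check the pattern without piping a file through stdin, the same idea can be applied to a list of lines. This sketch uses re.fullmatch with optional surrounding whitespace, so a page number indented by spaces or followed by "\r\n" is also caught; that whitespace tolerance is an assumption about the input, not part of the answer above. The names PAGE_NUMBER and strip_page_lines are hypothetical.

```python
import re

# Hypothetical pattern: optional whitespace, one or more digits, optional whitespace.
PAGE_NUMBER = re.compile(r"\s*\d+\s*")


def strip_page_lines(lines):
    """Return the lines that are not bare page numbers."""
    # fullmatch() requires the whole line, newline included, to fit the pattern.
    return [line for line in lines if not PAGE_NUMBER.fullmatch(line)]


lines = ["2\r\n", "Words\r\n", "  3  \n", "Page 4 of the story\n", "\n"]
print(strip_page_lines(lines))
```

A line that merely contains a number, such as "Page 4 of the story", is kept because the pattern has to match the entire line.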

An improved solution: one less level of indentation, no unnecessary strip() or string concatenation, and an explicit exception is caught.

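with open("first.txt","r") as input_file, open("last.txt","w") as output_file:
    for line in input_file:
        try:
            int(line)  # int() ignores surrounding whitespace, so "3\n" parses
        except ValueError:
            # not a number: copy the line unchanged, newline included
            output_file.write(line)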
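Dropping strip() works because int() itself ignores leading and trailing whitespace, including the newline. The flip side is that int() also accepts a sign and underscores between digits (Python 3.6+), so a line such as "-3" or "1_000" would be removed as well. If that matters, a stricter test on the stripped line, such as str.isdecimal(), is one option. The helper names below are hypothetical.

```python
def looks_like_int(line):
    """True if int() would accept the line (the check used in the answers above)."""
    try:
        int(line)
    except ValueError:
        return False
    return True


def looks_like_page_number(line):
    """Stricter hypothetical check: only digits once surrounding whitespace is removed."""
    return line.strip().isdecimal()


for text in ["12\n", "  7 \n", "-3\n", "1_000\n", "\n", "Chapter 1\n"]:
    print(repr(text), looks_like_int(text), looks_like_page_number(text))
```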