How to delete/remove a certain set of lines which matches the text from a huge txt file

Question

I have a huge .txt file which looks like this

The following block of lines repeats after every 100 lines:

ITEM: TIMESTEP
1000100
ITEM: NUMBER OF ATOMS
100
ITEM: BOX BOUNDS pp pp pp
-5.63124 5.63124
-5.63124 5.63124
-5.63124 5.63124
ITEM: ATOMS id mol type xu yu zu vx vy vz

The above block of text appears about 10000 times. How can I get rid of these lines specifically?

Answer 1

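You can check for the starting line ITEM: TIMESTEP\n and then skip the 8 lines that follow it, since the header block is 9 lines long in total.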

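with open('samp.txt') as f:
    line = f.readline()
    while line != '':
        # rstrip() so that trailing whitespace on the marker line cannot break the match
        if line.rstrip() != 'ITEM: TIMESTEP':
            print(line.strip())
        else:
            # skip the remaining 8 lines of the header block
            for _ in range(8):
                f.readline()

        line = f.readline()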

Output

17 1 1 -2.20243 -5.29512 -4.4049 -1.7509 -0.678094 -2.92041
21 1 1 -0.574106 -4.73233 -5.02726 0.630247 -1.43315 0.144725
50 1 1 -6.78421 -4.33292 -5.62459 2.38831 0.400303 -2.2132
27 1 1 -2.43637 -3.6223 -5.19709 1.75747 0.293975 0.56135
26 1 1 -2.28676 -3.00667 -4.51059 1.85878 -2.28114 2.43501
...
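
If you want the cleaned data in a new file rather than printed to the console, the same idea can be sketched as below. This is only a minimal sketch: the function name `strip_headers`, the constants, and the output file name are my own placeholders, and it assumes every header block is exactly 9 lines long and starts with ITEM: TIMESTEP, as in the sample from the question. It reads the file line by line, so the whole file is never loaded into memory.

```python
HEADER_START = "ITEM: TIMESTEP"  # first line of each header block (from the question)
HEADER_LINES = 9                 # the marker line plus the 8 lines after it (assumed fixed)


def strip_headers(src_path, dst_path):
    """Copy src_path to dst_path, dropping every 9-line header block.

    Returns the number of header blocks that were removed.
    """
    removed = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.rstrip() == HEADER_START:
                # Consume the remaining 8 header lines. next() with a default
                # avoids StopIteration if the file ends in the middle of a header.
                for _ in range(HEADER_LINES - 1):
                    next(src, None)
                removed += 1
            else:
                dst.write(line)
    return removed
```

For example, `strip_headers('samp.txt', 'samp_clean.txt')` would write the atom lines to `samp_clean.txt` (a made-up name) and return how many header blocks were dropped, which you can compare against the roughly 10000 you expect.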

python

large-files