How to process a large text file to get part of this file more efficiently?

Question

I have a text file like this:

  A B C D E F ... X Y Z
a 1 0 1 2 1 0 ... 1 0 2
b 1 2 0 1 1 2 ... 1 0 0
c . . . . . . ..... . .
d . . . . . . ..... . .
e . . . . . . ..... . .
f . . . . . . ..... . .
. . . . . . . ..... . .
. . . . . . . ..... . .
. . . . . . . ..... . .
x 1 0 1 2 1 0 ... 1 0 2
y 0 0 1 0 1 1 ... 1 0 2
z 1 2 0 1 1 2 ... 1 0 0

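What I need to do is load this file and write the first 1000 rows of columns E and F to a new text file. I tried using itertools to load this large file, but could not get just the E and F columns out.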

#!/usr/bin/env python
# -*- coding:UTF-8 -*-
from itertools import islice
with open('a.txt', 'w') as fout, open('b.txt', 'r') as fin:
    n = 50  # number of 20-line chunks to copy (50 * 20 = 1000 lines)
    while n > 0:
        next_n_lines = list(islice(fin, 20))
        if not next_n_lines:
            break
        # writes whole lines, so every column is copied, not just E and F
        fout.write(''.join(next_n_lines))
        n -= 1

Answer 1

You can use this code:

with open('a.txt','w') as fout:
    with open('b.txt','r') as fin:
        lines_done=0
        fin.readline() # skip the first line ("  A B C D E F ... X Y Z" in your example)
        # fout.write("E F\n") # uncomment this line if you want the column headings in fout
        for line in fin:
            if lines_done>=1000: # you said 1000 lines only
                break
            ef=line[10:13] # solution A
            # ef=" ".join(line.split()[5:7]) # solution B
            fout.write(ef+"\n")
            lines_done+=1

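Decide which solution you need:

A works for your sample data (E is at character index 10, F at index 12) and is faster.
B works for more general whitespace-separated rows (after splitting, E is the entry at index 5 and F the entry at index 6, with the row label at index 0) and is a bit slower.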
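To make the trade-off concrete, here is a small sketch comparing the two approaches on two made-up rows (the sample strings and the helper names `ef_by_slice` and `ef_by_split` are illustrative assumptions, not part of the original data). It suggests that the fixed slice only holds up while every field is exactly one character wide:

```python
# Hypothetical sample rows: single-character row label and values,
# one space between fields (as in the question's example).
narrow = "a 1 0 1 2 1 0 3 1 0 2\n"
# A row with a two-digit value shifts every later character offset.
wide = "b 10 2 0 1 1 2 3 1 0 0\n"


def ef_by_slice(line):
    # Solution A: fixed character positions (E at index 10, F at index 12).
    return line[10:13]


def ef_by_split(line):
    # Solution B: whitespace-separated tokens; token 0 is the row label,
    # so E and F are tokens 5 and 6.
    return " ".join(line.split()[5:7])


# On the narrow row both approaches agree; on the wide row the slice
# no longer lines up with columns E and F.
print(repr(ef_by_slice(narrow)), repr(ef_by_split(narrow)))
print(repr(ef_by_slice(wide)), repr(ef_by_split(wide)))
```

If the real file might contain multi-digit values or irregular spacing, solution B is probably the safer choice despite the extra cost of splitting.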

Answer 2

Since the columns are separated by " ", you can use str.split.

def extract(nin, nout, cidx, rows):
    with open(nout, 'w') as fout:
        with open(nin, 'r') as fin:
            offset = 0  # the header line has no row label
            for line in fin:
                cols = line.strip().split(' ')
                for cc in cidx:
                    fout.write(cols[cc + offset])
                    fout.write(' ')
                fout.write('\n')
                offset = 1  # data lines start with a row label, so shift by one
                rows -= 1  # note: the header line counts toward rows
                if rows == 0:
                    break

extract('b.txt', 'a.txt', [4, 5], 1000)
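As a possible variation, the same idea can be combined with the `itertools.islice` approach from the question, so that reading stops after the requested number of rows without a manual counter. This is only a sketch: the function name `extract_columns` is hypothetical, and it assumes the first line is a header and every data line starts with a row label.

```python
from itertools import islice


def extract_columns(src, dst, col_idx, n_rows):
    """Write the columns in col_idx for the first n_rows data rows of src to dst."""
    with open(src, "r") as fin, open(dst, "w") as fout:
        next(fin, None)  # skip the header line (assumed to be the first line)
        # islice stops after n_rows lines, so the rest of the file is never read
        for line in islice(fin, n_rows):
            fields = line.split()
            # fields[0] is the row label, so data column i is fields[i + 1]
            fout.write(" ".join(fields[i + 1] for i in col_idx) + "\n")
```

With the file names from the question, this would be called as `extract_columns('b.txt', 'a.txt', [4, 5], 1000)`. Unlike `extract` above, the header line is skipped and does not count toward the 1000 rows.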

python

dataset