Read text file only for certain rows to split up file in Python

Question

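I am loading a text file with np.loadtxt and would like Python to split it into four parts. Normally I just copy and paste each set of data into a separate text file and run np.loadtxt on each one, but I will have to do this hundreds of times, so that is far too time-consuming.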

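Here is a simplified version of the text file. What I want is for Python to read the first number (0.6999) and discard it, then read the next 5 rows of values and assign a variable name to each column, then do the same for the following 5 rows, and so on.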

Is there any way I could tell Python to run np.loadtxt only for row 1, then only for rows 2 to 6, then rows 7 to 12, etc.?

   0.699999988
   1    0.2000    0.0618
   2    0.2500    0.0417
   3    0.3000    0.0371
   4    0.3500    0.0390
   5    0.4500    0.0761
    670.0000  169.4000 6.708E-09
    635.0001  169.1806 1.584E-08
    612.9515  168.6255 2.724E-08
    591.2781  168.2719 4.647E-08
  670.00  0.0E+00  0.0E+00  0.0E+00  0.0E+00  0.0E+00  0.0E+00  0.0E+00
  635.00  9.8E-07  4.2E-07  2.1E-07  1.2E-07  4.4E-08  1.8E-08  1.4E-08
  612.95  6.0E-06  3.5E-06  2.1E-06  1.3E-06  4.7E-07  1.8E-07  1.4E-07
  591.28  2.2E-05  1.3E-05  7.7E-06  4.9E-06  1.8E-06  6.6E-07  5.0E-07
  569.98  8.3E-05  5.0E-05  2.8E-05  1.8E-05  6.4E-06  2.4E-06  1.8E-06
  549.06  3.0E-04  1.8E-04  1.0E-04  6.2E-05  2.3E-05  8.4E-06  6.4E-06
  528.51  7.8E-04  5.0E-04  2.8E-04  1.7E-04  6.2E-05  2.3E-05  1.8E-05
  508.34  1.6E-03  1.0E-03  5.8E-04  3.4E-04  1.3E-04  4.9E-05  3.7E-05

This is what I use for my three separate text files:

altvall,T,Pp= np.loadtxt('file1.txt',usecols = (0,1,2),unpack=True) # load text file

tau1,tau2,tau3,tau4,tau5,tau6,tau7 = np.loadtxt('file2.txt',usecols = (1,2,3,4,5,6,7),unpack=True) # load text file

wvln,alb = np.loadtxt('file3.txt',usecols = (1,2),unpack=True) # load text file

Now I just want something similar, but without having to split my text file into separate parts.

Answer 1

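A simple approach is to use itertools.izip_longest (renamed zip_longest in Python 3) to group the lines of the input file into sets of 5. The key is the following idiom, shown here with the Python 3 name:

for rows in zip_longest(*[file_object]*N):
    # rows is a tuple of N consecutive lines;
    # the last tuple is padded with None if the file runs out
    # do something with rows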
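
To see why this idiom groups consecutive items, here is a minimal sketch on a plain range instead of a file. It assumes Python 3, where the function is called zip_longest; the helper name grouper is borrowed from the recipes in the itertools documentation and is not part of the original answer.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect items into tuples of length n, padding the last one."""
    it = iter(iterable)
    # [it] * n repeats the *same* iterator n times, so every tuple that
    # zip_longest builds pulls n consecutive items from it.
    return zip_longest(*[it] * n, fillvalue=fillvalue)

groups = list(grouper(range(7), 3))
print(groups)  # [(0, 1, 2), (3, 4, 5), (6, None, None)]
```

The padding in the last tuple is why code that processes the groups should be prepared to meet None when the number of lines is not a multiple of N.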

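Full example (Python 3; in Python 2 the function is itertools.izip_longest):

import numpy as np
from itertools import zip_longest

data = []
with open(filehandle, 'r') as fin:  # filehandle: path of the input file
    next(fin)  # skip the first line
    for rows in zip_longest(*[fin]*5):  # read fin 5 lines at a time
        # drop the None padding of the last group, then parse each line
        rows = [[float(x) for x in r.split()] for r in rows if r is not None]
        data.append(np.array(rows))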

This produces a list of 5xN arrays:

>>> print(data)
[array([[ 1.    ,  0.2   ,  0.0618],
       [ 2.    ,  0.25  ,  0.0417],
       [ 3.    ,  0.3   ,  0.0371],
       [ 4.    ,  0.35  ,  0.039 ],
       [ 5.    ,  0.45  ,  0.0761]]),
 array([[  6.70000000e+02,   1.69400000e+02,   6.70800000e-09],
       [  6.35000100e+02,   1.69180600e+02,   1.58400000e-08],
       [  6.12951500e+02,   1.68625500e+02,   2.72400000e-08],
       [  5.91278100e+02,   1.68271900e+02,   4.64700000e-08],
       [  5.69980100e+02,   1.68055300e+02,   7.85900000e-08]]),
 array([[  6.70000000e+02,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00],
       [  6.35000000e+02,   9.80000000e-07,   4.20000000e-07,
          2.10000000e-07,   1.20000000e-07,   4.40000000e-08,
          1.80000000e-08,   1.40000000e-08],
       [  6.12950000e+02,   6.00000000e-06,   3.50000000e-06,
          2.10000000e-06,   1.30000000e-06,   4.70000000e-07,
          1.80000000e-07,   1.40000000e-07],
       [  5.91280000e+02,   2.20000000e-05,   1.30000000e-05,
          7.70000000e-06,   4.90000000e-06,   1.80000000e-06,
          6.60000000e-07,   5.00000000e-07],
       [  5.69980000e+02,   8.30000000e-05,   5.00000000e-05,
          2.80000000e-05,   1.80000000e-05,   6.40000000e-06,
          2.40000000e-06,   1.80000000e-06]])]
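
In the sample shown in the question, the blocks differ in length as well as in number of columns, which a fixed group size of 5 cannot follow. The following is a minimal sketch of one alternative, assuming the number of rows in each block is known in advance: itertools.islice takes the next n lines from the open file, and np.loadtxt, which accepts a list of strings, parses them. The helper name read_blocks, its block_sizes argument, and the shortened sample text are illustrative assumptions, not part of the original answer.

```python
import io
from itertools import islice

import numpy as np

def read_blocks(lines, block_sizes):
    """Yield one 2-D array per block of consecutive lines.

    block_sizes holds the (assumed known) number of rows in each block.
    """
    it = iter(lines)
    for n in block_sizes:
        chunk = list(islice(it, n))       # next n lines, fewer at end of input
        yield np.loadtxt(chunk, ndmin=2)  # loadtxt accepts a list of strings

# Shortened stand-in for the file in the question (values copied from it).
sample = """\
   0.699999988
   1    0.2000    0.0618
   2    0.2500    0.0417
   3    0.3000    0.0371
     670.0000  169.4000 6.708E-09
     635.0001  169.1806 1.584E-08
  670.00  0.0E+00  0.0E+00  0.0E+00
  635.00  9.8E-07  4.2E-07  2.1E-07
"""

with io.StringIO(sample) as fin:          # with a real file: open(path)
    first = float(next(fin))              # leading number, read separately
    block_a, block_b, block_c = read_blocks(fin, [3, 2, 2])

# Columns of a block become separate variables, as unpack=True would give.
altvall, T, Pp = block_b.T
tau = block_c[:, 1:]                      # every column except the first
```

Because each block is parsed on its own, the blocks may have different widths, and the whole file is still read in a single pass.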

python

numpy

text-files