Using numpy genfromtxt to read single-column data into multiple columns using text headers

Question

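I am trying to use genfromtxt to import some data (pressure, stress) for a set of pre-defined x,y points from a file. The data is output as one long column, split up by header names, e.g.: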

time
1.0022181

PORE_PRE
-18438721.41
-18438721.41
........

STRS_11
-28438721.41
-28438721.41
........

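The time data is only a single point, while variables such as PORE_PRE and STRS_11 contain many data points, with the same number of points for each variable. I am using the following code: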

import numpy as np
import matplotlib.pyplot as plt


file1=open('Z:/EFNHigh_Res/data_tstep1.out','r')
time=np.genfromtxt(file1,names=None,dtype=None,autostrip=True)

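With this code I get a structured array with all of the data in a single column. I managed to remove the time by deleting the first two rows.

My initial idea was to reshape the array using the number of data points per variable, which I had found earlier, together with the total number of data points in the column. For example: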

xx=np.reshape(time3,307,4)
print xx

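But I get the error below and cannot find a way to reshape it. I am guessing that this is not possible for some reason because of the 1-D nature of the array.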

 File "Z:\EFNHigh_Res\plotting.py", line 39, in <module>
    xx=np.reshape(time3,307,4)
  File "C:\Python27\ArcGIS10.2\lib\site-packages\numpy\core\fromnumeric.py",line 171, in reshape
    return reshape(newshape, order=order)
ValueError: total size of new array must be unchanged

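I don't have many options for the output format (other than a more complicated arrangement). It seems like it should be a simple operation, but I can't figure it out; then again, I am very new to Python. I also tried looking at only the float data with the following code, but I either get the error below or end up with far more data points than the array contains.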

xx=time3.view(dtype=np.float)
ValueError: new type not compatible with array

Can anyone suggest how I should handle reading in this file?

Answer 1

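You need to read the file in chunks. genfromtxt accepts input from any iterable: a list of strings, a generator, an open file, etc. So you need a script that opens the file, reads the lines of one block, calls genfromtxt on them, and saves the result in a list. At the end you can combine those sub-arrays into one array.

Here is a simple example using readlines. Working from a list of lines is the easiest way to develop your ideas (finding the block boundaries, etc.). If you don't want the whole file in memory, you can rework it later into a generator or filter structure.

Merging structured arrays has been discussed at length elsewhere.

Sample script: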

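import numpy as np
import numpy.lib.recfunctions as rfn

# Read the whole file as a list of lines (newlines included).
with open('stack35510689.txt') as f:
    lines = f.readlines()
print(lines)

# The second line holds the single time value.
time = float(lines[1].strip())
print(time)

# Each block is a header line followed by its values;
# names=True takes the field name from the header line.
arr1 = np.genfromtxt(lines[3:6], names=True)
print(repr(arr1))
arr2 = np.genfromtxt(lines[7:10], names=True)
print(repr(arr2))

# Combine the two single-field arrays into one two-field structured array.
print(repr(rfn.merge_arrays([arr1, arr2])))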

Sample input

time
1.0022181

PORE_PRE
-18438721.41
-18438721.41

STRS_11
-28438721.41
-28438721.41

Sample output

1009:~/mypy$ python stack35510689.py
['time\n', '1.0022181\n', '\n', 'PORE_PRE\n', '-18438721.41\n', '-18438721.41\n', '\n', 'STRS_11\n', '-28438721.41\n', '-28438721.41\n']
1.0022181
array([(-18438721.41,), (-18438721.41,)], 
      dtype=[('PORE_PRE', '<f8')])
array([(-28438721.41,), (-28438721.41,)], 
      dtype=[('STRS_11', '<f8')])
array([(-18438721.41, -28438721.41), (-18438721.41, -28438721.41)], 
      dtype=[('PORE_PRE', '<f8'), ('STRS_11', '<f8')])
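
The script above hard-codes the slice boundaries of each block. A minimal sketch of finding them automatically follows, assuming that blocks are separated by blank lines, that the first block is the time header followed by its single value, and that every later block is one header line followed by numeric values. The helper names split_blocks and read_blocks are hypothetical, not part of numpy.

```python
import numpy as np
import numpy.lib.recfunctions as rfn


def split_blocks(lines):
    """Group non-blank lines into blocks separated by blank lines."""
    blocks, current = [], []
    for line in lines:
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def read_blocks(lines):
    """Return (time, structured array); assumes the first block is 'time'."""
    blocks = split_blocks(lines)
    time = float(blocks[0][1])
    # One genfromtxt call per block; atleast_1d guards single-value blocks.
    arrays = [np.atleast_1d(np.genfromtxt(b, names=True)) for b in blocks[1:]]
    return time, rfn.merge_arrays(arrays)


# Same content as the sample input, given as a list of lines.
sample = ['time\n', '1.0022181\n', '\n',
          'PORE_PRE\n', '-18438721.41\n', '-18438721.41\n', '\n',
          'STRS_11\n', '-28438721.41\n', '-28438721.41\n']
time, merged = read_blocks(sample)
print(time)
print(repr(merged))
```

For a real file, pass open(filename).readlines() instead of sample. This relies on all blocks having the same number of values, as stated in the question.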

Reading the same file with a single genfromtxt call produces a 1-D array of strings:

In [819]: data=np.genfromtxt('stack35510689.txt',names=None,dtype=None,autostrip=True)
In [820]: data
Out[820]: 
array(['time', '1.0022181', 'PORE_PRE', '-18438721.41', '-18438721.41',
       'STRS_11', '-28438721.41', '-28438721.41'], 
      dtype='|S12')

If I change dtype to float, I get the numbers, with nan where the strings were:

In [821]: data=np.genfromtxt('stack35510689.txt',names=None,dtype=float,autostrip=True)

In [822]: data
Out[822]: 
array([             nan,   1.00221810e+00,              nan,
        -1.84387214e+07,  -1.84387214e+07,              nan,
        -2.84387214e+07,  -2.84387214e+07])

I can collect the numbers from it by slicing:

In [826]: np.array([data[3:5],data[6:8]])
Out[826]: 
array([[-18438721.41, -18438721.41],
       [-28438721.41, -28438721.41]])
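
The slicing can also be done without hand-written indices, which gets close to the reshape idea in the question. Note that np.reshape takes the target shape as a single tuple, e.g. np.reshape(a, (rows, cols)); in a call like np.reshape(a, 307, 4) the 4 is passed as the order argument. The sketch below assumes that every nan in the float array marks a header line and that no real data value is nan. The function name columns_from_nan_marked and its n_skip parameter are hypothetical.

```python
import numpy as np


def columns_from_nan_marked(data, n_skip=2):
    """Return a 2-D array with one row per variable.

    data is the 1-D float array from genfromtxt(..., dtype=float), where
    header lines became nan. n_skip drops the leading 'time' header and
    its value.
    """
    body = data[n_skip:]
    is_header = np.isnan(body)
    n_vars = int(is_header.sum())      # one nan per variable header
    values = body[~is_header]          # keep only the numbers
    # Raises ValueError if the blocks do not all have the same length.
    return values.reshape(n_vars, -1)


# The float array shown above, typed in directly.
data = np.array([np.nan, 1.0022181, np.nan, -18438721.41, -18438721.41,
                 np.nan, -28438721.41, -28438721.41])
table = columns_from_nan_marked(data)
print(table)
```

Because the file stores the values variable by variable, each row of the result is one variable. Use table.T to get one column per variable, which is the layout a shape such as (307, 4) would describe.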

Or make a structured array as before:

In [827]: x=np.zeros((2,),dtype=[('PORE_PRE', '<f8'), ('STRS_11', '<f8')])
In [828]: x['PORE_PRE']=data[3:5]
In [829]: x['STRS_11']=data[6:8]
In [830]: x
Out[830]: 
array([(-18438721.41, -28438721.41), (-18438721.41, -28438721.41)], 
      dtype=[('PORE_PRE', '<f8'), ('STRS_11', '<f8')])
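
With more than two variables, the field-by-field assignment can be put in a loop. This is a small sketch, assuming the header names and a 2-D array with one row per variable have already been collected; to_structured is a hypothetical helper name.

```python
import numpy as np


def to_structured(names, rows):
    """Build a structured array with one float field per name.

    rows is a 2-D array-like with one row per variable, in the same
    order as names.
    """
    rows = np.asarray(rows, dtype=float)
    out = np.zeros(rows.shape[1], dtype=[(name, '<f8') for name in names])
    for name, row in zip(names, rows):
        out[name] = row
    return out


x = to_structured(['PORE_PRE', 'STRS_11'],
                  [[-18438721.41, -18438721.41],
                   [-28438721.41, -28438721.41]])
print(repr(x))
```

Each field can then be read by name, e.g. x['PORE_PRE'], which is convenient for plotting one variable against another.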


python

arrays

numpy

genfromtxt