'NoneType' object is not subscriptable -- using `np.fromregex`

Question

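There are many answers to this question (see Python Math - TypeError: 'NoneType' object is not subscriptable). My question is different, because I correctly expect np.genfromtxt(...) to return an array (that is, np.genfromtxt(...) is not an in-place function).

I am trying to parse the following and store it in a one-dimensional array:

http://pastie.org/10860707#2-3

To do so, I tried: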

pattern = re.compile(b'[\s,]')
theta = np.fromregex("RegLogTheta", regexp = pattern, dtype = float)

Here is the traceback (how should this be formatted?):

Traceback (most recent call last):
  File "/Users/ahanagrawal/Documents/Java/MachL/Chap3/ExamScoreVisual2.py", line 36, in <module>
    theta = np.fromregex("RegLogTheta", regexp = pattern, dtype = float)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1240, in fromregex
    newdtype = np.dtype(dtype[dtype.names[0]])
TypeError: 'NoneType' object is not subscriptable

If you want to run this, download the text file from http://pastie.org/10860707#2-3 and run the code above.

Answer 1

The file has multiple lines of comma-separated numbers, 3 per line, except the last line, which has only 2.

In [182]: fname='../Downloads/pastie-10860707.txt'

In [183]: np.fromregex(fname,regexp=pattern,dtype=float)
... 
np.fromregex(fname,regexp=pattern,dtype=float)

/usr/lib/python3/dist-packages/numpy/lib/npyio.py in fromregex(file, regexp, dtype)
   1240             # Create the new array as a single data-type and then
   1241             #   re-interpret as a single-field structured array.
-> 1242             newdtype = np.dtype(dtype[dtype.names[0]])
   1243             output = np.array(seq, dtype=newdtype)
   1244             output.dtype = dtype

TypeError: 'NoneType' object is not subscriptable

Loaded with a simple 'br' read, the file looks like this:

In [184]: txt
Out[184]: b'2.75386225e+00,1.80508078e+00,2.95729122e+00,\n-4.21413726e+00,  -3.38139076e+00,  -4.22751379e+00,\n ...      4.23010784e-01,  -1.14839331e+00,  -9.56098910e-01,\n        -1.15019836e+00,   1.13845303e-06'

The missing number on the last line will give genfromtxt problems.

Your choice of pattern is wrong. It looks like a delimiter pattern. But the pattern in the fromregex docs produces groups:

regexp = r"(\d+)\s+(...)"

fromregex does:

seq = regexp.findall(file.read())  # read whole file and group it
output = np.array(seq, dtype=dtype)  # make array from seq

If you want to use fromregex, you need to come up with a pattern that produces a list of tuples that can be turned directly into an array.
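To illustrate the difference, here is a small sketch using only the standard `re` module. The sample string and both variable names are made up for illustration; the point is that a delimiter pattern returns the separators themselves, while a pattern with groups returns one tuple per match.

```python
import re

# Made-up sample: two rows, each with an integer and a decimal number.
sample = "1 2.5\n3 4.5\n"

# A delimiter-style pattern (no groups): findall returns the separators.
delimiter_matches = re.findall(r"[\s,]", sample)
print(delimiter_matches)  # [' ', '\n', ' ', '\n']

# A pattern with two groups: findall returns one tuple per match.
group_matches = re.findall(r"(\d+)\s+(\d+\.\d+)", sample)
print(group_matches)  # [('1', '2.5'), ('3', '4.5')]
```

Only the second result has the row-like shape that can be converted into a structured array.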

================

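Though, looking at the error message again, I see that the immediate problem is with the dtype. dtype=float is not a valid dtype specification for this function. It expects a compound (structured) dtype.

This operation produces the error, where float is your dtype parameter: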

In [189]: np.dtype(float).names[0]
 ...
TypeError: 'NoneType' object is not subscriptable
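The underlying reason is that a plain dtype has no field names, while a structured dtype does. A minimal check, where the field name `f0` is just an illustrative choice:

```python
import numpy as np

plain = np.dtype(float)
structured = np.dtype([("f0", float)])  # 'f0' is an arbitrary field name

# A plain dtype has no fields, so .names is None and indexing it fails.
print(plain.names)       # None

# A structured dtype exposes its field names as a tuple.
print(structured.names)  # ('f0',)
```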

But it is trying to do this because the pattern has produced

In [194]: pattern.findall(txt)
Out[194]: 
[b',',
 b',',
 b',',
 b'\n',
 b',',
 b' ',
 b' ',
 ....]

not the list of tuples that it expected.

==================

I can load the file with

In [213]: np.genfromtxt(txt.splitlines(),delimiter=',',usecols=[0,1])
Out[213]: 
array([[  2.75386225e+00,   1.80508078e+00],
       [ -4.21413726e+00,  -3.38139076e+00],
       [  7.46991792e-01,  -1.08010066e+00],
        ...
       [  4.23010784e-01,  -1.14839331e+00],
       [ -1.15019836e+00,   1.13845303e-06]])

I'm using usecols to temporarily get around the problem of the last line having only 2 numbers.
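The same workaround can be tried on a small made-up sample with the same layout (trailing commas, short last line). This is only a sketch of the `usecols` approach; it discards the third column, so it is not a complete solution.

```python
import io
import numpy as np

# Made-up data: three numbers per line with a trailing comma,
# except the last line, which has only two numbers.
text = "1.0,2.0,3.0,\n4.0,5.0,6.0,\n7.0,8.0"

# Restricting to the first two columns sidesteps the ragged last line.
first_two = np.genfromtxt(io.StringIO(text), delimiter=",", usecols=[0, 1])
print(first_two.shape)
```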

If I remove the \n and split on commas, I can parse the resulting text fields directly with np.array.

In [231]: txt1=txt.replace(b'\n',b'').split(b',')

In [232]: np.array(txt1,float)
Out[232]: 
array([  2.75386225e+00,   1.80508078e+00,   2.95729122e+00,
        -4.21413726e+00,  -3.38139076e+00,  -4.22751379e+00,
          ...
         4.23010784e-01,  -1.14839331e+00,  -9.56098910e-01,
        -1.15019836e+00,   1.13845303e-06])
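A self-contained variant of the same idea, working on a `str` rather than `bytes` and converting each token with `float` explicitly. The sample text is made up to mimic the file's layout, and the variable names are illustrative.

```python
import numpy as np

# Made-up sample mimicking the file: trailing commas, uneven spacing,
# and a last line with fewer numbers.
text = "1.5e+00,2.5e+00,3.5e+00,\n-4.5e+00,  -5.5e+00,  6.5e-06"

# Drop the newlines, split on commas, and skip any empty tokens.
tokens = text.replace("\n", "").split(",")
values = np.array([float(tok) for tok in tokens if tok.strip()])
print(values.shape)  # (6,)
```

Since `float` tolerates surrounding whitespace, the uneven spacing after the commas needs no special handling.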

This pattern matches the decimal point and scientific notation:

In [266]: pattern=re.compile(br"(\d+\.\d+e[\+\-]\d+)")

In [267]: np.fromregex(fname,regexp=pattern,dtype=np.dtype([('f0',float)]))['f0']
Out[267]: 
array([  2.75386225e+00,   1.80508078e+00,   2.95729122e+00,
         4.21413726e+00,   3.38139076e+00,   4.22751379e+00,
      ...
         4.23010784e-01,   1.14839331e+00,   9.56098910e-01,
         1.15019836e+00,   1.13845303e-06])

Here I'm creating a structured array and extracting that field. There may be a way around that. But fromregex seems to prefer structured dtypes.
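Note that the pattern above matches only the digits, so the negative numbers in the file come out positive in that output (compare it with the np.array result earlier). A possible variant that also captures an optional leading sign is sketched below; the sample data, the `io.StringIO` stand-in for the file, and the name `signed_pattern` are assumptions for illustration.

```python
import io
import re
import numpy as np

# Made-up sample in the same layout as the file.
sample = io.StringIO(
    "2.75386225e+00,1.80508078e+00,2.95729122e+00,\n"
    "-4.21413726e+00,  -3.38139076e+00,  1.13845303e-06"
)

# One group per number; [-+]? keeps the optional leading sign.
signed_pattern = re.compile(r"([-+]?\d+\.\d+e[-+]\d+)")

# fromregex expects a structured dtype; 'f0' is an arbitrary field name.
values = np.fromregex(sample, signed_pattern, dtype=[("f0", float)])["f0"]
print(values)
```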

python

numpy

text-parsing