StringIO example does not work

I am trying to understand how the numpy.genfromtxt method and io.StringIO work. On the official website (https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt) I found some examples. Here is one of them:

from io import StringIO
import numpy as np
s = StringIO("1,1.3,abcde")
data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),('mystring','S5')], delimiter=",")

But when I run this code on my computer, I get: TypeError: must be str or None, not bytes

How can I fix this?

Consider upgrading numpy: with current versions of numpy your code works as-is. For the relevant change to np.genfromtxt, see the mention in the 1.14.0 release note highlights and the section "Encoding argument for text IO functions".
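If the same code has to run on both old and new installations, one option is to choose the stream type at runtime. This is a minimal sketch, not an official NumPy recipe: `make_source` is a hypothetical helper, and it assumes (per the 1.14.0 release notes) that NumPy 1.14 and later accept str streams while older versions under Python 3 need byte streams.

```python
import io

import numpy as np


def make_source(text):
    """Hypothetical helper: wrap `text` in a stream this NumPy can parse.

    Assumption: NumPy >= 1.14 accepts str streams; older versions
    under Python 3 need byte streams.
    """
    major, minor = (int(part) for part in np.__version__.split(".")[:2])
    if (major, minor) >= (1, 14):
        return io.StringIO(text)
    return io.BytesIO(text.encode("utf-8"))


data = np.genfromtxt(
    make_source("1,1.3,abcde"),
    dtype=[("myint", "i8"), ("myfloat", "f8"), ("mystring", "S5")],
    delimiter=",",
)
print(data)
```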

With older numpy you are passing a str object as input, but the documentation you linked says:

Note that generators must return byte strings in Python 3k. 

So do what the docs say and give it a bytestring:

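import io
import numpy as np
s = io.BytesIO(b"1,1.3,abcde")
data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),('mystring','S5')], delimiter=",")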
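If the data already exists as an ordinary Python 3 str, it can be encoded before wrapping it, which should satisfy pre-1.14 numpy and also works on newer versions. A sketch, assuming the text is plain ASCII (use another codec such as "utf-8" if it is not):

```python
import io

import numpy as np

text = "1,1.3,abcde\n4,1.3,two"  # ordinary Python 3 str
# Encode to bytes so that older numpy, which expects byte lines on
# Python 3, can parse it.
source = io.BytesIO(text.encode("ascii"))
data = np.genfromtxt(
    source,
    dtype=[("myint", "i8"), ("myfloat", "f8"), ("mystring", "S5")],
    delimiter=",",
)
print(data)
```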
In [200]: np.__version__
Out[200]: '1.14.0'

This example works for me:

In [201]: s = io.StringIO("1,1.3,abcde")
In [202]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
     ...:     ('mystring','S5')], delimiter=",")
Out[202]: 
array((1, 1.3, b'abcde'),
      dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])

It also works with a bytestring:

In [204]: s = io.BytesIO(b"1,1.3,abcde")
In [205]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
     ...:     ('mystring','S5')], delimiter=",")
Out[205]: 
array((1, 1.3, b'abcde'),
      dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])

genfromtxt works with anything that feeds it lines, so when testing questions I often pass a list of bytestrings directly:

In [206]: s = [b"1,1.3,abcde"]
In [207]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
     ...:     ('mystring','S5')], delimiter=",")
Out[207]: 
array((1, 1.3, b'abcde'),
      dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])

or several lines:

In [208]: s = b"""1,1.3,abcde
     ...: 4,1.3,two""".splitlines()
In [209]: s
Out[209]: [b'1,1.3,abcde', b'4,1.3,two']
In [210]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
     ...:     ('mystring','S5')], delimiter=",")
Out[210]: 
array([(1, 1.3, b'abcde'), (4, 1.3, b'two')],
      dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
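Because genfromtxt accepts any iterable of lines, a generator also works, which matches the docs' note that generators must return bytestrings on Python 3 (for older numpy). A minimal sketch; `byte_lines` is a hypothetical helper that joins each row with commas and encodes it as ASCII:

```python
import numpy as np


def byte_lines(rows):
    """Hypothetical helper: yield each row as one comma-separated bytes line."""
    for row in rows:
        yield ",".join(str(value) for value in row).encode("ascii")


rows = [(1, 1.3, "abcde"), (4, 2.5, "two")]
data = np.genfromtxt(
    byte_lines(rows),
    dtype=[("myint", "i8"), ("myfloat", "f8"), ("mystring", "S5")],
    delimiter=",",
)
print(data)
```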

Previously, with dtype=None, genfromtxt created S (bytestring) fields; see the related question:

NumPy dtype issues in genfromtxt(), reads string in as bytestring
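If you are stuck with S (bytes) columns, one way to get Python 3 strings is to decode that column after loading. A sketch, assuming the text is ASCII; `np.char.decode` converts an S array into a U (unicode) array:

```python
import numpy as np

lines = [b"1,1.3,abcde", b"4,1.3,two"]
data = np.genfromtxt(
    lines,
    dtype=[("myint", "i8"), ("myfloat", "f8"), ("mystring", "S5")],
    delimiter=",",
)
# The S5 column holds bytes; decode it into a unicode (U) array.
names = np.char.decode(data["mystring"], "ascii")
print(names)
```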

In 1.14 we can control the default string dtype:

In [219]: s = io.StringIO("1,1.3,abcde")
In [220]: np.genfromtxt(s, dtype=None, delimiter=",")
/usr/local/bin/ipython3:1: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
  #!/usr/bin/python3
Out[220]: 
array((1, 1.3, b'abcde'),
      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', 'S5')])
In [221]: s = io.StringIO("1,1.3,abcde")
In [222]: np.genfromtxt(s, dtype=None, delimiter=",",encoding=None)
Out[222]: 
array((1, 1.3, 'abcde'),
      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<U5')])

https://docs.scipy.org/doc/numpy/release.html#encoding-argument-for-text-io-functions
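The encoding argument also covers byte input that is not plain ASCII. A sketch, assuming numpy 1.14 or later and UTF-8 data: with encoding="utf-8" and dtype=None, the bytes are decoded and the text column should come back as a U dtype rather than S.

```python
import io

import numpy as np

raw = "1,1.3,héllo".encode("utf-8")  # bytes, e.g. as read from a file
data = np.genfromtxt(io.BytesIO(raw), dtype=None, delimiter=",", encoding="utf-8")
print(data)
```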

Now I can generate examples from Py3 strings without all those ugly b'string' results (but keep in mind that not everyone has upgraded to 1.14):

In [223]: s = """1,1.3,abcde
     ...: 4,1.3,two""".splitlines()
In [224]: np.genfromtxt(s, dtype=None, delimiter=",",encoding=None)
Out[224]: 
array([(1, 1.3, 'abcde'), (4, 1.3, 'two')],
      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<U5')])