How to import a numpy structured array from a data file

Question

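I am trying to create an array with 5 columns imported from a data file. Four of them are floats and the last one is a string.

The data file looks like this: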

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa

I tried the following:

data = np.genfromtxt(filename, dtype = "float,float,float,float,str", delimiter = ",")

data = np.loadtxt(filename, dtype = "float,float,float,float,str", delimiter = ",")

However, both calls import only the first column.

Why? How can I fix this?

Thanks for your time! :)

Answer 1

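You have to specify the string type with an explicit length, for example "U20" for up to 20 characters (a bare str carries no length in a dtype specification):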

data = np.loadtxt('data.txt', dtype = "float,"*4 + "U20", delimiter = ",")

This seems to work:

array([( 5.1,  3.5,  1.4,  0.2, 'Iris-setosa'),
       ( 4.9,  3. ,  1.4,  0.2, 'Iris-setosa'),
       ( 4.7,  3.2,  1.3,  0.2, 'Iris-setosa'),
       ( 4.6,  3.1,  1.5,  0.2, 'Iris-setosa'),
       ( 5. ,  3.6,  1.4,  0.2, 'Iris-setosa'),
       ( 5.4,  3.9,  1.7,  0.4, 'Iris-setosa'),
       ( 4.6,  3.4,  1.4,  0.3, 'Iris-setosa'),
       ( 5. ,  3.4,  1.5,  0.2, 'Iris-setosa')],
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<U20')])
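
To make the result above easier to experiment with, here is a minimal, self-contained sketch. It assumes the file contents can be stood in for by an in-memory `io.StringIO` buffer (the three sample rows and the variable names are illustrative, not part of the original answer). It also shows that the same explicit dtype can be passed to `np.genfromtxt`, and how the fields are accessed by their auto-generated names `f0` to `f4`.

```python
import io

import numpy as np

# Stand-in for the data file (assumption: three rows copied from the question).
csv_text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# Four float fields followed by a unicode string field of up to 20 characters.
dtype = "float," * 4 + "U20"

# loadtxt, as in the answer above.
data = np.loadtxt(io.StringIO(csv_text), dtype=dtype, delimiter=",")

# genfromtxt accepts the same dtype; the encoding is given explicitly here.
data_g = np.genfromtxt(
    io.StringIO(csv_text), dtype=dtype, delimiter=",", encoding="utf-8"
)

# Fields of a structured array are accessed by name.
sepal_length = data["f0"]  # 1-D float array
species = data["f4"]       # 1-D string array

print(data.dtype.names)
print(sepal_length.mean(), species[0])
```

Each field comes back as an ordinary one-dimensional array, so the numeric columns can be used in vectorised computations without converting them first.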

Another approach is to use pandas, which gives you an object array, but this will slow down further computations:

In [336]: pd.read_csv('data.txt',header=None).values
Out[336]: 
array([[5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
       [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
       [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'],
       [4.6, 3.1, 1.5, 0.2, 'Iris-setosa'],
       [5.0, 3.6, 1.4, 0.2, 'Iris-setosa'],
       [5.4, 3.9, 1.7, 0.4, 'Iris-setosa'],
       [4.6, 3.4, 1.4, 0.3, 'Iris-setosa'],
       [5.0, 3.4, 1.5, 0.2, 'Iris-setosa']], dtype=object)
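
One possible way around the object-array slowdown, assuming it is acceptable to keep the numeric columns and the labels in two separate arrays, is sketched below. The in-memory buffer and the names `features` and `labels` are illustrative assumptions, not part of the original answer.

```python
import io

import pandas as pd

# Stand-in for the data file (assumption: three rows copied from the question).
csv_text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# With header=None the columns are labelled 0..4.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# The first four columns as a plain 2-D float array (not dtype=object).
features = df.iloc[:, :4].to_numpy(dtype=float)

# The last column (the species names) as a separate 1-D array.
labels = df[4].to_numpy()

print(features.dtype, features.shape)
print(labels[0])
```

Keeping the floats in their own array means numeric operations run on a native float dtype, while the strings stay available for grouping or filtering.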


Tags: python, arrays, numpy, structured-array