MXNET - Invalid type '<type 'numpy.ndarray'>' for data, should be NDArray, numpy.ndarray,
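I am running into a basic IO problem with mxnet. I am trying to use mxnet.io.NDArrayIter to read an in-memory dataset for training in mxnet. I have the following code (condensed for brevity), which preprocesses the data and tries to iterate over it (largely based on the tutorial):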
import csv
import mxnet as mx
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
with open('data.csv', 'r') as data_file:
    data = list(csv.reader(data_file))
labels = np.array(map(lambda x: x[1], data)) # one-hot encoded classes
data = map(lambda x: x[0], data) # raw text in need of pre-processing
transformer = Pipeline(steps=(('count_vectorizer', CountVectorizer()),
                              ('tfidf_transformer', TfidfTransformer())))
preprocessed_data = np.array([np.array(row) for row in transformer.fit_transform(data)])
training_data = mx.io.NDArrayIter(data=preprocessed_data, label=labels, batch_size=50)
for i, batch in enumerate(training_data):
    print(batch)
When I run this code, I get the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/mxnet/io.py", line 510, in _init_data
    data[k] = array(v)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/ndarray/utils.py", line 146, in array
    return _array(source_array, ctx=ctx, dtype=dtype)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/ndarray/ndarray.py", line 2245, in array
    arr[:] = source_array
  File "/usr/local/lib/python3.5/dist-packages/mxnet/ndarray/ndarray.py", line 437, in __setitem__
    self._set_nd_basic_indexing(key, value)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/ndarray/ndarray.py", line 698, in _set_nd_basic_indexing
    self._sync_copyfrom(value)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/ndarray/ndarray.py", line 856, in _sync_copyfrom
    source_array = np.ascontiguousarray(source_array, dtype=self.dtype)
  File "/usr/local/lib/python3.5/dist-packages/numpy/core/numeric.py", line 581, in ascontiguousarray
    return array(a, dtype, copy=False, order='C', ndmin=1)
TypeError: float() argument must be a string or a number, not 'csr_matrix'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mxnet_test.py", line 20, in <module>
    training_data = mx.io.NDArrayIter(data=preprocessed_data, label=labels, batch_size=50)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/io.py", line 643, in __init__
    self.data = _init_data(data, allow_empty=False, default_name=data_name)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/io.py", line 513, in _init_data
    "should be NDArray, numpy.ndarray or h5py.Dataset")
TypeError: Invalid type '<class 'numpy.ndarray'>' for data, should be NDArray, numpy.ndarray or h5py.Dataset
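I don't understand this, because my data is converted to a numpy.ndarray before the NDArrayIter instance is created. Would anyone be willing to offer some insight into how to read data into mxnet?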
The code above currently uses the following versions:
- mxnet-1.1.0
- numpy-1.14.2
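The traceback above is a chained one: in Python 3, an exception raised inside an except block keeps a reference to the exception that was being handled, and the first traceback printed is usually the real cause. The following is a minimal sketch of that mechanism, not MXNet code; parse, load and root_cause are hypothetical names used only for illustration.

```python
def parse(value):
    # Stand-in for the low-level conversion that fails first.
    return float(value)


def load(value):
    try:
        return parse(value)
    except TypeError:
        # Raising inside an except block chains the new exception to the
        # one being handled; Python stores the original on __context__.
        raise TypeError("Invalid type for data")


def root_cause(exc):
    # Walk the chain back to the first exception that was raised.
    while exc.__cause__ is not None or exc.__context__ is not None:
        exc = exc.__cause__ or exc.__context__
    return exc


try:
    load(object())
except TypeError as err:
    reported = str(err)
    original = root_cause(err)
    print("reported:  ", reported)
    print("root cause:", original)
```

Here the reported message is the generic one, while the root cause carries the float() complaint, which mirrors the pair of TypeErrors in the traceback.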
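With help from user2357112, this was solved by using Python 3 exception chaining to find the underlying exception (the question has been updated): the transformer pipeline returns a numpy.array of scipy.sparse.csr_matrix matrices rather than a two-dimensional numpy.array. Changing the following line to convert each row with the toarray method lets the script run.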
preprocessed_data = np.array([row.toarray() for row in transformer.fit_transform(data)])
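To see what the conversion does to the array shape, here is a small sketch that uses a hand-built scipy.sparse.csr_matrix as a stand-in for the output of transformer.fit_transform(data) (the values are made up). It suggests that converting row by row keeps an extra axis of length 1, whereas calling toarray once on the whole matrix yields a plain two-dimensional array, which may be the simpler option if a dense array is wanted.

```python
import numpy as np
from scipy import sparse

# Stand-in for transformer.fit_transform(data): 3 documents, 4 terms.
features = sparse.csr_matrix(np.array([[0.0, 0.5, 0.0, 0.5],
                                       [1.0, 0.0, 0.0, 0.0],
                                       [0.0, 0.0, 0.7, 0.3]]))

# Iterating over a csr_matrix yields one sparse 1 x 4 row at a time.
first_row = next(iter(features))

# Converting row by row stacks three (1, 4) arrays, so an extra axis remains.
per_row = np.array([row.toarray() for row in features])

# Converting the whole matrix at once gives a two-dimensional array.
dense = features.toarray()

print(sparse.issparse(first_row), per_row.shape, dense.shape)
```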
Optimal solution: calling toarray on a scipy.sparse.csr_matrix is inefficient in terms of memory consumption. In version 1.1.0 of mxnet, mxnet.nd.sparse.array can be used to store the data more efficiently:
...
preprocessed_data = mx.nd.sparse.array(transformer.fit_transform(data))
training_data = mx.io.NDArrayIter(data=preprocessed_data, label=preprocessed_labels, batch_size=5, last_batch_handle='discard')
for i, batch in enumerate(training_data):
    print(batch)
The only caveat is that the last_batch_handle='discard' keyword argument must be passed to NDArrayIter (see the NDArrayIter documentation for what last_batch_handle does).
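As a rough illustration of what discarding the last batch means, the plain-Python sketch below drops a final batch that is shorter than batch_size. It is not MXNet's implementation; iter_batches and its discard_last parameter are hypothetical names chosen for this example.

```python
def iter_batches(rows, batch_size, discard_last=True):
    """Yield consecutive batches; optionally drop a short final batch."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        if discard_last and len(batch) < batch_size:
            # The remaining rows do not fill a whole batch, so stop here.
            return
        yield batch


rows = list(range(12))
batches = list(iter_batches(rows, batch_size=5))
kept_all = list(iter_batches(rows, batch_size=5, discard_last=False))
print(len(batches), len(kept_all))
```

With 12 rows and a batch size of 5, the discarding variant yields two full batches and skips the last two rows, so every batch has the same shape.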