Using numpy.ndarray type (multilabel) for labels in SageMaker RecordIO format?
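I'm trying to write a numpy.ndarray as the labels for the Amazon SageMaker conversion tool write_numpy_to_dense_tensor(). It converts numpy arrays of features and labels into RecordIO so they can be used with SageMaker's algorithms.
However, if I pass a multilabel output as the labels, I get an error saying the labels must be a vector (i.e., one scalar per feature row).
Is there a way to have multiple values per label? This would be useful for multi-output regression, which can be done with XGBoost, random forests, neural networks, etc.
Code: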
import io
import sagemaker.amazon.common as smac
# X_train and y_train are numpy arrays prepared earlier in the notebook
print("Types: {}, {}".format(type(X_train), type(y_train)))
print("X_train shape: {}".format(X_train.shape))
print("y_train shape: {}".format(y_train.shape))
# Serialize features and labels into an in-memory RecordIO buffer
f = io.BytesIO()
smac.write_numpy_to_dense_tensor(f, X_train.astype('float32'), y_train.astype('float32'))
Output:
Types: <class 'numpy.ndarray'>, <class 'numpy.ndarray'>
X_train shape: (9919, 2684)
y_train shape: (9919, 20)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-fc1033b7e309> in <module>()
3 print("y_train shape: {}".format(y_train.shape))
4 f = io.BytesIO()
----> 5 smac.write_numpy_to_dense_tensor(f, X_train.astype('float32'), y_train.astype('float32'))
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/amazon/common.py in write_numpy_to_dense_tensor(file, array, labels)
94 if labels is not None:
95 if not len(labels.shape) == 1:
---> 96 raise ValueError("Labels must be a Vector")
97 if labels.shape[0] not in array.shape:
98 raise ValueError("Label shape {} not compatible with array shape {}".format(
ValueError: Labels must be a Vector
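The traceback shows the two label checks that write_numpy_to_dense_tensor() performs: labels must be 1-D, and their length must match a dimension of the feature array. The sketch below mirrors those two checks in plain NumPy so it runs without SageMaker installed. The helper names `check_dense_tensor_labels` and `split_label_columns` are hypothetical (not SageMaker API), and the exact arguments of the second error message are assumed, since the traceback cuts off there. It shows that slicing the (n_rows, n_targets) label matrix into one column vector per target satisfies the checks. This only helps if training one separate model per target is acceptable; it does not make a single model multi-output.

```python
import numpy as np


def check_dense_tensor_labels(array, labels):
    """Hypothetical mirror of the label checks visible in the traceback
    (sagemaker/amazon/common.py lines 94-98); not the SageMaker function."""
    if labels is not None:
        if not len(labels.shape) == 1:
            raise ValueError("Labels must be a Vector")
        if labels.shape[0] not in array.shape:
            # Message arguments assumed; the traceback is truncated here.
            raise ValueError("Label shape {} not compatible with array shape {}".format(
                labels.shape, array.shape))


def split_label_columns(labels):
    """Split an (n_rows, n_targets) label matrix into n_targets 1-D vectors."""
    labels = np.asarray(labels)
    if labels.ndim == 1:
        return [labels]
    return [labels[:, j] for j in range(labels.shape[1])]


# Small stand-ins with the same layout as X_train (rows, features)
# and y_train (rows, targets).
X = np.arange(24, dtype='float32').reshape(6, 4)
Y = np.arange(18, dtype='float32').reshape(6, 3)

try:
    check_dense_tensor_labels(X, Y)  # 2-D labels are rejected
except ValueError as err:
    print(err)

for j, y_j in enumerate(split_label_columns(Y)):
    check_dense_tensor_labels(X, y_j)  # passes: y_j has shape (6,)
    # With SageMaker installed, y_j could be passed as the labels argument of
    # smac.write_numpy_to_dense_tensor(), giving one RecordIO stream per target.
```

Each resulting stream would then feed its own training job, and the per-target predictions are stacked back into an (n_rows, n_targets) array afterwards.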
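Tom, XGBoost does not support the RecordIO format; it only supports CSV and libsvm. Moreover, the algorithm itself does not natively support multilabel targets. There are a few workarounds, though: Xg boost for multilabel classification?
Random Cut Forest does not support multiple labels either. If several labels are provided, it only uses the first one.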
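For XGBoost, one common workaround is to fit one model per target column. The sketch below prepares one CSV payload per target, assuming SageMaker's built-in XGBoost text/csv layout (target in the first column, no header row). `to_xgboost_csv` and `per_target_csvs` are hypothetical helper names. Uploading each payload to S3 and launching one training job per target are not shown.

```python
import io

import numpy as np


def to_xgboost_csv(features, target):
    """Return CSV text with the target in the first column and no header."""
    features = np.asarray(features, dtype='float32')
    target = np.asarray(target, dtype='float32').reshape(-1, 1)
    if target.shape[0] != features.shape[0]:
        raise ValueError("features and target must have the same number of rows")
    buf = io.StringIO()
    # %g drops trailing zeros, e.g. 10.0 is written as "10"
    np.savetxt(buf, np.hstack([target, features]), delimiter=',', fmt='%g')
    return buf.getvalue()


def per_target_csvs(features, labels):
    """One CSV payload per label column, i.e. one XGBoost model per target."""
    labels = np.asarray(labels)
    return [to_xgboost_csv(features, labels[:, j]) for j in range(labels.shape[1])]


X = np.arange(6, dtype='float32').reshape(3, 2)                 # 3 rows, 2 features
Y = np.array([[10, 20], [11, 21], [12, 22]], dtype='float32')   # 2 targets

payloads = per_target_csvs(X, Y)
# payloads[0] == "10,0,1\n11,2,3\n12,4,5\n"
```

Training independent per-target models ignores correlations between the targets. That is the trade-off of this workaround compared with a model that predicts all outputs jointly, such as a neural network with a multi-unit output layer.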