Reshape 3D numpy array of images to 2D numpy array for XGBoost DMatrix input

Question

I have a set of images stored in a 3D array (dimensions: index * height * width).

x_train, x_test, y_train, y_test = train_test_split(X, yy, test_size=0.2, random_state=42, stratify=y)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)

dtrain = xgb.DMatrix(data=x_train, label=y_train)
dtest = xgb.DMatrix(data=x_test)

I get an error from the XGBoost DMatrix input:

ValueError: ('Expecting 2 dimensional numpy.ndarray, got: ', (2164, 120, 431))

The array shapes printed above are:

(2164, 120, 431) (542, 120, 431) (2164, 3) (542, 3)

I'm confused about how to reshape the data. Does it need to be 2164 rows * 1 column?

Answer 1

Just reshape your x numpy arrays so that each image becomes one flat row:

x_train = x_train.reshape(x_train.shape[0], -1)

x_test = x_test.reshape(x_test.shape[0], -1)
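To answer the "2164 rows * 1 column" part of the question directly: the result keeps one row per image and gets one column per pixel. The sketch below uses a small zero-filled stand-in array (an assumption, since the real data is not shown) with the same per-image size of 120 x 431.

```python
import numpy as np

# Stand-in for the image data: 4 samples of 120 x 431 pixels.
# (The real x_train in the question has 2164 samples.)
x = np.zeros((4, 120, 431), dtype=np.float32)

# Keep the sample axis; -1 lets NumPy infer the remaining size (120 * 431).
x_flat = x.reshape(x.shape[0], -1)

print(x_flat.shape)  # (4, 51720)

# Each row is just the corresponding image flattened in row-major order.
same_row = np.array_equal(x_flat[1], x[1].ravel())
```

With the shapes from the question, x_train would therefore become (2164, 51720) and x_test (542, 51720).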

Answer 2

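Reading the documentation, it seems that X needs to be 2-dimensional and Y needs to be 1-dimensional. So X needs to have the shape (index_of_sample, features), which means the height and width have to be flattened into a single vector. (This is not a good idea for images, since you lose the spatial structure, but it has to happen here because the model is xgb.)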

So you need to reshape X as

x_train = x_train.reshape(x_train.shape[0], -1)
x_test = x_test.reshape(x_test.shape[0], -1)

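Also, the documentation states that Y needs to be 1-dimensional. So you will need to convert Y into categorical class values instead of the current (I assume) one-hot encoding.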
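One common way to do that conversion, assuming the (2164, 3) label array really is one-hot encoded (the question does not confirm this), is to take the index of the largest entry in each row. The label values below are made up for illustration.

```python
import numpy as np

# Hypothetical one-hot labels: 4 samples, 3 classes.
# (Assumed to mirror the layout of y_train in the question.)
y_onehot = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
])

# argmax along the class axis gives one integer class index per sample.
y_labels = np.argmax(y_onehot, axis=1)

print(y_labels)        # [0 2 1 2]
print(y_labels.shape)  # (4,)
```

The resulting 1-dimensional integer array is the kind of value that can then be passed as label to xgb.DMatrix.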

python

numpy

xgboost

numpy-ndarray