Random Forest evaluation for array with floats and integers - numpy

Question

I have an array of feature values stored as floats, and an array of labels that are integers, 1 and 0.

Example feature values:

[[  17.99    10.38   122.8   ...,    0.147    0.242    0.079]
 [  20.57    17.77   132.9   ...,    0.07     0.181    0.057]]

When I append the labels to the feature array, the labels become floats. Example, feature_values with 0 appended:

[[  17.99    10.38   122.8   ...,    0.242    0.079    0.   ]]

When I run the following code:

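from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier

# data_features is the combined array (float features with the labels appended)
training_set = data_features[:,0:9]  # feature columns
test_set = data_features[:,9]        # label column (still float at this point)
seed = 7
num_trees = 100
max_features = 3
# newer scikit-learn versions require shuffle=True when random_state is set
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = RandomForestClassifier(n_estimators=num_trees, max_features=max_features)
results = model_selection.cross_val_score(model, training_set, test_set, cv=kfold)
print(results.mean())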

I get an error:

raise ValueError("Unknown label type: %r" % y_type)

ValueError: Unknown label type: 'continuous'

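From what I have read, this happens because the labels are floats.

If I change the dtype of the feature values to "int", the code does work, but I need to keep the floats.

Is there any way to keep the labels as integers and the feature values as floats so that the code works?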

Answer 1

You need to convert y_labels to integers so that RandomForestClassifier can train on them.

test_set = data_features[:,9].astype(int)
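
For illustration, here is a minimal end-to-end sketch of that cast. The data is synthetic (200 rows, 9 float features, and a 0/1 label derived from the first feature), so the array shapes, the seed, and the names `features`, `labels`, `X` and `y` are assumptions made for the example, not the asker's actual data. It shows that stacking integer labels onto a float array upcasts them, and that casting only the label column back to int leaves the features as floats.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the asker's data (assumption): 9 float features, 0/1 labels.
rng = np.random.default_rng(7)
features = rng.random((200, 9))
labels = (features[:, 0] > 0.5).astype(int)

# A NumPy array has a single dtype, so the appended labels become floats.
data_features = np.column_stack([features, labels])

X = data_features[:, :-1]             # feature columns, still float
y = data_features[:, -1].astype(int)  # label column cast back to int

# shuffle=True is needed for random_state to take effect in KFold.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
model = RandomForestClassifier(n_estimators=100, max_features=3, random_state=7)
results = cross_val_score(model, X, y, cv=kfold)
print(results.mean())
```

Two things are worth checking against the real data. First, the sketch takes the label from the last column with `[:, -1]`; if the combined array has more than ten columns (the `...` in the printed rows suggests it might), index 9 would select a feature rather than the appended label. Second, keeping the features and labels in two separate arrays and passing both to `cross_val_score` avoids the upcast altogether.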
