Target variable error when building a decision tree classifier in Python?

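from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# balance_data is assumed to be a pandas DataFrame: column 0 is the target, columns 1-4 are features
X = balance_data.values[:, 1:5]
Y = balance_data.values[:, 0]
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=100)

clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                     max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)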

When I run the code above to fit the data and train the model, I get the following error. I am using Google Colab for Python. Can anyone help me solve this?

ValueError                                Traceback (most recent call last)
<ipython-input-33-3523056235b2> in <module>()
      1 clf_entropy= DecisionTreeClassifier()
----> 2 clf_entropy.fit(X_train, y_train)

2 frames
/usr/local/lib/python3.7/dist-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
    167     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    168                       'multilabel-indicator', 'multilabel-sequences']:
--> 169         raise ValueError("Unknown label type: %r" % y_type)

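DecisionTreeClassifier checks the type of your target variable, so it raises this error if each entry is a tuple or a list. For comparison, a target that works looks like this: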

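import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Random example data: column 0 holds plain string labels, the rest are numeric features
balance_data = pd.concat([
    pd.DataFrame(np.random.choice(['A', 'B'], 100)),
    pd.DataFrame(np.random.uniform(0, 1, (100, 5)))
], axis=1)

X = balance_data.values[:, 1:5]
Y = balance_data.values[:, 0]
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=100)

clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                     max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)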

Now, if the entries of your target variable are lists:

balance_data.iloc[:,0] = [[np.random.choice(['A','B','C'],1)] for i in range(100)]
X = balance_data.values[:, 1:5]
Y = balance_data.values[:,0]

Y[0]
Out[36]: [array(['C'], dtype='<U1')]

It then raises the same error you are seeing:

X_train, X_test, y_train, y_test = train_test_split( X, Y, test_size = 0.3, random_state = 100)

clf_entropy = DecisionTreeClassifier(criterion = "entropy", random_state = 100,
 max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)

ValueError: Unknown label type: 'unknown'

What you should do is inspect the contents of balance_data.values[:,0] and make sure no lists or tuples are embedded in it.
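As a rough sketch of that check, the helpers below find entries that are containers rather than plain labels and unwrap the ones holding a single value. The names is_container, unwrap_label and clean_labels are made up for this illustration and are not part of scikit-learn or pandas. The sample y_raw is also hypothetical. The code assumes each wrapped entry holds exactly one label.

```python
def is_container(entry):
    """True for lists, tuples and array-like objects; False for plain labels."""
    if isinstance(entry, (str, bytes)):
        return False
    # Arrays expose ndim > 0; scalars either lack ndim or report 0.
    return isinstance(entry, (list, tuple)) or getattr(entry, "ndim", 0) > 0


def unwrap_label(entry):
    """Peel off single-element containers until a plain label remains."""
    while is_container(entry):
        if len(entry) != 1:
            raise ValueError(f"cannot reduce {entry!r} to a single label")
        entry = entry[0]
    return entry


def clean_labels(y):
    """Return (flat_labels, bad_positions) for a sequence of target entries."""
    bad_positions = [i for i, entry in enumerate(y) if is_container(entry)]
    flat_labels = [unwrap_label(entry) for entry in y]
    return flat_labels, bad_positions


# Hypothetical target column mixing plain and wrapped labels.
y_raw = ["A", ["B"], [("C",)], "A"]
y_flat, bad = clean_labels(y_raw)
print(bad)     # positions that held a list or tuple
print(y_flat)  # plain labels only
```

With the data from this question, one could presumably call clean_labels(balance_data.values[:, 0]) and, if bad is non-empty, pass the flattened labels to train_test_split instead of the original column. If an entry holds more than one label, unwrap_label raises a ValueError rather than guessing which one to keep.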