Target variable error when building a decision tree classifier in Python?
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Columns 1-4 are the features; column 0 is the target.
X = balance_data.values[:, 1:5]
Y = balance_data.values[:, 0]
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=100)
clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                     max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)
When I run the code above to fit the data and train the model, I get the following error. I am running Python in Google Colab.
Can anyone help me solve this?
ValueError Traceback (most recent call last)
<ipython-input-33-3523056235b2> in <module>()
1 clf_entropy= DecisionTreeClassifier()
----> 2 clf_entropy.fit(X_train, y_train)
2 frames
/usr/local/lib/python3.7/dist-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
167 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
168 'multilabel-indicator', 'multilabel-sequences']:
--> 169 raise ValueError("Unknown label type: %r" % y_type)
DecisionTreeClassifier checks the type of your target variable, so if each entry is a tuple or a list, it raises this error. For example, the data should look like this:
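import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Column 0 holds plain string labels; the remaining columns are random features.
balance_data = pd.concat([
    pd.DataFrame(np.random.choice(['A','B'],100)),
    pd.DataFrame(np.random.uniform(0,1,(100,5)))
],axis=1)
X = balance_data.values[:, 1:5]
Y = balance_data.values[:,0]
X_train, X_test, y_train, y_test = train_test_split( X, Y, test_size = 0.3, random_state = 100)
clf_entropy = DecisionTreeClassifier(criterion = "entropy", random_state = 100,
                                     max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)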
Now, suppose each entry of your target variable is itself a list:
balance_data.iloc[:,0] = [[np.random.choice(['A','B','C'],1)] for i in range(100)]
X = balance_data.values[:, 1:5]
Y = balance_data.values[:,0]
Y[0]
Out[36]: [array(['C'], dtype='<U1')]
Then fitting raises the same error you are seeing:
X_train, X_test, y_train, y_test = train_test_split( X, Y, test_size = 0.3, random_state = 100)
clf_entropy = DecisionTreeClassifier(criterion = "entropy", random_state = 100,
max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)
ValueError: Unknown label type: 'unknown'
What you should do is inspect the contents of balance_data.values[:,0] and make sure no lists or tuples are embedded in it.
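One way to run that check is sketched below. It is only an illustration: the helper names is_nested and unwrap, and the sample list Y_bad, are made up for this example and are not part of scikit-learn or of the original data. The sketch assumes every bad entry is a single label wrapped in one or more single-element containers, as in the Y[0] output shown above.

```python
def is_nested(label):
    """Return True if a label is a container (list, tuple, array) rather than a scalar."""
    # Strings have a length too, but they are valid scalar labels.
    return hasattr(label, "__len__") and not isinstance(label, (str, bytes))


def unwrap(label):
    """Peel single-element containers until a scalar label remains."""
    while is_nested(label):
        if len(label) != 1:
            # More than one value per row cannot be reduced to a single class label.
            raise ValueError(f"cannot reduce {label!r} to a single label")
        label = label[0]
    return label


# Hypothetical target column mimicking the broken case: most entries are wrapped.
Y_bad = [["C"], ["A"], [("B",)], "A"]

# Row positions whose label is a container instead of a scalar.
bad_rows = [i for i, label in enumerate(Y_bad) if is_nested(label)]

# Flattened labels, one scalar per row.
Y_clean = [unwrap(label) for label in Y_bad]

print(bad_rows)
print(Y_clean)
```

With real data you would apply the same idea to balance_data.values[:,0]: if bad_rows is non-empty, build the flattened labels first and pass those to train_test_split in place of Y. Entries that hold more than one value raise a ValueError here on purpose, because it is not clear which value should become the class label.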