ValueError: Bad Input Shape while fitting Logistic Regression Model
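I'm currently learning logistic regression from scratch. I'm building a logistic regression model on a modified version of the Iris dataset that I made. Here is an excerpt of my code: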
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
df = pd.read_excel('Iris_Dataset.xlsx')
df = df.dropna()
colour = pd.get_dummies(df.colour, drop_first = True)
species = pd.get_dummies(df.species, drop_first = True)
df = pd.concat([df, colour, species], axis = 1)
df = df.drop(['colour', 'species'], axis = 1)
x = df.drop(["versicolor", "virginica"], axis = 1)
y = pd.concat([df.versicolor, df.virginica], axis = 1)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 1)
model = LogisticRegression()
model.fit(x_train, y_train)
For some reason, the last statement (model.fit(x_train, y_train)) raises the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-0c77380c6ea5> in <module>()
18 model = LogisticRegression()
19
---> 20 model.fit(x_train, y_train)
/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in column_or_1d(y, warn)
795 return np.ravel(y)
796
--> 797 raise ValueError("bad input shape {0}".format(shape))
798
799
ValueError: bad input shape (86, 2)
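No matter what I try, I can't work out what the error means or why it occurs. Please help me solve this.
By the way, here is the dataset. It is a Google Sheet, but I copy-pasted the same data into Microsoft Excel as Iris_Dataset.xlsx (I don't know how to share the Excel file directly):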
https://docs.google.com/spreadsheets/d/18zkZvPQ5q_ExaWu4dywsHtpO9D6ECdmXzvZahPryiqU/edit?usp=sharing
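Thanks in advance.
Edit: I tried something else. This time I converted only the colour column to dummy values and left species intact. Here is the code: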
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
df = pd.read_excel('Iris_Dataset.xlsx')
df = df.dropna()
colour = pd.get_dummies(df.colour, drop_first = True)
df = pd.concat([df, colour], axis = 1)
df = df.drop(['colour'], axis = 1)
x = df.drop(["species"], axis = 1)
y = df.species
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 1)
model = LogisticRegression()
model.fit(x_train, y_train)
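The only reason I created dummies for colour alone is that when I tried the same thing without dummy values (passing the 'species' column as a whole, instead of what I did before), I got the error: could not convert string to float: 'violet'
After running the code above I do get output, but the fit also emits a warning along with it: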
/usr/local/lib/python3.7/dist-packages/sklearn/linear_model/_logistic.py:940: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='auto', n_jobs=None, penalty='l2',
random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
warm_start=False)
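The rest of the code after this runs more or less smoothly, but I would like to know what the message means and whether there is a way to avoid it.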
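You are passing y, a DataFrame with two columns, as the target vector to LogisticRegression.fit(), which is not possible. The target vector must have shape (n_samples,), which in a DataFrame is a single column.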
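As a minimal sketch of that fix: one-hot encode only the feature columns and pass the species labels as a single column. The DataFrame below is a made-up stand-in for Iris_Dataset.xlsx (the column names petal_length and petal_width and the random values are assumptions, not the real data), and max_iter=1000 is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Iris_Dataset.xlsx: two numeric features,
# a categorical 'colour' feature, and the 'species' label column.
rng = np.random.default_rng(1)
n = 120
df = pd.DataFrame({
    "petal_length": rng.normal(4.0, 1.5, n),
    "petal_width": rng.normal(1.2, 0.5, n),
    "colour": rng.choice(["violet", "blue", "white"], n),
    "species": rng.choice(["setosa", "versicolor", "virginica"], n),
})

# Encode only the features; the target stays as one column of string labels.
x = pd.get_dummies(df.drop(columns=["species"]), columns=["colour"],
                   drop_first=True, dtype=float)
y = df["species"]  # a Series, i.e. shape (n_samples,)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=1)

model = LogisticRegression(max_iter=1000)
model.fit(x_train, y_train)  # no "bad input shape" error: y_train is 1-D

print(y_train.shape)
print(model.classes_)
```

The classifier accepts string class labels directly, so the target does not need dummy columns; only the string-valued features such as colour have to be converted to numbers.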
Here is the example from the documentation:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0).fit(X, y)
clf.predict(X[:2, :])
clf.predict_proba(X[:2, :])
clf.score(X, y)
Print y and you will see that it is a one-dimensional array.
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
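On the ConvergenceWarning from the edit: the message itself names two remedies, raising max_iter or scaling the data. The sketch below applies both to the built-in Iris data from the documentation example, not to the dataset in the question. StandardScaler, make_pipeline, and max_iter=1000 are choices made here for illustration, and whether they silence the warning on the questioner's data is untested.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Standardize the features, then fit with a larger iteration budget
# than the default max_iter=100 shown in the warning output.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(accuracy)
```

Putting the scaler in a pipeline means it is fitted on the training split only and then reused unchanged on the test split.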