"ValueError: Found input variables with inconsistent numbers of samples: [40, 10]" Problem with splitting the data
"ValueError: Found input variables with inconsistent numbers of samples: [40, 10]" Problem with splitting the data
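I am practising with sample data from a Udemy course. The data has 51 rows, and I am trying to print the model's score.
The error I get is:
ValueError: Found input variables with inconsistent numbers of samples: [40, 10]
I understand that [40, 10] refers to the training and test sets, because I set test_size to 0.2.
Code: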
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.compose import ColumnTransformer as ct
from sklearn.model_selection import train_test_split as tts
data = pd.read_csv("50_Startups.csv")
X = data.drop("Profit",axis = 1)
y = data[["Profit"]]
from sklearn.preprocessing import OneHotEncoder
cat = ["State"]
one_hot = OneHotEncoder()
transformer = ct([("one_hot", one_hot, cat)],remainder="passthrough")
transformed_X = transformer.fit_transform(X)
print(transformed_X)
from sklearn.ensemble import RandomForestRegressor as RFR
model = RFR()
X_train , y_train, X_test , y_test = tts(transformed_X,y,test_size=0.2)
model.fit(X_train,y_train)
print(model.score(X_test,y_test))
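I tried changing "y" to "y.values.ravel()", but that did not work either. I know this error often shows up with NumPy arrays, but what is causing the problem in this code?
Thanks in advance.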
Your error is in the call to train_test_split in the line below.
X_train , y_train, X_test, y_test = tts(transformed_X,y,test_size=0.2)
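You may not have noticed, but you have swapped the variables y_train and X_test. train_test_split returns the splits in the order X_train, X_test, y_train, y_test, so in your code y_train actually holds the 10 test rows of the features, and model.fit(X_train, y_train) receives 40 samples against 10.
Use this instead: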
X_train , X_test, y_train, y_test = tts(transformed_X,y,test_size=0.2)
Read the full documentation at
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
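To see where the [40, 10] in the message comes from, here is a minimal sketch that assumes a synthetic 50-row array in place of 50_Startups.csv (the random values, the four feature columns, and the names X_tr, y_tr_wrong, X_te_wrong, y_te are made up for illustration). It only prints the shapes each name ends up with under the correct and the swapped unpacking order.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset: 50 samples, 4 features.
# These values are hypothetical and are NOT the contents of 50_Startups.csv.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)

# train_test_split returns the splits grouped per input array:
# X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)  # (40, 4) (10, 4)
print(y_train.shape, y_test.shape)  # (40,) (10,)

# Swapped unpacking, as in the question: the second name receives the
# test features (10 rows), not the training targets (40 rows).
X_tr, y_tr_wrong, X_te_wrong, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_tr.shape[0], y_tr_wrong.shape[0])  # 40 10
```

With the swapped names, fitting on X_tr and y_tr_wrong pairs 40 rows with 10 rows, which is the mismatch the ValueError reports.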