Linear regression train/shape output not correct

Question

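I am trying to use linear regression to predict the number of show releases in upcoming years. I have a dataframe in which each row is a release, with columns for the release year, genre, and so on. I want to use it to predict the number of upcoming releases, so I built a new dataframe by taking the count_values over all the unique years to get the number of releases in each year. I now have 85 rows with 2 columns: one is the year and the other is the number of releases.

I am using sklearn for this, and this is the code I have written so far.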

x = ML_content.drop('releases', axis = 1)
#x = ML_content['years']
y = ML_content['releases']
x_train, y_train, x_test, y_test = train_test_split(x, y, test_size = 20)
x_train.shape, y_train.shape
model = linear_model.LinearRegression()
model.fit(x_train, y_train)

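The result of the shape call is not what I expected (this is the result: ((42, 1), (43, 1))), so the code that follows does not work either. Can anyone explain what I am doing wrong, or what needs to change?

Thank you for your time and help.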

Answer 1

According to https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
train_test_split returns its outputs in a different order from the one you unpacked them in.
The returned order is: X_train, X_test, y_train, y_test
You wrote: x_train, y_train, x_test, y_test
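
Because of that ordering, the name y_train in the question actually holds the second piece of x, which is why both reported shapes have a single column. A minimal sketch of the corrected unpacking follows. The synthetic ML_content built here is only a hypothetical stand-in for the real dataframe (the year range and release counts are made up), and random_state is added purely to make the run repeatable; the column names 'years' and 'releases' and test_size = 20 are taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for ML_content: 85 rows, one per year (made-up values).
rng = np.random.default_rng(0)
years = np.arange(1936, 2021)  # 85 consecutive years
ML_content = pd.DataFrame({
    "years": years,
    "releases": (years - 1935) * 2 + rng.integers(0, 5, size=years.size),
})

x = ML_content.drop("releases", axis=1)  # features as a DataFrame, shape (85, 1)
y = ML_content["releases"]               # target as a Series, shape (85,)

# Unpack in the documented order: X_train, X_test, y_train, y_test.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=20, random_state=0
)

print(x_train.shape, y_train.shape)  # (65, 1) (65,)

model = linear_model.LinearRegression()
model.fit(x_train, y_train)

# Predict the held-out rows to check that the shapes line up end to end.
predictions = model.predict(x_test)
```

Note that an integer test_size is read as an absolute number of test rows, so test_size = 20 holds out 20 of the 85 rows; pass a float such as 0.2 if a fraction was intended.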

python

database

linear-regression

sklearn-pandas

jupyter-notebook