I'm not sure what needs to be reshaped in my data

Question

I'm trying to predict house prices using the LinearRegression() algorithm.

Here is my code:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = pd.read_csv('data.csv')
df = df.drop(columns=['date', 'street', 'city', 'statezip', 'country'])

X = df.drop(columns=['price'])
y = df['price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

lr = LinearRegression()
lr.fit(X_train, y_train)
pred = lr.predict(X_test)
pred.reshape((-1, 1))
acc = lr.score(pred, y_test)

However, I keep getting this error:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

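I've tried reshaping every attribute in my data, but the only thing I was able to reshape was pred, and even after doing that I still get the same error.

How can I fix this error?

Thanks in advance.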

Answer 1

Based on the documentation of sklearn.linear_model.LinearRegression.score:

score(X, y, sample_weight=None)

return R^2 score of self.predict(X) wrt. y.
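In other words, score() calls predict() on whatever you pass as X and then compares those predictions with y. The sketch below checks that on synthetic data (the data and its coefficients are assumptions for illustration, not taken from the question): score(X, y) should match r2_score(y, lr.predict(X)).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data (assumed for illustration): 3 features, noisy linear target
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=50)

lr = LinearRegression().fit(X, y)

# score() takes the feature matrix X and predicts internally
via_score = lr.score(X, y)
# Equivalent: predict first, then compare predictions with the true targets
via_metric = r2_score(y, lr.predict(X))

print(via_score, via_metric)
```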

You need to pass the test features X (not the predictions) as the first argument, like this:

lr.fit(X_train, y_train)
acc = lr.score(X_test, y_test)
print(acc)
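This also explains why reshaping pred did not help. The sketch below uses small hypothetical arrays in place of data.csv (20 training rows, 4 features, all invented) to show three things: passing the 1-D pred as X reproduces the question's error; a bare pred.reshape((-1, 1)) returns a new array and leaves pred unchanged; and even a reshaped (n, 1) pred is rejected, because the model was fit on 4 feature columns, not 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: 20 samples with 4 features (the question's data.csv isn't available)
rng = np.random.default_rng(1)
X_train = rng.normal(size=(20, 4))
y_train = X_train.sum(axis=1)
X_test = rng.normal(size=(5, 4))
y_test = X_test.sum(axis=1)

lr = LinearRegression().fit(X_train, y_train)
pred = lr.predict(X_test)
print(pred.shape)  # (5,): a 1-D array, one prediction per test row

# Passing the 1-D predictions as X reproduces the question's ValueError
msg_1d = ""
try:
    lr.score(pred, y_test)
except ValueError as err:
    msg_1d = str(err)
print(msg_1d)

# reshape() returns a new array; a bare pred.reshape((-1, 1)) is discarded
pred.reshape((-1, 1))
print(pred.shape)  # still (5,)

# Even a 2-D (5, 1) version fails: the model expects 4 feature columns
pred_2d = pred.reshape((-1, 1))
raised_2d = False
try:
    lr.score(pred_2d, y_test)
except ValueError:
    raised_2d = True
print(raised_2d)

# Passing the test features is what score() expects
acc = lr.score(X_test, y_test)
print(acc)
```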

Alternatively, you can compute the score from the predictions you already have with sklearn.metrics.r2_score:

from sklearn.metrics import r2_score
acc = r2_score(y_test, pred)
print(acc)
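For reference, R^2 is 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the true values from their mean. A minimal sketch computing it by hand on a few made-up values and comparing with r2_score:

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative values (made up, not from the question's data)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))
```

Because SS_tot is built from the true values, R^2 is not symmetric, so the argument order matters: true values first, predictions second, as in r2_score(y_test, pred).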

Example:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

lr = LinearRegression()
lr.fit(X_train, y_train)
pred = lr.predict(X_test)
acc = lr.score(X_test, y_test)
print(acc)
# Or
from sklearn.metrics import r2_score
acc = r2_score(y_test, pred)
print(acc)

Output:

0.8888888888888888
0.8888888888888888
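Applied to the question's pipeline, the fix is only the last step: drop the pred.reshape line and pass X_test to score(). The sketch below is an assumption-heavy stand-in, since data.csv is not available: the DataFrame and its columns (bedrooms, sqft_living, city) are hypothetical, and random_state is added only to make the split reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical stand-in for data.csv: column names and values are invented
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "bedrooms": rng.integers(1, 6, size=n),
    "sqft_living": rng.uniform(500, 4000, size=n),
    "city": rng.choice(["A", "B"], size=n),
})
df["price"] = (50_000 + 20_000 * df["bedrooms"] + 150 * df["sqft_living"]
               + rng.normal(0, 10_000, size=n))

# Drop non-numeric columns, as the question does for date/street/city/...
df = df.drop(columns=["city"])

X = df.drop(columns=["price"])  # 2-D DataFrame of features
y = df["price"]                 # 1-D Series of targets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

lr = LinearRegression()
lr.fit(X_train, y_train)
pred = lr.predict(X_test)       # 1-D predictions; no reshape needed

acc = lr.score(X_test, y_test)  # features first, not predictions
print(acc, r2_score(y_test, pred))
```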


Tags: python, numpy, machine-learning, linear-regression, pandas