What is the equivalent of R's lm function for fitting simple linear regressions in Python?

Question

I have a dataset in CSV format that I have loaded into a pandas DataFrame. With R's lm function I can get the following:

lm.fit <- lm(response ~ predictor1, data = my_dataset)
summary(lm.fit)

Running the commands above gives output similar to this:

Call:
lm(formula = response ~ predictor1, data = my_dataset)

Residuals:
      Min        1Q    Median        3Q       Max
-1.519533    -3.990    -1.318     2.034    24.500

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.55384    0.56263   61.41   <2e-16 ***
lstat       -0.95005    0.03873  -24.53   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.216 on 504 degrees of freedom
Multiple R-squared:  0.5441, Adjusted R-squared:  0.5432
F-statistic: 601.6 on 1 and 504 DF,  p-value: < 2.2e-16

I just want to port this code to Python. Here is what I have tried:

import pandas as pd
from sklearn.linear_model import LinearRegression

X = dataset.iloc[:, 12].values.reshape(-1, 1)  # .values returns a NumPy array; reshape to a single column
Y = dataset.iloc[:, 13].values.reshape(-1, 1)  # -1 lets NumPy infer the number of rows
linear_regressor = LinearRegression()  # create the estimator
linear_regressor.fit(X, Y)  # fit the linear regression
Y_pred = linear_regressor.predict(X)  # predictions on the training data
# print(Y_pred.describe())
df = pd.DataFrame(Y_pred, columns=['Column_A'])
print(df.describe())

This produces the following output, which is not what I want.

       Column_A
count  506.000000
mean    22.532806
std      6.784361
min     -1.519533
25%     18.445754
50%     23.761280
75%     27.950998
max     32.910255

Is there another way to fit a linear regression using Python and a pandas DataFrame?

Answer 1

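Use the OLS implementation in statsmodels and call its .summary() method. Remember to add the constant term yourself with add_constant, because OLS does not include an intercept by default.

import statsmodels.api as sm

reg = sm.OLS(y, sm.add_constant(X)).fit()
print(reg.summary())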
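
For a call that reads closer to the R original, statsmodels also has a formula interface (statsmodels.formula.api), which accepts an R-style formula and adds the intercept for you. The following is a minimal sketch, not a drop-in solution: the column names response and predictor1 are taken from the R call in the question, and the data is synthetic, standing in for the asker's CSV.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the asker's data (assumption: the real data
# would come from pd.read_csv). Column names mirror the R formula.
rng = np.random.default_rng(0)
predictor1 = rng.uniform(0, 40, size=200)
response = 34.5 - 0.95 * predictor1 + rng.normal(0, 6, size=200)
my_dataset = pd.DataFrame({"response": response, "predictor1": predictor1})

# The formula interface includes the intercept automatically, as lm does.
fit = smf.ols("response ~ predictor1", data=my_dataset).fit()

print(fit.summary())         # coefficient table, R-squared, F-statistic
print(fit.params)            # estimates, indexed by "Intercept" and "predictor1"
print(fit.bse)               # standard errors of the estimates
print(fit.resid.describe())  # residual quartiles, similar to R's "Residuals:" block
```

With the real data, replacing the synthetic DataFrame with the one loaded from the CSV, and the formula with the actual column names, should be the only changes needed.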
