Predicting out future values using OLS regression (Python, StatsModels, Pandas)

Question

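I'm currently trying to implement multiple linear regression (MLR) in Python, and I'm not sure how to apply the coefficients I've found to future values.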

import pandas as pd
import statsmodels.formula.api as sm
import statsmodels.api as sm2

TV = [230.1, 44.5, 17.2, 151.5, 180.8]
Radio = [37.8,39.3,45.9,41.3,10.8]
Newspaper = [69.2,45.1,69.3,58.5,58.4]
Sales = [22.1, 10.4, 9.3, 18.5,12.9]
df = pd.DataFrame({'TV': TV, 
                   'Radio': Radio, 
                   'Newspaper': Newspaper, 
                   'Sales': Sales})

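Y = df.Sales
X = df[['TV', 'Radio', 'Newspaper']]
X = sm2.add_constant(X)  # add the intercept column
# OLS is exposed by statsmodels.api (sm2); newer statsmodels releases no longer offer it via statsmodels.formula.api
model = sm2.OLS(Y, X).fit()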
>>> model.params
const       -0.141990
TV           0.070544
Radio        0.239617
Newspaper   -0.040178
dtype: float64

Say I want to predict "Sales" for the following DataFrame:

EDIT

TV     Radio  Newspaper  Sales
230.1  37.8   69.2       22.4
44.5   39.3   45.1       10.1
...    ...    ...        ...
25     15     15
30     20     22
35     22     36

I've been trying the approach I found here, but I can't seem to get it working: Forecasting using Pandas OLS

Thanks!

Answer 1

Assuming df2 is your new out-of-sample DataFrame:

model = sm2.OLS(Y, X).fit()  # sm2 = statsmodels.api
# df2 holds only out-of-sample rows, so select its predictor columns directly
new_x = df2[['TV', 'Radio', 'Newspaper']].values
new_x = sm2.add_constant(new_x)  # prepend the intercept column, matching X
y_predict = model.predict(new_x)

>>> y_predict
array([ 4.61319034,  5.88274588,  6.15220225])
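As a sanity check, the same numbers can be reproduced by hand, since an OLS prediction is just the intercept plus each predictor times its coefficient. The sketch below is an illustration only: it hard-codes the rounded coefficients printed in the question and assumes df2 contains the three out-of-sample rows shown there.

```python
import numpy as np
import pandas as pd

# Coefficients copied (rounded) from the model.params output in the question.
params = pd.Series({'const': -0.141990, 'TV': 0.070544,
                    'Radio': 0.239617, 'Newspaper': -0.040178})

# Assumed contents of df2: the three rows without Sales from the question.
df2 = pd.DataFrame({'TV': [25, 30, 35],
                    'Radio': [15, 20, 22],
                    'Newspaper': [15, 22, 36]})

# Build the design matrix by hand: a column of ones plus the predictors,
# ordered to match the coefficient vector.
design = df2.assign(const=1.0)[params.index]

# Each prediction is the dot product of one row with the coefficients.
manual_predict = design.to_numpy() @ params.to_numpy()
print(manual_predict)
```

Because the coefficients are rounded to six decimals, the result agrees with y_predict to about four decimal places rather than exactly.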

You can assign the results directly to df2 like this:

df2.loc[:, 'Sales'] = model.predict(new_x)

To fill the missing Sales values in the original DataFrame with regression predictions, try:

# Fit only on the rows where Sales is known
X = df.loc[df.Sales.notnull(), ['TV', 'Radio', 'Newspaper']]
X = sm2.add_constant(X)
Y = df[df.Sales.notnull()].Sales

model = sm2.OLS(Y, X).fit()  # sm2 = statsmodels.api

# Predict for the rows where Sales is missing
new_x = df.loc[df.Sales.isnull(), ['TV', 'Radio', 'Newspaper']]
new_x = sm2.add_constant(new_x)

df.loc[df.Sales.isnull(), 'Sales'] = model.predict(new_x)
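An alternative worth considering is the formula interface, which adds the intercept itself and drops rows with missing values when fitting, so add_constant is not needed. This is a sketch rather than part of the approach above; the DataFrame simply combines the five rows from the question with the three rows that lack Sales.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Five observed rows from the question plus three rows with Sales missing.
df = pd.DataFrame({
    'TV': [230.1, 44.5, 17.2, 151.5, 180.8, 25, 30, 35],
    'Radio': [37.8, 39.3, 45.9, 41.3, 10.8, 15, 20, 22],
    'Newspaper': [69.2, 45.1, 69.3, 58.5, 58.4, 15, 22, 36],
    'Sales': [22.1, 10.4, 9.3, 18.5, 12.9, np.nan, np.nan, np.nan],
})

# The formula adds the intercept; rows with NaN in Sales are excluded from the fit.
model = smf.ols('Sales ~ TV + Radio + Newspaper', data=df).fit()

# predict() takes a DataFrame with the predictor columns and returns a
# Series that keeps the original row index, so it aligns on assignment.
missing = df['Sales'].isnull()
df.loc[missing, 'Sales'] = model.predict(df.loc[missing])

print(df)
```

With this interface the intercept appears in model.params as Intercept instead of const.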
