Predicting future values using OLS regression (Python, StatsModels, Pandas)
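I'm currently trying to implement MLR (multiple linear regression) in Python, and I'm not sure how to apply the coefficients I've found to future values.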
import pandas as pd
import statsmodels.formula.api as sm
import statsmodels.api as sm2

TV = [230.1, 44.5, 17.2, 151.5, 180.8]
Radio = [37.8, 39.3, 45.9, 41.3, 10.8]
Newspaper = [69.2, 45.1, 69.3, 58.5, 58.4]
Sales = [22.1, 10.4, 9.3, 18.5, 12.9]

df = pd.DataFrame({'TV': TV,
                   'Radio': Radio,
                   'Newspaper': Newspaper,
                   'Sales': Sales})

Y = df.Sales
X = df[['TV', 'Radio', 'Newspaper']]
X = sm2.add_constant(X)  # adds the intercept column 'const'
model = sm2.OLS(Y, X).fit()  # the array-based OLS class is taken from statsmodels.api (sm2)
>>> model.params
const -0.141990
TV 0.070544
Radio 0.239617
Newspaper -0.040178
dtype: float64
Say I want to predict "Sales" for the following DataFrame:
EDIT
TV      Radio   Newspaper   Sales
230.1   37.8    69.2        22.4
44.5    39.3    45.1        10.1
...     ...     ...         ...
25      15      15
30      20      22
35      22      36
I've been trying a method I found here, but I can't seem to get it working: Forecasting using Pandas OLS
Thanks!
Assuming df2 is your new out-of-sample DataFrame:
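model = sm2.OLS(Y, X).fit()  # sm2 = statsmodels.api
# Take every row of df2; a boolean mask built from df would not align with df2's index.
new_x = df2[['TV', 'Radio', 'Newspaper']].values
# has_constant='add' forces the intercept column even if a feature column happens to be constant.
new_x = sm2.add_constant(new_x, has_constant='add')
y_predict = model.predict(new_x)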
>>> y_predict
array([ 4.61319034, 5.88274588, 6.15220225])
You can assign the result directly to df2 like this:
df2.loc[:, 'Sales'] = model.predict(new_x)
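As a sanity check, each prediction is just the intercept plus the coefficient-weighted sum of that row's features. Here is a minimal sketch of that arithmetic, assuming the rounded coefficients printed by model.params in the question and the three new rows from the question's table; the names params, new_rows, design and manual_predict are only illustrative.

```python
import numpy as np

# Rounded coefficients copied from model.params in the question,
# in the order: const, TV, Radio, Newspaper.
params = np.array([-0.141990, 0.070544, 0.239617, -0.040178])

# The three new rows (TV, Radio, Newspaper) that have no Sales value yet.
new_rows = np.array([
    [25.0, 15.0, 15.0],
    [30.0, 20.0, 22.0],
    [35.0, 22.0, 36.0],
])

# Prepend a column of ones so the intercept lines up with 'const'.
design = np.column_stack([np.ones(len(new_rows)), new_rows])

# Matrix-vector product: one linear combination per row.
manual_predict = design @ params
print(manual_predict)  # approximately [4.613, 5.883, 6.152]
```

The result agrees with y_predict up to the rounding of the coefficients, which is a quick way to confirm that the constant column sits in the same position as it did when the model was fitted.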
To fill the missing Sales values in the original DataFrame with regression predictions, try:
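# Fit only on the rows where Sales is known.
X = df.loc[df.Sales.notnull(), ['TV', 'Radio', 'Newspaper']]
X = sm2.add_constant(X)
Y = df[df.Sales.notnull()].Sales
model = sm2.OLS(Y, X).fit()  # sm2 = statsmodels.api
# Predict for the rows where Sales is missing and write the values back.
new_x = df.loc[df.Sales.isnull(), ['TV', 'Radio', 'Newspaper']]
new_x = sm2.add_constant(new_x, has_constant='add')  # always add the intercept column
df.loc[df.Sales.isnull(), 'Sales'] = model.predict(new_x)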