如何在数据框中添加预测值？

Question

我将预测扩展到五个值。现在，我想添加新的五个预测值（New_Interest_Rate 和 New_Unemployment_Rate），这样我就可以将它们与原始时间序列一起绘制在一个新图中。

import pandas as pd
from sklearn import linear_model
import statsmodels.api as sm

Stock_Market = {'Year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
                'Month': [12, 11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
                'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
                'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
                'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719]        
                }

df = pd.DataFrame(Stock_Market,columns=['Year','Month','Interest_Rate','Unemployment_Rate','Stock_Index_Price'])

X = df[['Interest_Rate','Unemployment_Rate']] # here we have 2 variables for multiple regression. If you just want to use one variable for simple linear regression, then use X = df['Interest_Rate'] for example.Alternatively, you may add additional variables within the brackets
Y = df['Stock_Index_Price']
 
# with sklearn
regr = linear_model.LinearRegression()
regr.fit(X, Y)

print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)

# prediction with sklearn
New_Interest_Rate = [2.75, 3, 4, 1, 2]
New_Unemployment_Rate = [5.3, 4, 3, 2, 1]
for i in range(len(New_Interest_Rate)):
    print (str(i+1) + ' - Predicted Stock Index Price: \n', 
           regr.predict([[New_Interest_Rate[i] ,New_Unemployment_Rate[i]]]))

# with statsmodels
X = sm.add_constant(X) # adding a constant

model = sm.OLS(Y, X).fit()
predictions = model.predict(X) 
 
print_model = model.summary()
print(print_model)

我不知道如何附加它，因为当我尝试时，出现错误。

Interest_Rate=Interest_Rate.append(New_Interest_Rate)

TypeError: cannot concatenate object of type "<class 'float'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

我的目标是绘制扩展预测值。我使用 jupyter 笔记本。原代码来自此link。谢谢！

Answer 1

运行您提供的代码似乎可以在我的计算机上运行，但有一些警告消息。我使用的版本是 python 3.9.7、pandas 1.3.3-1、sklearn-pandas 2.2.0-1 和 statsmodels 0.13.0。我只是将它保存到一个文件中，并在带有“python copypastedcode.py”的终端中运行。我得到了这个输出：

Intercept:
 1798.4039776258544
Coefficients:
 [ 345.54008701 -250.14657137]
/usr/lib/python3.9/site-packages/sklearn/base.py:441: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
  warnings.warn(
1 - Predicted Stock Index Price:
 [1422.86238865]
/usr/lib/python3.9/site-packages/sklearn/base.py:441: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
  warnings.warn(
2 - Predicted Stock Index Price:
 [1834.43795318]
/usr/lib/python3.9/site-packages/sklearn/base.py:441: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
  warnings.warn(
3 - Predicted Stock Index Price:
 [2430.12461156]
/usr/lib/python3.9/site-packages/sklearn/base.py:441: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
  warnings.warn(
4 - Predicted Stock Index Price:
 [1643.6509219]
/usr/lib/python3.9/site-packages/sklearn/base.py:441: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
  warnings.warn(
5 - Predicted Stock Index Price:
 [2239.33758028]
                            OLS Regression Results
==============================================================================
Dep. Variable:      Stock_Index_Price   R-squared:                       0.898
Model:                            OLS   Adj. R-squared:                  0.888
Method:                 Least Squares   F-statistic:                     92.07
Date:                Wed, 20 Oct 2021   Prob (F-statistic):           4.04e-11
Time:                        09:07:19   Log-Likelihood:                -134.61
No. Observations:                  24   AIC:                             275.2
Df Residuals:                      21   BIC:                             278.8
Df Model:                           2
Covariance Type:            nonrobust
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1798.4040    899.248      2.000      0.059     -71.685    3668.493
Interest_Rate       345.5401    111.367      3.103      0.005     113.940     577.140
Unemployment_Rate  -250.1466    117.950     -2.121      0.046    -495.437      -4.856
==============================================================================
Omnibus:                        2.691   Durbin-Watson:                   0.530
Prob(Omnibus):                  0.260   Jarque-Bera (JB):                1.551
Skew:                          -0.612   Prob(JB):                        0.461
Kurtosis:                       3.226   Cond. No.                         394.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

“X 没有有效的特征名称...”警告可以通过更改

来修复

regr.fit(X,Y)

至

regr.fit(X.values, Y.values)

如果您想使用 New_Interest_rate 和 New_Unemployment_Rate 来创建回归，那么您需要 Y 有 5 个以上对应的股票价格。如果你试图根据利率和失业率来预测股票价格，我认为这不是你想要做的。不过，您可以这样做：

New_Interest_Rate = [2.75, 3, 4, 1, 2]
New_Unemployment_Rate = [5.3, 4, 3, 2, 1]
New_Stock_Prices = [1,2,3,4,5]
X_new = pd.DataFrame(data={'Interest_Rate': New_Interest_Rate,'Unemployment_Rate': New_Unemployment_Rate})
Y_new = pd.DataFrame(data={'Stock_Index_Price': New_Stock_Prices})
regr = linear_model.LinearRegression()
X = X.append(X_df)
Y = Y.append(Y_df)
regr.fit(X.values, Y.values)

如果你想绘制图表，你可以创建一个小函数来从输入数组中获取股票预测，如下所示：

def predict_stock_price(future_interest_rate, future_unemployment_rate):
    return [regr.predict([[i ,j]])[0,0] for i,j in zip(future_interest_rate,future_unemployment_rate)]

prices = predict_stock_price(New_Interest_Rate,New_Unemployment_Rate)
print("list of predicted stock prices:",prices)

predicted_stock_market = {'Month': range(13,13+len(prices)), #just to have a time axis to plot with
                         'Interest_Rate': New_Interest_Rate,
                         'Unemployment_Rate': New_Unemployment_Rate,
                         'Stock_Index_Price': prices}
predicted_df = pd.DataFrame(predicted_stock_market)
predicted_df.plot( x="Month",y="Stock_Index_Price",kind='scatter')
plt.show()

如何在数据框中添加预测值？

How to add predicted values in a dataframe?

regression

append

prediction

jupyter-notebook