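How to add predicted values in a dataframe?
I extended the prediction to five values. Now I want to add the five new prediction values (New_Interest_Rate and New_Unemployment_Rate) so that I can plot them in a new graph together with the original time series.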
import pandas as pd
from sklearn import linear_model
import statsmodels.api as sm
Stock_Market = {'Year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
'Month': [12, 11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719]
}
df = pd.DataFrame(Stock_Market,columns=['Year','Month','Interest_Rate','Unemployment_Rate','Stock_Index_Price'])
X = df[['Interest_Rate','Unemployment_Rate']] # here we have 2 variables for multiple regression. If you just want to use one variable for simple linear regression, then use X = df['Interest_Rate'] for example. Alternatively, you may add additional variables within the brackets
Y = df['Stock_Index_Price']
# with sklearn
regr = linear_model.LinearRegression()
regr.fit(X, Y)
print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
# prediction with sklearn
New_Interest_Rate = [2.75, 3, 4, 1, 2]
New_Unemployment_Rate = [5.3, 4, 3, 2, 1]
for i in range(len(New_Interest_Rate)):
    print(str(i+1) + ' - Predicted Stock Index Price: \n',
          regr.predict([[New_Interest_Rate[i], New_Unemployment_Rate[i]]]))
# with statsmodels
X = sm.add_constant(X) # adding a constant
model = sm.OLS(Y, X).fit()
predictions = model.predict(X)
print_model = model.summary()
print(print_model)
I don't know how to append them, because when I try, I get an error.
Interest_Rate=Interest_Rate.append(New_Interest_Rate)
TypeError: cannot concatenate object of type "<class 'float'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
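My goal is to plot the extended predicted values. I use a Jupyter notebook. The original code comes from this link. Thanks!
The code you provided seems to run on my computer, but with some warning messages. The versions I'm using are Python 3.9.7, pandas 1.3.3-1, sklearn-pandas 2.2.0-1 and statsmodels 0.13.0. I just saved it to a file and ran it in a terminal with "python copypastedcode.py". I got this output: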
Intercept:
1798.4039776258544
Coefficients:
[ 345.54008701 -250.14657137]
/usr/lib/python3.9/site-packages/sklearn/base.py:441: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
warnings.warn(
1 - Predicted Stock Index Price:
[1422.86238865]
/usr/lib/python3.9/site-packages/sklearn/base.py:441: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
warnings.warn(
2 - Predicted Stock Index Price:
[1834.43795318]
/usr/lib/python3.9/site-packages/sklearn/base.py:441: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
warnings.warn(
3 - Predicted Stock Index Price:
[2430.12461156]
/usr/lib/python3.9/site-packages/sklearn/base.py:441: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
warnings.warn(
4 - Predicted Stock Index Price:
[1643.6509219]
/usr/lib/python3.9/site-packages/sklearn/base.py:441: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
warnings.warn(
5 - Predicted Stock Index Price:
[2239.33758028]
OLS Regression Results
==============================================================================
Dep. Variable: Stock_Index_Price R-squared: 0.898
Model: OLS Adj. R-squared: 0.888
Method: Least Squares F-statistic: 92.07
Date: Wed, 20 Oct 2021 Prob (F-statistic): 4.04e-11
Time: 09:07:19 Log-Likelihood: -134.61
No. Observations: 24 AIC: 275.2
Df Residuals: 21 BIC: 278.8
Df Model: 2
Covariance Type: nonrobust
=====================================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------------
const 1798.4040 899.248 2.000 0.059 -71.685 3668.493
Interest_Rate 345.5401 111.367 3.103 0.005 113.940 577.140
Unemployment_Rate -250.1466 117.950 -2.121 0.046 -495.437 -4.856
==============================================================================
Omnibus: 2.691 Durbin-Watson: 0.530
Prob(Omnibus): 0.260 Jarque-Bera (JB): 1.551
Skew: -0.612 Prob(JB): 0.461
Kurtosis: 3.226 Cond. No. 394.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The "X does not have valid feature names..." warning can be fixed by changing
regr.fit(X,Y)
to
regr.fit(X.values, Y.values)
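Another way to avoid the warning, sketched here under the assumption of a recent scikit-learn version, is to keep fitting on the DataFrame and pass the new inputs to predict as a DataFrame with the same column names, so the feature names match on both sides. The names feature_cols, X_new and predicted_prices are introduced only for this illustration; the data is copied from the question.

```python
import pandas as pd
from sklearn import linear_model

# Data copied from the question (Year and Month are not needed for the fit)
Stock_Market = {
    'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
    'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
    'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719],
}
df = pd.DataFrame(Stock_Market)

# Hypothetical helper name: the columns used as regression inputs
feature_cols = ['Interest_Rate', 'Unemployment_Rate']

regr = linear_model.LinearRegression()
regr.fit(df[feature_cols], df['Stock_Index_Price'])

# New inputs as a DataFrame with the same column names as the training data
X_new = pd.DataFrame({
    'Interest_Rate': [2.75, 3, 4, 1, 2],
    'Unemployment_Rate': [5.3, 4, 3, 2, 1],
})

# One call predicts all five rows, so no loop over single values is needed
predicted_prices = regr.predict(X_new[feature_cols])
print(predicted_prices)
```

This also replaces the loop in the question with a single predict call, which returns a one-dimensional array with one price per input row.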
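If you want to use New_Interest_Rate and New_Unemployment_Rate to create the regression, then Y needs 5 more corresponding stock prices. If you are trying to predict stock prices from the interest rate and the unemployment rate, I don't think that is what you want to do. However, you could do it like this: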
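New_Interest_Rate = [2.75, 3, 4, 1, 2]
New_Unemployment_Rate = [5.3, 4, 3, 2, 1]
New_Stock_Prices = [1,2,3,4,5] # placeholder values, not real stock prices
X_new = pd.DataFrame(data={'Interest_Rate': New_Interest_Rate,'Unemployment_Rate': New_Unemployment_Rate})
Y_new = pd.DataFrame(data={'Stock_Index_Price': New_Stock_Prices})
# select the columns from df again, because X gained a 'const' column from sm.add_constant above
X = pd.concat([df[['Interest_Rate','Unemployment_Rate']], X_new], ignore_index=True)
Y = pd.concat([df[['Stock_Index_Price']], Y_new], ignore_index=True)
regr = linear_model.LinearRegression()
regr.fit(X.values, Y.values)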
If you want to plot the predictions, you can create a small function that gets the stock predictions from the input arrays, like this:
import matplotlib.pyplot as plt

def predict_stock_price(future_interest_rate, future_unemployment_rate):
    # .item() extracts the single predicted value whether predict returns a 1-D or a 2-D array
    return [regr.predict([[i, j]]).item() for i, j in zip(future_interest_rate, future_unemployment_rate)]

prices = predict_stock_price(New_Interest_Rate, New_Unemployment_Rate)
print("list of predicted stock prices:", prices)
predicted_stock_market = {'Month': range(13, 13+len(prices)), # just to have a time axis to plot with
    'Interest_Rate': New_Interest_Rate,
    'Unemployment_Rate': New_Unemployment_Rate,
    'Stock_Index_Price': prices}
predicted_df = pd.DataFrame(predicted_stock_market)
predicted_df.plot(x="Month", y="Stock_Index_Price", kind='scatter')
plt.show()
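To get the predictions into the same dataframe as the original series and plot both together, one possible approach is sketched below. The TypeError in the question comes from passing a plain list of floats to Series.append, which only accepts pandas objects (and was removed in pandas 2.0), so the sketch wraps the new values in a DataFrame and uses pd.concat instead. The Source and Period columns and the names history_df and combined are assumptions made for this illustration. Period simply numbers the rows, because the predicted rows have no real dates.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model

# Data copied from the question
Stock_Market = {
    'Year': [2017]*12 + [2016]*12,
    'Month': [12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
    'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
    'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
    'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719],
}
df = pd.DataFrame(Stock_Market)

feature_cols = ['Interest_Rate', 'Unemployment_Rate']
regr = linear_model.LinearRegression()
regr.fit(df[feature_cols], df['Stock_Index_Price'])

# New inputs from the question, wrapped in a DataFrame instead of plain lists
predicted_df = pd.DataFrame({
    'Interest_Rate': [2.75, 3, 4, 1, 2],
    'Unemployment_Rate': [5.3, 4, 3, 2, 1],
})
predicted_df['Stock_Index_Price'] = regr.predict(predicted_df[feature_cols])
predicted_df['Source'] = 'predicted'

# Put the observed history in chronological order and label it
history_df = df.sort_values(['Year', 'Month']).reset_index(drop=True)
history_df['Source'] = 'observed'

# pd.concat takes a list of DataFrames; Year and Month are NaN for the predicted rows
combined = pd.concat([history_df, predicted_df], ignore_index=True)

# Assumption: a simple running index serves as the time axis
combined['Period'] = range(1, len(combined) + 1)

# Draw observed and predicted values as two labelled series on one axis
fig, ax = plt.subplots()
for source, group in combined.groupby('Source'):
    ax.plot(group['Period'], group['Stock_Index_Price'], marker='o', label=source)
ax.set_xlabel('Period')
ax.set_ylabel('Stock_Index_Price')
ax.legend()
```

In a Jupyter notebook the figure should render when the cell finishes; in a plain script, add plt.show() at the end.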