Error using the 'predict' function for a logistic regression
I am trying to fit a multinomial logistic regression and then predict the outcome from samples.
### RZS_TC is my dataframe
RZS_TC.loc[RZS_TC['Mean_Treecover'] <= 50, 'Mean_Treecover' ] = 0
RZS_TC.loc[RZS_TC['Mean_Treecover'] > 50, 'Mean_Treecover' ] = 1
RZS_TC[['MAP']+['Sr']+['delTC']+['Mean_Treecover']].head()
[Output]:
MAP Sr delTC Mean_Treecover
302993741 2159.297363 452.975647 2.666672 1.0
217364332 3242.351807 65.615341 8.000000 1.0
390863334 1617.215454 493.124054 5.666666 0.0
446559668 1095.183105 498.373383 -8.000000 0.0
246078364 2804.615234 98.981110 -4.000000 1.0
1000000 rows × 7 columns
#Fitting a logistic regression
from statsmodels.formula.api import mnlogit
model = mnlogit("Mean_Treecover ~ MAP + Sr + delTC", RZS_TC).fit()
print(model.summary2())
[Output]:
Results: MNLogit
====================================================================
Model: MNLogit Pseudo R-squared: 0.364
Dependent Variable: Mean_Treecover AIC: 831092.4595
Date: 2021-04-02 13:51 BIC: 831139.7215
No. Observations: 1000000 Log-Likelihood: -4.1554e+05
Df Model: 3 LL-Null: -6.5347e+05
Df Residuals: 999996 LLR p-value: 0.0000
Converged: 1.0000 Scale: 1.0000
No. Iterations: 7.0000
--------------------------------------------------------------------
Mean_Treecover = 0 Coef. Std.Err. t P>|t| [0.025 0.975]
--------------------------------------------------------------------
Intercept -5.2200 0.0119 -438.4468 0.0000 -5.2434 -5.1967
MAP 0.0023 0.0000 491.0859 0.0000 0.0023 0.0023
Sr 0.0016 0.0000 90.6805 0.0000 0.0015 0.0016
delTC -0.0093 0.0002 -39.9022 0.0000 -0.0098 -0.0089
However, whenever I try to predict using the model.predict() function, I get the following error.
prediction = model.predict(np.array(RZS_TC[['MAP']+['Sr']+['delTC']]))
[Output]: ERROR! Session/line number was not unique in database. History logging moved to new session 2627
Does anyone know how to fix this? Am I doing something wrong?
The model adds an intercept, so you need to include it when you predict from a raw array. Using some example data:
from statsmodels.formula.api import mnlogit
import pandas as pd
import numpy as np
RZS_TC = pd.DataFrame(np.random.uniform(0,1,(20,4)),
columns=['MAP','Sr','delTC','Mean_Treecover'])
RZS_TC['Mean_Treecover'] = round(RZS_TC['Mean_Treecover'])
model = mnlogit("Mean_Treecover ~ MAP + Sr + delTC", RZS_TC).fit()
You can see the dimensions of the data the model was fitted on (note the leading column of ones):
model.model.exog[:5,]
Out[16]:
array([[1. , 0.33914763, 0.79358056, 0.3103758 ],
[1. , 0.45915785, 0.94991271, 0.27203524],
[1. , 0.55527662, 0.15122108, 0.80675951],
[1. , 0.18493681, 0.89854583, 0.66760684],
[1. , 0.38300074, 0.6945397 , 0.28128137]])
This is the same as adding a constant:
import statsmodels.api as sm
sm.add_constant(RZS_TC[['MAP','Sr','delTC']])
const MAP Sr delTC
0 1.0 0.339148 0.793581 0.310376
1 1.0 0.459158 0.949913 0.272035
2 1.0 0.555277 0.151221 0.806760
3 1.0 0.184937 0.898546 0.667607
If you have a DataFrame with the same column names as in the formula, it is simply:
prediction = model.predict(RZS_TC[['MAP','Sr','delTC']])
Or, if you only need the fitted values for the training data, do:
model.fittedvalues
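If what you are after in-sample is class probabilities or predicted labels, a rough sketch follows. It relies on `predict()` with no arguments falling back to the training design matrix, and it assumes the probability columns follow the sorted class values (0.0 then 1.0 here). The synthetic `df` is again only a placeholder for your data:

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import mnlogit

# Synthetic stand-in for RZS_TC (assumption: same column names as in the question)
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.uniform(0, 1, (200, 4)),
                  columns=['MAP', 'Sr', 'delTC', 'Mean_Treecover'])
df['Mean_Treecover'] = df['Mean_Treecover'].round()

res = mnlogit("Mean_Treecover ~ MAP + Sr + delTC", df).fit(disp=0)

# In-sample probabilities: no exog given, so the training data is used
probs = np.asarray(res.predict())

# Pick the most probable class per row (column index = class position)
labels = probs.argmax(axis=1)

print(probs.shape)
print(labels[:5])
```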