Polynomial Regression in Poisson Regression
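I want to create a model that takes the values from a polynomial regression and builds a Poisson regression on the predicted values (from the polynomial).
All I have to go on is R code like this: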
glm(y ~ poly(x, 6), family = poisson, data = data_set)
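So far my approach has been to first compute the polynomial predictions at the discrete data points and then run a Poisson regression on them. However, my results are a bit off.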
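import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import glm
# Step 1: least-squares polynomial of degree 6 through the binned counts
polynomial_model = np.poly1d(np.polyfit(x=bin_mids, y=count_per_bin, deg=6))
model_values = [polynomial_model(i) for i in bin_mids]
df_1["model_values"] = model_values
# Step 2: Poisson regression of the counts on the polynomial predictions
poisson_model = glm("df_1['count_per_bin'] ~ df_1['model_values']", data=df_1, family=sm.families.Poisson()).fit()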
If any of you can spot my mistake, I would love to know where I went wrong.
Cheers
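I'm not quite sure what you are doing with the binning and the rest of your code. What you are doing in R is a regression on orthogonal polynomials. You can see this for more information on why they are used.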
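To make the difference concrete, here is a sketch of the two models in my own notation (the symbols \hat{p}, P_k, \alpha and \beta are assumptions for illustration, not taken from the question). The two-step approach in the question estimates a single slope on the least-squares polynomial, whereas the R call estimates one coefficient per polynomial term inside the Poisson likelihood:

```latex
% Two-step approach from the question: \hat{p} is the degree-6 least-squares fit
\log \mathbb{E}[y_i] = \alpha_0 + \alpha_1 \, \hat{p}(x_i)

% R call glm(y ~ poly(x, 6), family = poisson): P_k are orthogonal polynomials in x
\log \mathbb{E}[y_i] = \beta_0 + \sum_{k=1}^{6} \beta_k \, P_k(x_i)
```

The first model has two free parameters in the Poisson fit and the second has seven, so the two will generally give different fitted values.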
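If you are using statsmodels, you just need to supply the transformed values x, x^2, x^3, ... as the input matrix and then perform a QR decomposition on it. You can get the matrix with sklearn: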
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(111)
df = pd.DataFrame({'x': np.random.uniform(0, 1, 50), 'y': np.random.poisson(5, 50)})
# Raw polynomial terms 1, x, x^2, ..., x^6
xp = PolynomialFeatures(degree=6).fit_transform(df[['x']])
# Orthogonalize with QR and drop the first (constant) column
xp = np.linalg.qr(xp)[0][:, 1:]
# add_constant puts the intercept back as its own column
model = sm.GLM(df['y'], sm.add_constant(xp), family=sm.families.Poisson()).fit()
model.summary()
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                   50
Model:                            GLM   Df Residuals:                       43
Model Family:                 Poisson   Df Model:                            6
Link Function:                    log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -100.45
Date:                Thu, 04 Mar 2021   Deviance:                       33.846
Time:                        22:32:57   Pearson chi2:                     31.8
No. Iterations:                     4
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.5458      0.066     23.495      0.000       1.417       1.675
x1            -0.2059      0.439     -0.469      0.639      -1.067       0.655
x2             0.8169      0.466      1.754      0.079      -0.096       1.730
x3             0.1178      0.442      0.267      0.790      -0.748       0.984
x4            -0.5503      0.454     -1.212      0.226      -1.440       0.340
x5             0.0035      0.457      0.008      0.994      -0.892       0.899
x6            -0.6878      0.455     -1.512      0.131      -1.579       0.204
==============================================================================
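The QR step can also be done with numpy alone. The following is a minimal sketch, not a library function: `ortho_poly` is a hypothetical helper name, and the data are illustrative. It additionally fixes the sign of each column so that the diagonal of R is positive, which gives every polynomial a positive leading coefficient. I believe that is also the convention R's poly() ends up with, but treat that as an assumption and check against R if the signs matter to you.

```python
import numpy as np

def ortho_poly(x, degree):
    """Orthonormal polynomial columns of x (hypothetical helper, not a library function)."""
    x = np.asarray(x, dtype=float)
    # Raw design matrix with columns 1, x, x^2, ..., x^degree
    raw = np.vander(x, degree + 1, increasing=True)
    q, r = np.linalg.qr(raw)
    # QR fixes each column only up to sign; make the diagonal of R positive
    # so that every polynomial has a positive leading coefficient
    q = q * np.sign(np.diag(r))
    # Drop the constant column; the GLM intercept takes its place
    return q[:, 1:]

rng = np.random.default_rng(111)   # illustrative data only
x = rng.uniform(0, 1, 50)
P = ortho_poly(x, 6)
gram = P.T @ P                     # should be close to the 6x6 identity
```

The columns of `P` can be passed to `sm.GLM` through `sm.add_constant(P)` in the same way as `xp` above.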
We write the data to a CSV file:
df.to_csv("data.csv")
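and fit it in R. You can see that we get the same coefficients, apart from the sign of x5 and x6: QR determines each orthogonal column only up to sign, so a coefficient can flip sign without changing the fitted model: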
df = read.csv("data.csv",row.names=1)
summary(glm(y ~ poly(x, 6), family = poisson, data = df))
Call:
glm(formula = y ~ poly(x, 6), family = poisson, data = df)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.42648  -0.73970  -0.04625   0.61351   1.81234

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.545808   0.065794  23.495   <2e-16 ***
poly(x, 6)1 -0.205940   0.439457  -0.469   0.6393
poly(x, 6)2  0.816910   0.465770   1.754   0.0794 .
poly(x, 6)3  0.117815   0.441847   0.267   0.7897
poly(x, 6)4 -0.550271   0.454099  -1.212   0.2256
poly(x, 6)5 -0.003508   0.456691  -0.008   0.9939
poly(x, 6)6  0.687751   0.454953   1.512   0.1306
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 40.443  on 49  degrees of freedom
Residual deviance: 33.846  on 43  degrees of freedom
AIC: 214.91