泊松回归中的多项式回归

Polynomial Regression in Poisson Regression

我想创建一个模型,它从多项式回归中获取值并根据预测值(通过多项式)创建泊松回归。

我只得到了类似这样的 R 代码

glm(y ~ poly(x, 6), family = Poisson, data = data_set)

到目前为止,我的方法是首先计算离散数据点的多项式预测,并基于此 运行 泊松回归。但是,我的结果有点不对。

import pandas as pd
from statsmodels.formula.api import glm
polynomial_model = np.poly1d(np.polyfit(x=bin_mids, y=count_per_bin, deg = 6))
model_values = [polynomial_model(i) for i in bin_mids]
df_1["model_values"] = model_values
poisson_model = glm("df_1['count_per_bin'] ~ df_1['model_values']" , data = df_1 ,family = sm.families.Poisson()).fit()

如果你们中有人看到我的错误,我很想知道我哪里出错了。

干杯

不太确定您在对装箱和其余代码进行处理。您在 R 中所做的是 orthogonal polynomials. You can see this 的回归,以获取有关其使用原因的更多信息。

如果您使用的是统计模型,则只需为输入矩阵提供 x、x^2、x^3 的转换值,然后执行 QR 分解,同样 。您可以使用 sklearn 获取矩阵:

import numpy as np
import statsmodels.api as sm
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(111)
df = pd.DataFrame({'x':np.random.uniform(0,1,50),'y':np.random.poisson(5,50)})

xp = PolynomialFeatures(degree=6).fit_transform(df[['x']])
xp = np.linalg.qr(xp)[0][:,1:]

model = sm.GLM(df['y'],sm.add_constant(xp),family=sm.families.Poisson()).fit()
model.summary()

                     Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                      y   No. Observations:                   50
Model:                            GLM   Df Residuals:                       43
Model Family:                 Poisson   Df Model:                            6
Link Function:                    log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -100.45
Date:                Thu, 04 Mar 2021   Deviance:                       33.846
Time:                        22:32:57   Pearson chi2:                     31.8
No. Iterations:                     4                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.5458      0.066     23.495      0.000       1.417       1.675
x1            -0.2059      0.439     -0.469      0.639      -1.067       0.655
x2             0.8169      0.466      1.754      0.079      -0.096       1.730
x3             0.1178      0.442      0.267      0.790      -0.748       0.984
x4            -0.5503      0.454     -1.212      0.226      -1.440       0.340
x5             0.0035      0.457      0.008      0.994      -0.892       0.899
x6            -0.6878      0.455     -1.512      0.131      -1.579       0.204
==============================================================================

我们将数据写入csv

df.to_csv("data.csv")

并使用 R 进行拟合,您可以看到我们得到了相同的系数:

df = read.csv("data.csv",row.names=1)
summary(glm(y ~ poly(x, 6), family = poisson, data = df))

Call:
glm(formula = y ~ poly(x, 6), family = poisson, data = df)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.42648  -0.73970  -0.04625   0.61351   1.81234  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.545808   0.065794  23.495   <2e-16 ***
poly(x, 6)1 -0.205940   0.439457  -0.469   0.6393    
poly(x, 6)2  0.816910   0.465770   1.754   0.0794 .  
poly(x, 6)3  0.117815   0.441847   0.267   0.7897    
poly(x, 6)4 -0.550271   0.454099  -1.212   0.2256    
poly(x, 6)5 -0.003508   0.456691  -0.008   0.9939    
poly(x, 6)6  0.687751   0.454953   1.512   0.1306    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 40.443  on 49  degrees of freedom
Residual deviance: 33.846  on 43  degrees of freedom
AIC: 214.91