How to use the predict function in R after manually altering a GLM's coefficients

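##tweedie() comes from the statmod package
library(statmod)

##data frame
df <- data.frame(Account = c("A","B","C","D","E","F","G","H"),
       Exposure = c(1,50,67,85,250,25,22,89),
       JudicialOrientation = c("Neutral","Neutral","Plaintiff","Defense",
                               "Plaintiff","Neutral","Plaintiff","Defense"),
       Freq = c(.008,.5,.05,.34,.7,0,.04,.12),
       Losses = c(100000,100,2500,100000,25000,0,7500,5200),
       LossPerUnit = c(100000,100,2500,100000,25000,0,7500,5200)/c(1,50,67,85,250,25,22,89))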

##Variables for modeling
ModelingVars <- as.formula(df$LossPerUnit~df$JudicialOrientation+df$Freq)

##Tweedie GLM
Model <- glm(ModelingVars, family=tweedie(var.power=1.5, link.power = 0),
             weights = Exposure, data = df)

##Predict Losses with Model coefficients
df$PredictedLossPerUnit <- predict(Model,df, type="response")

##Manually edit a coefficient for one of my categorical variable's levels
Model$coefficients["df$JudicialOrientationNeutral"] <-log(50)

##Predict Losses again to compare
df$PredictedLossPerUnit2 <- predict(Model, df, type ="response")


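The way you built the formula, with `df$` prefixes, detaches its terms from the data frame you pass to `predict`, or otherwise confuses the logic of `predict.lm`. If you instead create the formula the way it is meant to be used, referring only to column names and not to the data object, you get the desired effect: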

 ModelingVars <- as.formula(LossPerUnit~JudicialOrientation+Freq)


> df$PredictedLossPerUnit <- predict(Model,df, type="response")
> ##Manually edit a coefficient for one of my categorical variable's levels
> Model$coefficients["JudicialOrientationNeutral"] <-log(50)
> ##Predict Losses again to compare
> df$PredictedLossPerUnit2 <- predict(Model, df, type ="response")
> df
  Account Exposure JudicialOrientation  Freq Losses  LossPerUnit PredictedLossPerUnit PredictedLossPerUnit2
1       A        1             Neutral 0.008 100000 100000.00000           1549.56677           40213.38196
2       B       50             Neutral 0.500    100      2.00000            919.41825           23860.16405
3       C       67           Plaintiff 0.050   2500     37.31343            169.99221             169.99221
4       D       85             Defense 0.340 100000   1176.47059            565.49150             565.49150
5       E      250           Plaintiff 0.700  25000    100.00000             85.29641              85.29641
6       F       25             Neutral 0.000      0      0.00000           1562.77490           40556.15105
7       G       22           Plaintiff 0.040   7500    340.90909            171.80535             171.80535
8       H       89             Defense 0.120   5200     58.42697            714.15870             714.15870

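I usually try to keep the necessary material on screen, but here you need to scroll to see that the "Neutral" rows differ between the two prediction columns.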
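To illustrate the point about formulas on a smaller case, here is a minimal sketch using made-up toy data (`train`, `new`, and both model names are hypothetical, not from the question). It assumes only base R and uses `lm` rather than a Tweedie GLM, so it runs without extra packages. A formula written with `train$` keeps pointing at the `train` object, so `predict` cannot substitute the columns of `newdata`:

```r
# Hypothetical toy data, not the data from the question
train <- data.frame(x = 1:10)
train$y <- 3 + 2 * train$x + rep(c(0.5, -0.5), 5)
new <- data.frame(x = c(100, 200))

# Terms are plain column names, looked up in whatever data frame is supplied
fit_cols <- lm(y ~ x, data = train)

# Terms are expressions that always evaluate against the object `train`
fit_dollar <- lm(train$y ~ train$x)

# The coefficient names differ accordingly
names(coef(fit_cols))
names(coef(fit_dollar))

# Predicting for two new rows
p_cols <- predict(fit_cols, newdata = new)

# R warns here that 'newdata' had 2 rows but the variables found have 10 rows
p_dollar <- suppressWarnings(predict(fit_dollar, newdata = new))

length(p_cols)    # one prediction per row of `new`
length(p_dollar)  # one prediction per row of `train`; `new` was not used
```

The `train$` version also names its coefficient `train$x`, which is why the coefficient in the question had to be addressed as `df$JudicialOrientationNeutral`.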

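Edit: I left the creation of the formula outside the `glm` call because that was the smallest possible change, but a better strategy is to use your formula directly, without the `as.formula` wrapper, which is not needed and will have a different environment for later evaluation. First run:

Model <- glm(LossPerUnit~JudicialOrientation+Freq, family = tweedie(var.power=1.5, link.power = 0), weights = Exposure, data = df)

and then do your violence to the coefficients.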


df$PredictedLossPerUnit <- predict(Model,data=df, type="response")

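"data" is not actually an argument of the predict function; it should be "newdata". Without "newdata", predict returns the fitted values stored in the model object, so an edited coefficient never comes into play. A silly mistake, but a good lesson. Thanks to everyone for the help.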
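The effect is easy to reproduce on a small case. The following is a minimal sketch with simulated data (the names `d`, `fit`, `before`, `wrong`, and `right` are hypothetical) and a Poisson GLM from base R instead of the Tweedie model, so it runs without extra packages. The misspelled `data =` argument is swallowed by `...`, `newdata` stays missing, and `predict` hands back the stored fitted values:

```r
# Simulated toy data, not the data from the question
set.seed(42)
d <- data.frame(x = 1:20)
d$y <- rpois(20, lambda = exp(0.1 * d$x))

fit <- glm(y ~ x, family = poisson, data = d)

# Predictions from the original coefficients
before <- predict(fit, newdata = d, type = "response")

# Manually overwrite the slope
fit$coefficients["x"] <- 0.2

# `data` is not an argument of predict.glm, so newdata is missing here
# and the stored fitted values come back unchanged
wrong <- predict(fit, data = d, type = "response")

# With newdata, the predictions are recomputed from the edited coefficients
right <- predict(fit, newdata = d, type = "response")

isTRUE(all.equal(unname(wrong), unname(before)))  # TRUE: the edit had no effect
isTRUE(all.equal(unname(right), unname(before)))  # FALSE: the edit is reflected
```

Spelling out `newdata =` explicitly, rather than relying on position or a guessed name, makes this kind of slip easier to spot.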