Poisson GLM for non-integer counts - R
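I'm hoping to get some advice on GLMs with the Poisson family.
I have a dataset containing the number of bites taken by each individual over a period of time. Because the individuals were observed feeding for different lengths of time, when I calculate each individual's bite rate as bites/minute I get non-integer values. From what I've read so far, I should still be able to fit a GLM with the Poisson family. However, I'm getting an error, and I think it may be because R doesn't like the fact that I'm using non-integers. Does anyone have a suggestion?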
Example <- structure(list(Species = c("Fish1", "Fish2", "Fish3", "Fish4",
"Fish5", "Fish6", "Fish7", "Fish1", "Fish2", "Fish3", "Fish4",
"Fish5", "Fish6", "Fish7", "Fish1", "Fish2", "Fish3", "Fish4",
"Fish5", "Fish6", "Fish7"), Site = c(1, 1, 1, 1, 1, 1, 1, 2,
2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3), Bite_Rate = c(3.5, 7.5,
0, 0, 2.45, 5.5, 6.5, 6.5, 7.5, 8.03, 32.1, 15.6, 18.2, 19.1,
20.5, 20.5, 3.5, 5.7, 6.7, 23.2, 0)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -21L), spec = structure(list(
cols = list(Species = structure(list(), class = c("collector_character",
"collector")), Site = structure(list(), class = c("collector_double",
"collector")), Bite_Rate = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
str(Example) # check structure
Example$Species<-as.factor(Example$Species) # set species as a factor
str(Example) # check structure
glm<-glm(Species~Bite_Rate, data=Example, family = poisson) # create the GLM
The error message I get when I run the GLM is:
Error in if (any(y < 0)) stop("negative values not allowed for the 'Poisson' family") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In Ops.factor(y, 0) : ‘<’ not meaningful for factors
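I don't actually have any negative values, so this has thrown me a bit.
Any advice would be much appreciated!
Edit:
Following the comments, I've updated my example data so that it now contains the bite counts and the observation time in seconds: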
Example <- structure(list(Species = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("Fish1",
"Fish2", "Fish3", "Fish4", "Fish5", "Fish6", "Fish7"), class = "factor"),
Site = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3,
3, 3, 3, 3, 3), Bites = c(0, 10, 18, 17, 6, 0, 1, 0, 19,
12, 7, 3, 5, 1, 5, 0, 10, 18, 17, 7, 25), Observed_Seconds = c(50,
33, 47, 20, 17, 10, 14, 21, 48, 10, 50, 33, 47, 20, 17, 10,
14, 21, 48, 10, 90)), row.names = c(NA, -21L), spec = structure(list(
cols = list(Species = structure(list(), class = c("collector_character",
"collector")), Site = structure(list(), class = c("collector_double",
"collector")), Bites = structure(list(), class = c("collector_double",
"collector")), Observed_Seconds = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
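Thanks!
You need the observation time in seconds (the denominator) and the actual number of bites (the count).
Then include the log of the observation time as an offset. Note that your response variable goes on the left-hand side of ~. In your original call the factor Species was the response, which is what triggered the error: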
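# "0 +" drops the intercept, so each species gets its own log-rate coefficient
# (this matches the SpeciesFish1 ... SpeciesFish7 rows in the summary)
fit = glm(Bites ~ 0 + Species, offset=log(Observed_Seconds),
family=poisson, data=Example)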
You can look at the summary:
summary(fit)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
SpeciesFish1 -2.8679 0.4472 -6.413 1.42e-10 ***
SpeciesFish2 -1.1436 0.1857 -6.158 7.35e-10 ***
SpeciesFish3 -0.5738 0.1581 -3.629 0.000284 ***
SpeciesFish4 -0.7732 0.1543 -5.011 5.42e-07 ***
SpeciesFish5 -1.3269 0.1961 -6.766 1.33e-11 ***
SpeciesFish6 -1.7198 0.2887 -5.958 2.56e-09 ***
SpeciesFish7 -1.5244 0.1925 -7.921 2.35e-15 ***
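To make these coefficients easier to read, here is a minimal sketch of how they relate to the raw data. It assumes the no-intercept form of the model (which is what the SpeciesFish1 to SpeciesFish7 rows imply) and rebuilds the example data in compact form. The names `totals`, `pooled_rate`, `rate_per_second` and `rate_per_minute` are only illustrative. With an offset of log(Observed_Seconds), each coefficient is a log bite rate per second, so exponentiating it should give the species' total bites divided by its total observed seconds:

```r
# Same bite counts and observation times as the edited example data
Example <- data.frame(
  Species = factor(rep(paste0("Fish", 1:7), times = 3)),
  Bites = c(0, 10, 18, 17, 6, 0, 1, 0, 19, 12, 7, 3, 5, 1,
            5, 0, 10, 18, 17, 7, 25),
  Observed_Seconds = c(50, 33, 47, 20, 17, 10, 14, 21, 48, 10, 50,
                       33, 47, 20, 17, 10, 14, 21, 48, 10, 90)
)

# Poisson GLM with one log-rate coefficient per species (no intercept)
fit <- glm(Bites ~ 0 + Species, offset = log(Observed_Seconds),
           family = poisson, data = Example)

# Pooled rate per species straight from the data: total bites / total seconds
totals <- aggregate(cbind(Bites, Observed_Seconds) ~ Species,
                    data = Example, FUN = sum)
pooled_rate <- totals$Bites / totals$Observed_Seconds

# Back-transform the coefficients from log scale to rates
rate_per_second <- exp(coef(fit))
rate_per_minute <- 60 * rate_per_second

print(cbind(pooled = pooled_rate,
            fitted = rate_per_second,
            per_minute = rate_per_minute))
```

This also means the z tests in the summary compare each species' rate with 1 bite per second (log rate 0), which is usually not the comparison of interest. A model with an intercept compares species with a reference species instead.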
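These look significant, but it's better to check whether the data are overdispersed, and also to include other factors (for example, site):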
fit_quasi = glm(Bites ~ Species + factor(Site),offset=log(Observed_Seconds),
family=quasipoisson,data=Example)
summary(fit_quasi)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.9754 1.0713 -2.777 0.0167 *
SpeciesFish2 1.8487 1.1434 1.617 0.1319
SpeciesFish3 2.2731 1.1152 2.038 0.0642 .
SpeciesFish4 2.1246 1.1205 1.896 0.0823 .
SpeciesFish5 1.3533 1.1604 1.166 0.2662
SpeciesFish6 1.2754 1.2658 1.008 0.3336
SpeciesFish7 0.8922 1.1719 0.761 0.4612
factor(Site)2 -0.2325 0.5132 -0.453 0.6587
factor(Site)3 0.6118 0.4677 1.308 0.2154
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasipoisson family taken to be 5.521045)
If the data followed a Poisson distribution, the dispersion would be around 1, but in this case the data are overdispersed.
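As a rough illustration of where that dispersion parameter comes from, the sketch below recomputes it by hand as the sum of squared Pearson residuals divided by the residual degrees of freedom, which is how summary() estimates it for the quasipoisson family. The data frame is rebuilt in compact form from the edited example, and `pearson_chisq` and `dispersion_by_hand` are illustrative names:

```r
# Same data as the edited example, including Site
Example <- data.frame(
  Species = factor(rep(paste0("Fish", 1:7), times = 3)),
  Site = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3,
           3, 3, 3, 3, 3),
  Bites = c(0, 10, 18, 17, 6, 0, 1, 0, 19, 12, 7, 3, 5, 1,
            5, 0, 10, 18, 17, 7, 25),
  Observed_Seconds = c(50, 33, 47, 20, 17, 10, 14, 21, 48, 10, 50,
                       33, 47, 20, 17, 10, 14, 21, 48, 10, 90)
)

fit_quasi <- glm(Bites ~ Species + factor(Site),
                 offset = log(Observed_Seconds),
                 family = quasipoisson, data = Example)

# Pearson chi-square statistic divided by the residual degrees of freedom
pearson_chisq <- sum(residuals(fit_quasi, type = "pearson")^2)
dispersion_by_hand <- pearson_chisq / df.residual(fit_quasi)

# Should agree with the value reported by summary()
print(dispersion_by_hand)
print(summary(fit_quasi)$dispersion)
```

A value well above 1 means the plain Poisson standard errors are too small. The quasipoisson fit leaves the coefficient estimates unchanged and scales the standard errors by the square root of the dispersion.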