R 中的多项逻辑回归
Multinomial Logistic Regression in R
我试图找到很多关于多项式回归的资源。但是,我很难找到可以显示给定 multiclass 因变量的 multiclass 响应变量概率的可视化。我在下面有一个示例,我想根据 WEEKDAY 和 COUNT 预测 EQUIPMENT。我如何使用 nnet::multinom() 执行此操作并提供有意义的可视化以帮助理解预测的 classes.
dummy_data <- function(size){
LOCATION <- sample(c("LOC_A", "LOC_B", "LOC_B"), size, replace = T, prob = c(0.4, 0.4, 0.2))
EQUIPMENT <- sample(c("EQUIP_A", "EQUIP_B", "EQUIP_C", "EQUIP_D"), size, replace = TRUE)
df <- data.frame(LOCATION, EQUIPMENT)
df$COUNT <- sample(c(1:10), size, replace = TRUE)
startTime <- as.POSIXct("2016-01-01")
endTime <- as.POSIXct("2019-01-31")
df$DATE <- as.Date(sample(seq(startTime, endTime, 1), size))
df$WEEKDAY <- weekdays(as.Date(df$DATE))
return(df)
}
df<- dummy_data(100)
library(nnet)
model <- nnet::multinom(EQUIPMENT ~ WEEKDAY + COUNT, data = df)
我应该如何从这里继续?我试着离开这个例子,想要一个类似的可视化来表示按工作日和设备细分的 class 的概率。我很难理解这个例子。谁能帮忙? https://stats.idre.ucla.edu/r/dae/multinomial-logistic-regression/
我认为您在加州大学洛杉矶分校网站示例中寻找的是“查看连续预测变量写入的不同值的平均预测概率(数据中的 COUNT)在 ses 的每个级别(WEEKDAY in你的数据)。"
为此,我将向您在 multinom 之前所做的添加以下行:
df$EQUIPMENT <- as.factor(df$EQUIPMENT)
df$EQUIPMENT <- relevel(df$EQUIPMENT, ref = "EQUIP_A")
df$WEEKDAY <- as.factor(df$WEEKDAY)
df$WEEKDAY <- relevel(df$WEEKDAY, ref = "Sunday")
model <- nnet::multinom(EQUIPMENT ~ WEEKDAY + COUNT, data = df)
然后按照
的例子
dmodel <- data.frame(WEEKDAY = rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"), each = 10), COUNT = rep(c(1:10),7))
pp.model <- cbind(dmodel, predict(model, newdata = dmodel, type = "probs", se = TRUE))
lmodel <- melt(pp.model, id.vars = c("WEEKDAY", "COUNT"), value.name = "probability")
ggplot(lmodel, aes(x = COUNT, y = probability, colour = WEEKDAY)) + geom_line() + facet_grid(variable ~., scales = "free")
但是,您可以看到该图看起来与 UCLA 示例有很大不同。为什么?在绘制图之前,我们需要先检查预测变量是否显着。为了简单地做到这一点(而不是遵循示例 link 中的步骤),我将使用 epiDisplay 包:
epiDisplay::mlogit.display(model, decimal=3, alpha=0.05)
Outcome =EQUIPMENT; Referent group = EQUIP_A
EQUIP_B EQUIP_C EQUIP_D
Coeff./SE RRR(95%CI) Coeff./SE RRR(95%CI) Coeff./SE RRR(95%CI)
(Intercept) -0.083/1.0419 - 0.036/0.9569 - -0.124/1.0174 -
WEEKDAYMonday -15.513/0NA 0(0,0) -0.246/0.9629 0.782(0.118,5.162) -0.672/1.0851 0.511(0.061,4.283)
WEEKDAYSaturday -1.082/1.4201 0.339(0.021,5.482) 0.454/1.0477 1.575(0.202,12.275) 0.536/1.1023 1.708(0.197,14.822)
WEEKDAYSunday -0.269/1.1276 0.764(0.084,6.967) -0.234/1.0488 0.791(0.101,6.182) 0.433/1.05 1.542(0.197,12.076)
WEEKDAYThursday 0.516/1.0964 1.676(0.195,14.372) -1.373/1.3855 0.253(0.017,3.83) 0.296/1.119 1.344(0.15,12.047)
WEEKDAYTuesday 0.01/1.083 1.01(0.121,8.438) -0.548/1.0827 0.578(0.069,4.827) -0.678/1.1925 0.507(0.049,5.253)
WEEKDAYWednesday -1.1/1.1549 0.333(0.035,3.201) -0.477/0.9758 0.621(0.092,4.201) -1.794/1.3542 0.166(0.012,2.364)
COUNT 0.014/0.109 1.014(0.819,1.256) 0.042/0.0961 1.043(0.864,1.259) 0.021/0.1019 1.021(0.836,1.247)
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
并且您的模型中没有重要的预测变量。相反,使用示例数据的多项式回归显示预测变量显着:
ml <- read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")
ml$prog2 <- relevel(ml$prog, ref = "academic")
test <- multinom(prog2 ~ ses + write, data = ml)
epiDisplay::mlogit.display(test, decimal=3, alpha=0.05)
Outcome =prog2; Referent group = academic
general vocation
Coeff./SE RRR(95%CI) Coeff./SE RRR(95%CI)
(Intercept) 2.852/1.1664* - 5.218/1.1636*** -
sesmiddle -0.533/0.4437 0.587(0.246,1.4) 0.291/0.4764 1.338(0.526,3.404)
seshigh -1.163/0.5142* 0.313(0.114,0.856) -0.983/0.5956 0.374(0.116,1.203)
write -0.058/0.0214** 0.944(0.905,0.984) -0.114/0.0222*** 0.893(0.855,0.932)
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
我试图找到很多关于多项式回归的资源。但是,我很难找到可以显示给定 multiclass 因变量的 multiclass 响应变量概率的可视化。我在下面有一个示例,我想根据 WEEKDAY 和 COUNT 预测 EQUIPMENT。我如何使用 nnet::multinom() 执行此操作并提供有意义的可视化以帮助理解预测的 classes.
dummy_data <- function(size){
LOCATION <- sample(c("LOC_A", "LOC_B", "LOC_B"), size, replace = T, prob = c(0.4, 0.4, 0.2))
EQUIPMENT <- sample(c("EQUIP_A", "EQUIP_B", "EQUIP_C", "EQUIP_D"), size, replace = TRUE)
df <- data.frame(LOCATION, EQUIPMENT)
df$COUNT <- sample(c(1:10), size, replace = TRUE)
startTime <- as.POSIXct("2016-01-01")
endTime <- as.POSIXct("2019-01-31")
df$DATE <- as.Date(sample(seq(startTime, endTime, 1), size))
df$WEEKDAY <- weekdays(as.Date(df$DATE))
return(df)
}
df<- dummy_data(100)
library(nnet)
model <- nnet::multinom(EQUIPMENT ~ WEEKDAY + COUNT, data = df)
我应该如何从这里继续?我试着离开这个例子,想要一个类似的可视化来表示按工作日和设备细分的 class 的概率。我很难理解这个例子。谁能帮忙? https://stats.idre.ucla.edu/r/dae/multinomial-logistic-regression/
我认为您在加州大学洛杉矶分校网站示例中寻找的是“查看连续预测变量写入的不同值的平均预测概率(数据中的 COUNT)在 ses 的每个级别(WEEKDAY in你的数据)。"
为此,我将向您在 multinom 之前所做的添加以下行:
df$EQUIPMENT <- as.factor(df$EQUIPMENT)
df$EQUIPMENT <- relevel(df$EQUIPMENT, ref = "EQUIP_A")
df$WEEKDAY <- as.factor(df$WEEKDAY)
df$WEEKDAY <- relevel(df$WEEKDAY, ref = "Sunday")
model <- nnet::multinom(EQUIPMENT ~ WEEKDAY + COUNT, data = df)
然后按照
的例子 dmodel <- data.frame(WEEKDAY = rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"), each = 10), COUNT = rep(c(1:10),7))
pp.model <- cbind(dmodel, predict(model, newdata = dmodel, type = "probs", se = TRUE))
lmodel <- melt(pp.model, id.vars = c("WEEKDAY", "COUNT"), value.name = "probability")
ggplot(lmodel, aes(x = COUNT, y = probability, colour = WEEKDAY)) + geom_line() + facet_grid(variable ~., scales = "free")
但是,您可以看到该图看起来与 UCLA 示例有很大不同。为什么?在绘制图之前,我们需要先检查预测变量是否显着。为了简单地做到这一点(而不是遵循示例 link 中的步骤),我将使用 epiDisplay 包:
epiDisplay::mlogit.display(model, decimal=3, alpha=0.05)
Outcome =EQUIPMENT; Referent group = EQUIP_A
EQUIP_B EQUIP_C EQUIP_D
Coeff./SE RRR(95%CI) Coeff./SE RRR(95%CI) Coeff./SE RRR(95%CI)
(Intercept) -0.083/1.0419 - 0.036/0.9569 - -0.124/1.0174 -
WEEKDAYMonday -15.513/0NA 0(0,0) -0.246/0.9629 0.782(0.118,5.162) -0.672/1.0851 0.511(0.061,4.283)
WEEKDAYSaturday -1.082/1.4201 0.339(0.021,5.482) 0.454/1.0477 1.575(0.202,12.275) 0.536/1.1023 1.708(0.197,14.822)
WEEKDAYSunday -0.269/1.1276 0.764(0.084,6.967) -0.234/1.0488 0.791(0.101,6.182) 0.433/1.05 1.542(0.197,12.076)
WEEKDAYThursday 0.516/1.0964 1.676(0.195,14.372) -1.373/1.3855 0.253(0.017,3.83) 0.296/1.119 1.344(0.15,12.047)
WEEKDAYTuesday 0.01/1.083 1.01(0.121,8.438) -0.548/1.0827 0.578(0.069,4.827) -0.678/1.1925 0.507(0.049,5.253)
WEEKDAYWednesday -1.1/1.1549 0.333(0.035,3.201) -0.477/0.9758 0.621(0.092,4.201) -1.794/1.3542 0.166(0.012,2.364)
COUNT 0.014/0.109 1.014(0.819,1.256) 0.042/0.0961 1.043(0.864,1.259) 0.021/0.1019 1.021(0.836,1.247)
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
并且您的模型中没有重要的预测变量。相反,使用示例数据的多项式回归显示预测变量显着:
ml <- read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")
ml$prog2 <- relevel(ml$prog, ref = "academic")
test <- multinom(prog2 ~ ses + write, data = ml)
epiDisplay::mlogit.display(test, decimal=3, alpha=0.05)
Outcome =prog2; Referent group = academic
general vocation
Coeff./SE RRR(95%CI) Coeff./SE RRR(95%CI)
(Intercept) 2.852/1.1664* - 5.218/1.1636*** -
sesmiddle -0.533/0.4437 0.587(0.246,1.4) 0.291/0.4764 1.338(0.526,3.404)
seshigh -1.163/0.5142* 0.313(0.114,0.856) -0.983/0.5956 0.374(0.116,1.203)
write -0.058/0.0214** 0.944(0.905,0.984) -0.114/0.0222*** 0.893(0.855,0.932)
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1