Constrain the Intercept term in H2O GLM
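I'm familiar with how to constrain the Betas (regression parameters) in h2o.glm(), but I'm having trouble understanding how to extend this to constrain the intercept.
(I know intercept=FALSE constrains it to zero, but I'm interested in a non-zero constraint.)
Conceptual example dataset: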
n <- 100
set.seed(1)
getPoints <- function(n){
rbind(
data.frame(col= factor('red', levels=c('red','blue')),
x1 = rnorm(n=n,mean=11,sd = 2),
x2 = rnorm(n=n,mean=5,sd=1)),
data.frame(col='blue',
x1 = rnorm(n=n,mean=13,sd = 2),
x2 = rnorm(n=n,mean=7,sd=1))
)
}
df1 <- getPoints(n)
Example constraints:
param_names <- c('Intercept', 'x1', 'x2')
param_vals <- c( 27.5, -1.1, -2.7)
beta_const_df <- data.frame(names = c('Intercept','x1','x2'),
lower_bounds = param_vals-0.1,
upper_bounds = param_vals+0.1,
beta_start = param_vals)
If I omit the "Intercept" constraint, the constraints work:
glm1 <- h2o.glm(x=c('x1','x2'),
y='col',
family='binomial',
lambda=0,
alpha=0,
training_frame = 'df1',
beta_constraints=beta_const_df[-1,]
)
glm1@model$coefficients
# Intercept x1 x2
# 27.68408 -1.00000 -2.60000
But if I include an "Intercept" constraint, the other constraints fail as well:
glm2 <- h2o.glm(x=c('x1','x2'),
y='col',
family='binomial',
lambda=0,
alpha=0,
training_frame = 'df1',
beta_constraints=beta_const_df)
glm2@model$coefficients
# Intercept x1 x2
# 0.67783085 -0.01185921 -0.03083395
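What is the correct syntax to constrain the intercept?
Workaround when all constraints are strict equalities
I can impose a severe L2 penalty, rho, on deviations from beta_given, and this does seem to support the Intercept: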
beta_const_df <- data.frame(names = c('Intercept','x1','x2'),
#lower_bounds = param_vals-0.1, #don't bound
#upper_bounds = param_vals+0.1,
#beta_start = param_vals, # use beta_given
beta_given = param_vals, # new
rho = 1e9 ) # new
Then this works:
glm2 <- h2o.glm(x=c('x1','x2'),
y='col',
family='binomial',
lambda=0,
alpha=0,
training_frame = 'df1',
beta_constraints=beta_const_df)
glm2@model$coefficients
# Intercept x1 x2
# 27.5 -1.1 -2.7
all.equal(glm2@model$coefficients, param_vals, check.names=FALSE) # TRUE
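This only works when all of your constraints are equality constraints (not distinct upper and lower bounds).
Either way, I would still like to know whether there is a simpler way to do this.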
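To see why a very large rho behaves like an equality constraint, here is a toy one-parameter sketch in base R. It is not H2O's objective or solver: the quadratic loss, the 1/2 scaling of the penalty, the value of `loss_min`, and the function names are all assumptions made purely for illustration.

```r
# Toy illustration only: a one-parameter quadratic "loss" plus an L2 penalty
# on the distance from a given value. This is NOT H2O's objective or solver.
loss_min   <- 0.68   # hypothetical unpenalized optimum
beta_given <- 27.5   # target value, as in param_vals[1]

# Assumed objective: 0.5 * (b - loss_min)^2 + 0.5 * rho * (b - beta_given)^2
penalized_objective <- function(b, rho) {
  0.5 * (b - loss_min)^2 + 0.5 * rho * (b - beta_given)^2
}

# Closed-form minimizer of the objective above (set the derivative to zero)
penalized_minimizer <- function(rho) {
  (loss_min + rho * beta_given) / (1 + rho)
}

# As rho grows, the estimate moves from loss_min towards beta_given
rhos <- c(0, 1, 1e3, 1e9)
fits <- sapply(rhos, penalized_minimizer)
print(data.frame(rho = rhos, estimate = fits))

# Numerical cross-check of the closed form for rho = 1
num <- optimize(penalized_objective, interval = c(-100, 100), rho = 1)$minimum
```

In this toy setting the estimate is a weighted average of the unpenalized optimum and beta_given, so rho = 1e9 pins it to beta_given for all practical purposes. That is also why the trick cannot express a range: the penalty pulls towards a single point rather than leaving the coefficient free inside an interval.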
Try setting the standardize parameter to FALSE, as in the code below (the H2O documentation has more on the beta_constraints parameter):
glm1 <- h2o.glm(x=c('x1','x2'),
y='col',
family='binomial',
lambda=0,
alpha=0,
training_frame = as.h2o(df1),
beta_constraints=beta_const_df,
standardize = FALSE
)
glm1@model$coefficients
#Intercept x1 x2
#27.6 -1.0 -2.6
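A quick way to confirm whether a fit respected the bounds is to compare the coefficients against the constraint table. The sketch below uses base R only; `within_bounds` is a hypothetical helper (not part of h2o), and the coefficient vectors are copied by hand from the outputs shown in this thread rather than pulled from a live model.

```r
# Bounds from the question: each coefficient may move by +/- 0.1
param_vals <- c(Intercept = 27.5, x1 = -1.1, x2 = -2.7)
bounds <- data.frame(names = names(param_vals),
                     lower_bounds = param_vals - 0.1,
                     upper_bounds = param_vals + 0.1)

# Hypothetical helper: TRUE for each coefficient inside [lower, upper].
# `tol` absorbs rounding in printed coefficients and floating-point error.
within_bounds <- function(coefs, bounds, tol = 1e-6) {
  vals <- coefs[match(bounds$names, names(coefs))]
  ok <- vals >= bounds$lower_bounds - tol & vals <= bounds$upper_bounds + tol
  setNames(ok, bounds$names)
}

# Coefficients copied from the outputs in this thread
coefs_default        <- c(Intercept = 0.67783085, x1 = -0.01185921, x2 = -0.03083395)
coefs_no_standardize <- c(Intercept = 27.6, x1 = -1.0, x2 = -2.6)

print(within_bounds(coefs_default, bounds))
print(within_bounds(coefs_no_standardize, bounds))
```

With a live model, passing glm1@model$coefficients as `coefs` should work the same way, provided it is a named numeric vector as the printed output suggests.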