R:在模型中跨变量应用模型系数,更好的方法?
R: apply model coefficients across variables in model, a better way?
我编写了一个函数来获取(线性)模型的系数并将它们应用于原始变量以给出一个数据框的项,当它们加在一起时,将等同于预测的结果( ).这种能力对我来说似乎很有用,以便更好地理解每个变量(或更复杂的交互项等)对模型的影响。
有没有更好的方法?我觉得自己像个黑客。我研究了模型的 str() ,目前还没有看到更简单的解决方案。棘手的部分是捕捉和应用交互项。
library(plyr)
nospredict = function(model, data = model$model, sorted = TRUE) { # model is model (from lm, glm...), data is data.frame to be applied to
c = coef(model) # model must support coef()
my.names = names(c) = gsub(':', '*', names(c) ) # ':' equals multiplication in formulas, coefs
data = data[ , colnames(data) %in% my.names] # don't do the attach() below with a zillion variables...
final.out = adply(data, 1, function(y) { # did I mention I like plyr?
attach(as.list(y), warn.conflicts = FALSE) # so you can do eval algebra blackRmagic
out = ldply(my.names, function (x) { # did I mention...
Intercept = 1 # (Intercept) from model is a constant, multiply it by 1
eval( parse( text = paste( c[x], "*", x ) ) ) }) # blackRmagic
out = as.data.frame(t(out)) ; colnames(out) = my.names ; out
})
rownames(final.out) = rownames(data)
final.out$Predict = predict(model, data) ## add predict() as column
if ( sorted ) {
final.out[order(final.out$Predict), ] ## return df sorted by predict()
}
final.out
}
set.seed(10538)
df = data.frame(a = 1:10, b = rnorm(10), c = 1:10 + rnorm(10) )
lmf = lm( c ~ a * b, data = df)
> df
a b c
1 1 -0.41184664 1.3739709
2 2 1.06464586 0.8975101
3 3 -0.07522363 3.4910425
4 4 1.21643049 2.8856876
5 5 0.34061917 4.3851439
6 6 -1.00020786 6.1836535
7 7 -0.36954963 6.4734150
8 8 -1.47754640 6.8150569
9 9 -0.19312147 9.6432687
10 10 2.32220098 9.4276813
> nospredict(lmf)
(Intercept) a b a*b Predict
1 0.09801818 0.9282185 0.48332671 -0.05438652 1.4551769
2 0.09801818 1.8564370 -1.24942570 0.28118420 0.9862137
3 0.09801818 2.7846555 0.08827944 -0.02980103 2.9411521
4 0.09801818 3.7128740 -1.42755405 0.64254425 3.0258824
5 0.09801818 4.6410925 -0.39973700 0.22490279 4.5642765
6 0.09801818 5.5693110 1.17380385 -0.79249635 6.0486367
7 0.09801818 6.4975295 0.43368863 -0.34160685 6.6876294
8 0.09801818 7.4257480 1.73398922 -1.56094237 7.6968130
9 0.09801818 8.3539665 0.22663962 -0.22952439 8.4490999
10 0.09801818 9.2821850 -2.72524198 3.06658890 9.7215501
junk <- data.frame( x1 = rnorm(100), x2 = rnorm(100))
junk$YY <- 2 * junk$x1 + 4 * junk$x2 + 6 * junk$x1 * junk$x2 + 7 + rnorm(100)
out <- lm(YY ~ x1 * x2, data = junk, x = TRUE) # The x = TRUE part is key!
head(out$x)
(Intercept) x1 x2 x1:x2
1 1 -0.34885894 -0.8127228 0.283525629
2 1 -0.04482498 -0.1601600 0.007179167
3 1 -1.11721391 0.3266213 -0.364905892
4 1 -0.08530188 0.3482372 -0.029705287
5 1 0.19138684 -0.1659683 -0.031764149
6 1 -1.89493717 1.0261454 -1.944481020
coef(out)
(Intercept) x1 x2 x1:x2
7.053434 1.804441 4.130249 5.970553
nomThings <- t( t(out$x[, names(coef(out))]) * coef(out) )
如果您将某些因素作为自变量,我不完全确定它是否会正常工作,或者它是否会在所有情况下正常工作。但我怀疑是这样。
当然,您可以将 coef(out) 保存为临时对象,等等,以使代码更具可读性和效率。
鉴于你很容易做到这一点,我会认真考虑你是否应该这样做。
我编写了一个函数来获取(线性)模型的系数并将它们应用于原始变量以给出一个数据框的项,当它们加在一起时,将等同于预测的结果( ).这种能力对我来说似乎很有用,以便更好地理解每个变量(或更复杂的交互项等)对模型的影响。
有没有更好的方法?我觉得自己像个黑客。我研究了模型的 str() ,目前还没有看到更简单的解决方案。棘手的部分是捕捉和应用交互项。
library(plyr)
nospredict = function(model, data = model$model, sorted = TRUE) { # model is model (from lm, glm...), data is data.frame to be applied to
c = coef(model) # model must support coef()
my.names = names(c) = gsub(':', '*', names(c) ) # ':' equals multiplication in formulas, coefs
data = data[ , colnames(data) %in% my.names] # don't do the attach() below with a zillion variables...
final.out = adply(data, 1, function(y) { # did I mention I like plyr?
attach(as.list(y), warn.conflicts = FALSE) # so you can do eval algebra blackRmagic
out = ldply(my.names, function (x) { # did I mention...
Intercept = 1 # (Intercept) from model is a constant, multiply it by 1
eval( parse( text = paste( c[x], "*", x ) ) ) }) # blackRmagic
out = as.data.frame(t(out)) ; colnames(out) = my.names ; out
})
rownames(final.out) = rownames(data)
final.out$Predict = predict(model, data) ## add predict() as column
if ( sorted ) {
final.out[order(final.out$Predict), ] ## return df sorted by predict()
}
final.out
}
set.seed(10538)
df = data.frame(a = 1:10, b = rnorm(10), c = 1:10 + rnorm(10) )
lmf = lm( c ~ a * b, data = df)
> df
a b c
1 1 -0.41184664 1.3739709
2 2 1.06464586 0.8975101
3 3 -0.07522363 3.4910425
4 4 1.21643049 2.8856876
5 5 0.34061917 4.3851439
6 6 -1.00020786 6.1836535
7 7 -0.36954963 6.4734150
8 8 -1.47754640 6.8150569
9 9 -0.19312147 9.6432687
10 10 2.32220098 9.4276813
> nospredict(lmf)
(Intercept) a b a*b Predict
1 0.09801818 0.9282185 0.48332671 -0.05438652 1.4551769
2 0.09801818 1.8564370 -1.24942570 0.28118420 0.9862137
3 0.09801818 2.7846555 0.08827944 -0.02980103 2.9411521
4 0.09801818 3.7128740 -1.42755405 0.64254425 3.0258824
5 0.09801818 4.6410925 -0.39973700 0.22490279 4.5642765
6 0.09801818 5.5693110 1.17380385 -0.79249635 6.0486367
7 0.09801818 6.4975295 0.43368863 -0.34160685 6.6876294
8 0.09801818 7.4257480 1.73398922 -1.56094237 7.6968130
9 0.09801818 8.3539665 0.22663962 -0.22952439 8.4490999
10 0.09801818 9.2821850 -2.72524198 3.06658890 9.7215501
junk <- data.frame( x1 = rnorm(100), x2 = rnorm(100))
junk$YY <- 2 * junk$x1 + 4 * junk$x2 + 6 * junk$x1 * junk$x2 + 7 + rnorm(100)
out <- lm(YY ~ x1 * x2, data = junk, x = TRUE) # The x = TRUE part is key!
head(out$x)
(Intercept) x1 x2 x1:x2
1 1 -0.34885894 -0.8127228 0.283525629
2 1 -0.04482498 -0.1601600 0.007179167
3 1 -1.11721391 0.3266213 -0.364905892
4 1 -0.08530188 0.3482372 -0.029705287
5 1 0.19138684 -0.1659683 -0.031764149
6 1 -1.89493717 1.0261454 -1.944481020
coef(out)
(Intercept) x1 x2 x1:x2
7.053434 1.804441 4.130249 5.970553
nomThings <- t( t(out$x[, names(coef(out))]) * coef(out) )
如果您将某些因素作为自变量,我不完全确定它是否会正常工作,或者它是否会在所有情况下正常工作。但我怀疑是这样。
当然,您可以将 coef(out) 保存为临时对象,等等,以使代码更具可读性和效率。
鉴于你很容易做到这一点,我会认真考虑你是否应该这样做。