How to use a specific value from one data frame as a predictor in a linear regression based on another data frame

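I'm trying to extract a specific value from one data frame (in my example df, where the value is "red" in the first column) and use it as the independent variable in a linear regression based on another data frame that has this value as a column name. I tried saving the value as a character, but I get an error (described below). How can I pass this value to an lm call that is based on another data frame?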

df <- read.table(text = " color birds    wolfs     
                  red           9         7 
                  red           8         4 
                  red           2         8 
                  red           2         3 
                  black         8         3 
                  black         1         2 
                  black         7         16 
                  black         1         5 
                  black         17        7 
                  black         8         7 
                  black         2         7 
                  green         20        3 
                  green         6         3 
                  green         1         1 
                  green         3         11 
                  green         30         1  ",header = TRUE)

df1 <- read.table(text = " red birds    wolfs     
                   10         9         7 
                   8          8         4 
                   11         2         8 
                   8          2         3 
                   3          8         3 
                   4          1         2 
                   8          7         16 
                   9          1         5 
                   10         17        7 
                   8          8         7 
                   6          2         7     ",header = TRUE)
# I extracted the desired value, then added it to the lm call and got an error:
 df[1,1]
[1] red
Levels: black green red
lm<-lm(birds~df[1,1],data=df1)
Error in model.frame.default(formula = birds ~ df[1, 1], data = df1, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'df[1, 1]')
# I also tried converting the value to character:
b<-as.character(df[1,1])
b
[1] "red"
lm<-lm(birds~ b ,data=df1)
# but got the same error:
Error in model.frame.default(formula = birds ~ b, data = df1, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'b')

I think you can use

onValue<-as.character(df[1,1]) # "red"
reg<-lm(birds~eval(as.symbol(onValue)),data=df1) # regression 

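Also, avoid assigning the regression to an object named lm: lm is already a function, so reusing the name can be confusing.

as.symbol(onValue) turns the string into a variable name, and eval() looks that name up in the data, so eval(as.symbol(onValue)) tells R to run the regression on the column of df1 whose name is stored in onValue (here, "red"). The original attempts failed because df[1,1] and b are single values, not columns with one entry per row of df1.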
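A variant of the same idea, sketched here as an alternative rather than as the method above, is to build the formula from the string with reformulate() from the stats package. The data below is df1 from the question, rebuilt with data.frame() so the snippet runs on its own, and onValue is hard-coded to stand in for as.character(df[1,1]).

```r
# df1 from the question, rebuilt so the sketch is self-contained
df1 <- data.frame(
  red   = c(10, 8, 11, 8, 3, 4, 8, 9, 10, 8, 6),
  birds = c(9, 8, 2, 2, 8, 1, 7, 1, 17, 8, 2),
  wolfs = c(7, 4, 8, 3, 3, 2, 16, 5, 7, 7, 7)
)

# Stands in for as.character(df[1, 1]) from the question
onValue <- "red"

# A single string cannot act as a predictor for 11 observations,
# which is what "variable lengths differ" is complaining about
length(onValue)
nrow(df1)

# Build the formula birds ~ red from the string, then fit the model
form <- reformulate(onValue, response = "birds")
reg  <- lm(form, data = df1)
coef(reg)
```

With a formula built this way, the coefficient is labelled red in the output instead of eval(as.symbol(onValue)), which can make the summary easier to read.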

If you want a different approach, I find update works really well for this kind of task:

#create a formula outside of lm. This can be a simple one against
#the intercept or one that you already use
form <- birds ~ 1

#then add the new variable using paste + update 
#the . ~ . says include everything before and after the tilde ~
#that existed in original formula  
form <- update(form, paste('. ~ . + ', df[1,1]))
#> form
#birds ~ red

#store the fit under a name other than lm so the lm function is not masked
reg <- lm(form, data=df1)
reg

Call:
lm(formula = form, data = df1)

Coefficients:
(Intercept)          red  
      2.339        0.462
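If the value pulled from df might not match any column of df1, the update approach can be wrapped in a small helper that checks first. This is only a sketch: fit_on_column is a hypothetical name, not an existing function, and the data is df1 from the question rebuilt so the snippet runs on its own.

```r
# df1 from the question, rebuilt so the sketch is self-contained
df1 <- data.frame(
  red   = c(10, 8, 11, 8, 3, 4, 8, 9, 10, 8, 6),
  birds = c(9, 8, 2, 2, 8, 1, 7, 1, 17, 8, 2),
  wolfs = c(7, 4, 8, 3, 3, 2, 16, 5, 7, 7, 7)
)

# Hypothetical helper: add the column named by `var` to a base formula
# and fit the model, failing early if the column does not exist
fit_on_column <- function(var, data, base = birds ~ 1) {
  var <- as.character(var)  # also handles a factor value such as df[1, 1]
  if (!var %in% names(data)) {
    stop("no column named '", var, "' in data")
  }
  form <- update(base, paste(". ~ . +", var))
  lm(form, data = data)
}

reg <- fit_on_column("red", df1)
coef(reg)

# A value with no matching column is rejected with a clear message
res <- tryCatch(fit_on_column("green", df1), error = function(e) conditionMessage(e))
res
```

The early check replaces the less obvious "object not found" error that lm would otherwise raise for a missing column.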