How to calculate the predictive power of each independent variable on a new data frame

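I want to calculate the predictive power of each independent variable. I have a training data frame called df and a test data frame called df1. I wrote code that should append the prediction results based on each column to the test data frame. My code gives a strange result: it shows the predictions for only one variable, and without its name. I would like to see the predictions for all variables, along with their names. I am new to writing functions, so any help is welcome.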

df <- read.table(text = " target birds    wolfs     
                            32         9         7 
                            56         8         4 
                            11         2         8 
                            22         2         3 
                            33         8         3 
                            54         1         2 
                            34         7         16 
                            66         1         5 
                            74         17        7 
                            52         8         7 
                            45         2         7 
                            65         20        3 
                            99         6         3 
                            88         1         1 
                            77         3         11 
                            55         30         1  ",header = TRUE)

df1 <- read.table(text = " target birds    wolfs     
                            34         9         7 
                            23         8         4 
                            43         2         8 
                            45         2         3 
                            65         8         3 
                            23         1         2 
                            22         7         16 
                            99         1         5 
                            56         17        7 
                            32         8         7 
                            19         2         7 
                            91         20        3 
                            78         6         3 
                            62         1         1 
                            78         3         11 
                            69         30         1  ",header = TRUE)

This is the code I used:

for (i in names(df)) {
  if (is.numeric(df[3, i])) {  ## if row 3 is numeric, the entire column is
    fit_pred <- predict(lm(df[, i] ~ target, data = df), newdata = df1)

    res <- fit_pred
    g <- as.data.frame(cbind(df1, res))
    g
  }
}

The output I got is:

 userid target birds wolfs   res
10    321      45     8     7  0.0515967
8     608      33     1     5  0.1696638
3     234      23     2     8  0.1696638
7     294      44     7     1  0.0515967
2     444      46     8     4  0.0515967
11    226      90     2     7  0.1696638
9     123      89     9     7  0.0515967
1     222      67     9     7  0.0515967
5     678      43     8     3  0.0515967
15    999      12     3     9  0.1696638
6     987      33     1     2  0.1696638
14    225      18     1     1  0.1696638
16    987      83     1     1  0.1696638
12    556      77     2     3  0.1696638

You should not use a for loop here. You should use one of the xxapply family of functions instead. This is the R way to do it:

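# Fit one model per column (column ~ target, as in the question) and predict on the test data
fit_pred <- function(x) predict(lm(x ~ target, data = df), newdata = df1)
# Apply it to every column of df and bind the named prediction vectors into one matrix
do.call(cbind, lapply(df, fit_pred))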
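  1. I wrapped your code in a function.
  2. I use lapply to loop over all the columns.
  3. do.call with cbind combines the results into one table, with one named column per variable.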
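
The three steps above can also be arranged so that the predictions are appended to the test data with one named column per variable, which is what the question asks for. The following is only a minimal sketch in base R under some assumptions: `train` and `test` are small hypothetical stand-ins for df and df1, the `pred_` prefix is an arbitrary naming choice, and each model is fitted as target ~ column (the usual direction for judging a predictor) rather than column ~ target as in the code above.

```r
# Small hypothetical stand-ins for df (training) and df1 (test)
train <- data.frame(target = c(32, 56, 11, 22, 33, 54),
                    birds  = c(9, 8, 2, 2, 8, 1),
                    wolfs  = c(7, 4, 8, 3, 3, 2))
test  <- data.frame(target = c(34, 23, 43),
                    birds  = c(9, 8, 2),
                    wolfs  = c(7, 4, 8))

# Every numeric column except the response is treated as a candidate predictor
predictors <- setdiff(names(train)[sapply(train, is.numeric)], "target")

# One single-predictor model per column; sapply keeps the column names
preds <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "target"), data = train)
  predict(fit, newdata = test)
})
colnames(preds) <- paste0("pred_", predictors)

# Append the named prediction columns to the test data
result <- cbind(test, preds)
result
```

Because the loop runs over column names instead of column values, the name of each variable is still available when the results are combined.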

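Here is a process that uses the packages dplyr and tidyr to build a model for each y~x combination (the dependent variables you specify ~ the independent variables you specify), and then uses those models to predict new data.

The idea behind it is that both the y and the x variables may change (even though here you only have "target" as y). I am using the data frames df and df1 that you specified at the beginning (I don't know why "target" becomes binary in your output).

Run it step by step to see how it works, and modify it to better suit the objective of your process.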

library(dplyr)
library(tidyr)

# choose the dependent variables (y) and the independent variables (x)
ynames = c("target")
xnames = c("birds","wolfs")


###### build models

# create and reshape train y dataframes
dty = df[ynames]
dty = dty %>% gather(yvariable, yvalue)

# create and reshape train x dataframes
dtx = df[xnames]
dtx = dtx %>% gather(xvariable, xvalue)

# build model for each y~x combination
dt_model =
    dty %>% do(data.frame(.,dtx)) %>%         # create combinations of y and x variables
      group_by(yvariable, xvariable) %>%      # for each pair y and x
      do(model = lm(yvalue~xvalue, data=.))   # build the lm y~x

# you've managed to create a model for each combination and it's stored in a dataframe
dt_model

#   yvariable xvariable   model
# 1    target     birds <S3:lm>
# 2    target     wolfs <S3:lm>



####### predict

# create and reshape test y dataframes
dty = df1[ynames]
dty = dty %>% gather(yvariable, yvalue)

# create and reshape test x dataframes
dtx = df1[xnames]
dtx = dtx %>% gather(xvariable, xvalue)


dty %>% do(data.frame(.,dtx)) %>%            # create combinations of y and x variables
  group_by(yvariable, xvariable) %>%         # for each pair y and x
  do(data.frame(., pred = predict(dt_model$model[dt_model$yvariable==.$yvariable[1] &
                                                 dt_model$xvariable==.$xvariable[1]][[1]],
                                  newdata = .))) %>%     # get the corresponding model and predict the new data (without newdata, predict returns the training fits)
  ungroup()

#    yvariable yvalue xvariable xvalue     pred
# 1     target     34     birds      9 54.30627
# 2     target     23     birds      8 53.99573
# 3     target     43     birds      2 52.13249
# 4     target     45     birds      2 52.13249
# 5     target     65     birds      8 53.99573
# 6     target     23     birds      1 51.82195
# 7     target     22     birds      7 53.68519
# 8     target     99     birds      1 51.82195
# 9     target     56     birds     17 56.79059
# 10    target     32     birds      8 53.99573
# 11    target     19     birds      2 52.13249
# 12    target     91     birds     20 57.72220
# 13    target     78     birds      6 53.37465
# 14    target     62     birds      1 51.82195
# 15    target     78     birds      3 52.44303
# 16    target     69     birds     30 60.82760
# 17    target     34     wolfs      7 51.49364
# 18    target     23     wolfs      4 56.38136
# 19    target     43     wolfs      8 49.86441
# 20    target     45     wolfs      3 58.01059
# 21    target     65     wolfs      3 58.01059
# 22    target     23     wolfs      2 59.63983
# 23    target     22     wolfs     16 36.83051
# 24    target     99     wolfs      5 54.75212
# 25    target     56     wolfs      7 51.49364
# 26    target     32     wolfs      7 51.49364
# 27    target     19     wolfs      7 51.49364
# 28    target     91     wolfs      3 58.01059
# 29    target     78     wolfs      3 58.01059
# 30    target     62     wolfs      1 61.26907
# 31    target     78     wolfs     11 44.97669
# 32    target     69     wolfs      1 61.26907
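
The predictions above do not yet say which variable predicts better. One possible way to compare them is to compute an error measure per x variable on the test data. The sketch below is an assumption, not part of the answer: it uses root-mean-square error (RMSE) as the measure, base R instead of dplyr, and a hypothetical data frame `long` that holds a few rows copied from the output above.

```r
# A few rows in the same long layout as the output above
# (yvalue = observed target in the test data, pred = model prediction)
long <- data.frame(
  xvariable = c("birds", "birds", "birds", "wolfs", "wolfs", "wolfs"),
  yvalue    = c(34, 23, 43, 34, 23, 43),
  pred      = c(54.30627, 53.99573, 52.13249, 51.49364, 56.38136, 49.86441)
)

# RMSE per predictor: a lower value means the predictions are closer to the observed target
rmse <- tapply((long$yvalue - long$pred)^2, long$xvariable,
               function(sq_err) sqrt(mean(sq_err)))
rmse
```

With the full result, the same call can be applied to the whole table of predictions, and another measure (for example mean absolute error) can be substituted inside the function.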