如何对每一行的数据框执行线性回归

Question

我有一个包含数千行的数据框，但为简单起见，我们假设它有 10 行。考虑对几个患者的十种不同蛋白质进行测量，平均值列在以下数据框中。

proteins Year.1 Year.2 Year.4 Year.5
1        p1   1.90   2.30   2.40   2.80
2        p2   0.90   1.20   1.50   1.90
3        p3   2.30   5.20   6.20   8.70
4        p4   2.10   2.20   2.50   2.60
5        p5   1.85   1.92   1.99   2.01
6        p6   1.20   1.45   1.55   1.65
7        p7   3.50   3.60   3.80   4.10
8        p8   4.20   5.60   6.50   7.20
9        p9   3.80   3.90   4.10   4.50
10      p10  23.00   4.20   6.50   8.90

我需要一个 r 代码来运行每行的线性回归（例如第 i=1 行：x=(1,2,3,4), y=(year.1[i, ],year.2[i,],year.3[i,],year.4[i,])) 并创建几个列，可以为它们记录截距、斜率、Rsquared。

我是 R 的新手并且做了一些研究但不确定如何编写 lm 函数的公式

fold_model_lm<-function(df) {
        lm((x<-c(1,2,3,4))~(y<-c(year.1,year.2,year.3,year.4)), data=df)
}

但是没有用。知道怎么做吗？

Answer 1

已更新以提取 r-squared，并放弃使用 `broom::tidy`

dat %>% 
  mutate(id=row_number()) %>% 
  pivot_longer(starts_with("Year")) %>% 
  group_by(id) %>% 
  mutate(x=c(1,2,3,4)) %>% 
  nest() %>% 
  mutate(model = map(data, ~ lm(value ~ x, data = .)),
         result = map(model, function(x) list(intercept= x$coef[1],
                                              slope = x$coef[2],
                                              rsq = summary(x)$r.squared))) %>% 
  unnest_wider(result)

输出：

      id data             model  intercept   slope   rsq
   <int> <list>           <list>     <dbl>   <dbl> <dbl>
 1     1 <tibble [4 x 4]> <lm>       1.65   0.28   0.956
 2     2 <tibble [4 x 4]> <lm>       0.550  0.33   0.995
 3     3 <tibble [4 x 4]> <lm>       0.550  2.02   0.971
 4     4 <tibble [4 x 4]> <lm>       1.9    0.18   0.953
 5     5 <tibble [4 x 4]> <lm>       1.81   0.0550 0.953
 6     6 <tibble [4 x 4]> <lm>       1.1    0.145  0.940
 7     7 <tibble [4 x 4]> <lm>       3.25   0.200  0.952
 8     8 <tibble [4 x 4]> <lm>       3.40   0.99   0.975
 9     9 <tibble [4 x 4]> <lm>       3.5    0.23   0.92 
10    10 <tibble [4 x 4]> <lm>      20.7   -4      0.373

先前的回答

你可以使用 tidyverse 和 broom

library(tidyverse)
library(broom)
dat %>% 
  mutate(id=row_number()) %>% 
  pivot_longer(starts_with("Year")) %>% 
  group_by(id) %>% 
  mutate(x=c(1,2,3,4)) %>% 
  nest() %>% 
  mutate(model = map(data, ~ lm(value ~ x, data = .)),
         tidied = map(model, tidy)) %>%
  unnest(tidied)

输出：

      id data             model  term        estimate std.error statistic  p.value
   <int> <list>           <list> <chr>          <dbl>     <dbl>     <dbl>    <dbl>
 1     1 <tibble [4 x 4]> <lm>   (Intercept)   1.65     0.116      14.2   0.00492 
 2     1 <tibble [4 x 4]> <lm>   x             0.28     0.0424      6.60  0.0222  
 3     2 <tibble [4 x 4]> <lm>   (Intercept)   0.550    0.0474     11.6   0.00736 
 4     2 <tibble [4 x 4]> <lm>   x             0.33     0.0173     19.1   0.00274 
 5     3 <tibble [4 x 4]> <lm>   (Intercept)   0.550    0.681       0.808 0.504   
 6     3 <tibble [4 x 4]> <lm>   x             2.02     0.249       8.13  0.0148  
 7     4 <tibble [4 x 4]> <lm>   (Intercept)   1.9      0.0775     24.5   0.00166 
 8     4 <tibble [4 x 4]> <lm>   x             0.18     0.0283      6.36  0.0238  
 9     5 <tibble [4 x 4]> <lm>   (Intercept)   1.81     0.0237     76.1   0.000173
10     5 <tibble [4 x 4]> <lm>   x             0.0550   0.00866     6.35  0.0239  
11     6 <tibble [4 x 4]> <lm>   (Intercept)   1.1      0.0712     15.5   0.00416 
12     6 <tibble [4 x 4]> <lm>   x             0.145    0.0260      5.58  0.0306  
13     7 <tibble [4 x 4]> <lm>   (Intercept)   3.25     0.0866     37.5   0.000709
14     7 <tibble [4 x 4]> <lm>   x             0.200    0.0316      6.32  0.0241  
15     8 <tibble [4 x 4]> <lm>   (Intercept)   3.40     0.309      11.0   0.00814 
16     8 <tibble [4 x 4]> <lm>   x             0.99     0.113       8.78  0.0127  
17     9 <tibble [4 x 4]> <lm>   (Intercept)   3.5      0.131      26.6   0.00141 
18     9 <tibble [4 x 4]> <lm>   x             0.23     0.0480      4.80  0.0408  
19    10 <tibble [4 x 4]> <lm>   (Intercept)  20.7     10.0         2.06  0.176   
20    10 <tibble [4 x 4]> <lm>   x            -4        3.67       -1.09  0.389

输入：

structure(list(proteins = c("p1", "p2", "p3", "p4", "p5", "p6", 
"p7", "p8", "p9", "p10"), Year.1 = c(1.9, 0.9, 2.3, 2.1, 1.85, 
1.2, 3.5, 4.2, 3.8, 23), Year.2 = c(2.3, 1.2, 5.2, 2.2, 1.92, 
1.45, 3.6, 5.6, 3.9, 4.2), Year.4 = c(2.4, 1.5, 6.2, 2.5, 1.99, 
1.55, 3.8, 6.5, 4.1, 6.5), Year.5 = c(2.8, 1.9, 8.7, 2.6, 2.01, 
1.65, 4.1, 7.2, 4.5, 8.9)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))

如何对每一行的数据框执行线性回归

How to perform linear regression over a dataframe for each row

regression

r

linear-regression

bigdata

已更新以提取 r-squared，并放弃使用 `broom::tidy`

先前的回答

如何对每一行的数据框执行线性回归

How to perform linear regression over a dataframe for each row

regression

r

linear-regression

bigdata

已更新以提取 r-squared，并放弃使用 broom::tidy

先前的回答

已更新以提取 r-squared，并放弃使用 `broom::tidy`