如何对每一行的数据框执行线性回归
How to perform linear regression over a dataframe for each row
我有一个包含数千行的数据框,但为简单起见,我们假设它有 10 行。考虑对几个患者的十种不同蛋白质进行测量,平均值列在以下数据框中。
proteins Year.1 Year.2 Year.4 Year.5
1 p1 1.90 2.30 2.40 2.80
2 p2 0.90 1.20 1.50 1.90
3 p3 2.30 5.20 6.20 8.70
4 p4 2.10 2.20 2.50 2.60
5 p5 1.85 1.92 1.99 2.01
6 p6 1.20 1.45 1.55 1.65
7 p7 3.50 3.60 3.80 4.10
8 p8 4.20 5.60 6.50 7.20
9 p9 3.80 3.90 4.10 4.50
10 p10 23.00 4.20 6.50 8.90
我需要一个 r 代码来 运行 每行的线性回归(例如第 i=1 行:x=(1,2,3,4), y=(year.1[i, ],year.2[i,],year.3[i,],year.4[i,]))
并创建几个列,可以为它们记录截距、斜率、Rsquared。
我是 R 的新手并且做了一些研究但不确定如何编写 lm 函数的公式
fold_model_lm<-function(df) {
lm((x<-c(1,2,3,4))~(y<-c(year.1,year.2,year.3,year.4)), data=df)
}
但是没有用。知道怎么做吗?
已更新以提取 r-squared,并放弃使用 broom::tidy
dat %>%
mutate(id=row_number()) %>%
pivot_longer(starts_with("Year")) %>%
group_by(id) %>%
mutate(x=c(1,2,3,4)) %>%
nest() %>%
mutate(model = map(data, ~ lm(value ~ x, data = .)),
result = map(model, function(x) list(intercept= x$coef[1],
slope = x$coef[2],
rsq = summary(x)$r.squared))) %>%
unnest_wider(result)
输出:
id data model intercept slope rsq
<int> <list> <list> <dbl> <dbl> <dbl>
1 1 <tibble [4 x 4]> <lm> 1.65 0.28 0.956
2 2 <tibble [4 x 4]> <lm> 0.550 0.33 0.995
3 3 <tibble [4 x 4]> <lm> 0.550 2.02 0.971
4 4 <tibble [4 x 4]> <lm> 1.9 0.18 0.953
5 5 <tibble [4 x 4]> <lm> 1.81 0.0550 0.953
6 6 <tibble [4 x 4]> <lm> 1.1 0.145 0.940
7 7 <tibble [4 x 4]> <lm> 3.25 0.200 0.952
8 8 <tibble [4 x 4]> <lm> 3.40 0.99 0.975
9 9 <tibble [4 x 4]> <lm> 3.5 0.23 0.92
10 10 <tibble [4 x 4]> <lm> 20.7 -4 0.373
先前的回答
你可以使用 tidyverse 和 broom
library(tidyverse)
library(broom)
dat %>%
mutate(id=row_number()) %>%
pivot_longer(starts_with("Year")) %>%
group_by(id) %>%
mutate(x=c(1,2,3,4)) %>%
nest() %>%
mutate(model = map(data, ~ lm(value ~ x, data = .)),
tidied = map(model, tidy)) %>%
unnest(tidied)
输出:
id data model term estimate std.error statistic p.value
<int> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 <tibble [4 x 4]> <lm> (Intercept) 1.65 0.116 14.2 0.00492
2 1 <tibble [4 x 4]> <lm> x 0.28 0.0424 6.60 0.0222
3 2 <tibble [4 x 4]> <lm> (Intercept) 0.550 0.0474 11.6 0.00736
4 2 <tibble [4 x 4]> <lm> x 0.33 0.0173 19.1 0.00274
5 3 <tibble [4 x 4]> <lm> (Intercept) 0.550 0.681 0.808 0.504
6 3 <tibble [4 x 4]> <lm> x 2.02 0.249 8.13 0.0148
7 4 <tibble [4 x 4]> <lm> (Intercept) 1.9 0.0775 24.5 0.00166
8 4 <tibble [4 x 4]> <lm> x 0.18 0.0283 6.36 0.0238
9 5 <tibble [4 x 4]> <lm> (Intercept) 1.81 0.0237 76.1 0.000173
10 5 <tibble [4 x 4]> <lm> x 0.0550 0.00866 6.35 0.0239
11 6 <tibble [4 x 4]> <lm> (Intercept) 1.1 0.0712 15.5 0.00416
12 6 <tibble [4 x 4]> <lm> x 0.145 0.0260 5.58 0.0306
13 7 <tibble [4 x 4]> <lm> (Intercept) 3.25 0.0866 37.5 0.000709
14 7 <tibble [4 x 4]> <lm> x 0.200 0.0316 6.32 0.0241
15 8 <tibble [4 x 4]> <lm> (Intercept) 3.40 0.309 11.0 0.00814
16 8 <tibble [4 x 4]> <lm> x 0.99 0.113 8.78 0.0127
17 9 <tibble [4 x 4]> <lm> (Intercept) 3.5 0.131 26.6 0.00141
18 9 <tibble [4 x 4]> <lm> x 0.23 0.0480 4.80 0.0408
19 10 <tibble [4 x 4]> <lm> (Intercept) 20.7 10.0 2.06 0.176
20 10 <tibble [4 x 4]> <lm> x -4 3.67 -1.09 0.389
输入:
structure(list(proteins = c("p1", "p2", "p3", "p4", "p5", "p6",
"p7", "p8", "p9", "p10"), Year.1 = c(1.9, 0.9, 2.3, 2.1, 1.85,
1.2, 3.5, 4.2, 3.8, 23), Year.2 = c(2.3, 1.2, 5.2, 2.2, 1.92,
1.45, 3.6, 5.6, 3.9, 4.2), Year.4 = c(2.4, 1.5, 6.2, 2.5, 1.99,
1.55, 3.8, 6.5, 4.1, 6.5), Year.5 = c(2.8, 1.9, 8.7, 2.6, 2.01,
1.65, 4.1, 7.2, 4.5, 8.9)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))
我有一个包含数千行的数据框,但为简单起见,我们假设它有 10 行。考虑对几个患者的十种不同蛋白质进行测量,平均值列在以下数据框中。
proteins Year.1 Year.2 Year.4 Year.5
1 p1 1.90 2.30 2.40 2.80
2 p2 0.90 1.20 1.50 1.90
3 p3 2.30 5.20 6.20 8.70
4 p4 2.10 2.20 2.50 2.60
5 p5 1.85 1.92 1.99 2.01
6 p6 1.20 1.45 1.55 1.65
7 p7 3.50 3.60 3.80 4.10
8 p8 4.20 5.60 6.50 7.20
9 p9 3.80 3.90 4.10 4.50
10 p10 23.00 4.20 6.50 8.90
我需要一个 r 代码来 运行 每行的线性回归(例如第 i=1 行:x=(1,2,3,4), y=(year.1[i, ],year.2[i,],year.3[i,],year.4[i,])) 并创建几个列,可以为它们记录截距、斜率、Rsquared。
我是 R 的新手并且做了一些研究但不确定如何编写 lm 函数的公式
fold_model_lm<-function(df) {
lm((x<-c(1,2,3,4))~(y<-c(year.1,year.2,year.3,year.4)), data=df)
}
但是没有用。知道怎么做吗?
已更新以提取 r-squared,并放弃使用 broom::tidy
dat %>%
mutate(id=row_number()) %>%
pivot_longer(starts_with("Year")) %>%
group_by(id) %>%
mutate(x=c(1,2,3,4)) %>%
nest() %>%
mutate(model = map(data, ~ lm(value ~ x, data = .)),
result = map(model, function(x) list(intercept= x$coef[1],
slope = x$coef[2],
rsq = summary(x)$r.squared))) %>%
unnest_wider(result)
输出:
id data model intercept slope rsq
<int> <list> <list> <dbl> <dbl> <dbl>
1 1 <tibble [4 x 4]> <lm> 1.65 0.28 0.956
2 2 <tibble [4 x 4]> <lm> 0.550 0.33 0.995
3 3 <tibble [4 x 4]> <lm> 0.550 2.02 0.971
4 4 <tibble [4 x 4]> <lm> 1.9 0.18 0.953
5 5 <tibble [4 x 4]> <lm> 1.81 0.0550 0.953
6 6 <tibble [4 x 4]> <lm> 1.1 0.145 0.940
7 7 <tibble [4 x 4]> <lm> 3.25 0.200 0.952
8 8 <tibble [4 x 4]> <lm> 3.40 0.99 0.975
9 9 <tibble [4 x 4]> <lm> 3.5 0.23 0.92
10 10 <tibble [4 x 4]> <lm> 20.7 -4 0.373
先前的回答
你可以使用 tidyverse 和 broom
library(tidyverse)
library(broom)
dat %>%
mutate(id=row_number()) %>%
pivot_longer(starts_with("Year")) %>%
group_by(id) %>%
mutate(x=c(1,2,3,4)) %>%
nest() %>%
mutate(model = map(data, ~ lm(value ~ x, data = .)),
tidied = map(model, tidy)) %>%
unnest(tidied)
输出:
id data model term estimate std.error statistic p.value
<int> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 <tibble [4 x 4]> <lm> (Intercept) 1.65 0.116 14.2 0.00492
2 1 <tibble [4 x 4]> <lm> x 0.28 0.0424 6.60 0.0222
3 2 <tibble [4 x 4]> <lm> (Intercept) 0.550 0.0474 11.6 0.00736
4 2 <tibble [4 x 4]> <lm> x 0.33 0.0173 19.1 0.00274
5 3 <tibble [4 x 4]> <lm> (Intercept) 0.550 0.681 0.808 0.504
6 3 <tibble [4 x 4]> <lm> x 2.02 0.249 8.13 0.0148
7 4 <tibble [4 x 4]> <lm> (Intercept) 1.9 0.0775 24.5 0.00166
8 4 <tibble [4 x 4]> <lm> x 0.18 0.0283 6.36 0.0238
9 5 <tibble [4 x 4]> <lm> (Intercept) 1.81 0.0237 76.1 0.000173
10 5 <tibble [4 x 4]> <lm> x 0.0550 0.00866 6.35 0.0239
11 6 <tibble [4 x 4]> <lm> (Intercept) 1.1 0.0712 15.5 0.00416
12 6 <tibble [4 x 4]> <lm> x 0.145 0.0260 5.58 0.0306
13 7 <tibble [4 x 4]> <lm> (Intercept) 3.25 0.0866 37.5 0.000709
14 7 <tibble [4 x 4]> <lm> x 0.200 0.0316 6.32 0.0241
15 8 <tibble [4 x 4]> <lm> (Intercept) 3.40 0.309 11.0 0.00814
16 8 <tibble [4 x 4]> <lm> x 0.99 0.113 8.78 0.0127
17 9 <tibble [4 x 4]> <lm> (Intercept) 3.5 0.131 26.6 0.00141
18 9 <tibble [4 x 4]> <lm> x 0.23 0.0480 4.80 0.0408
19 10 <tibble [4 x 4]> <lm> (Intercept) 20.7 10.0 2.06 0.176
20 10 <tibble [4 x 4]> <lm> x -4 3.67 -1.09 0.389
输入:
structure(list(proteins = c("p1", "p2", "p3", "p4", "p5", "p6",
"p7", "p8", "p9", "p10"), Year.1 = c(1.9, 0.9, 2.3, 2.1, 1.85,
1.2, 3.5, 4.2, 3.8, 23), Year.2 = c(2.3, 1.2, 5.2, 2.2, 1.92,
1.45, 3.6, 5.6, 3.9, 4.2), Year.4 = c(2.4, 1.5, 6.2, 2.5, 1.99,
1.55, 3.8, 6.5, 4.1, 6.5), Year.5 = c(2.8, 1.9, 8.7, 2.6, 2.01,
1.65, 4.1, 7.2, 4.5, 8.9)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))