For-loop linear regression that generates a new data frame with the results
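I want to write a loop in R that runs a linear regression on each gene in my dataset (210011 genes and 6 samples in total; columns are genes, rows are samples) to determine how age and sex affect gene expression. I want to save the fitted values from each regression in a new data frame (essentially a data frame of the same shape, with genes in columns and samples in rows).
So the loop I wrote is: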
genelist <- df %>% select(5:21011) #select only genes
for (i in 1:length(genelist)) {
formula <- as.formula(paste0(genelist[i], ' ~ age + sex'))
model <- lm(formula, data = df)
print(model$fitted.values)
}
But I can't save the results to a new data frame. I tried this:
test <- list(); model <- list()
for (i in 1:length(genelist)) {
formula[[i]] = paste0(genelist[i], ' ~ age + sex')
model[[i]] = lm(formula[[i]], data = df)
}
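But it gives me "list of 0" as the result, so I must have written something wrong. How can I modify my original code to produce a new data frame containing the results?
Thanks for the help!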
Build the formulas from a vector of column names rather than from the column values. Try the following:
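names_vec <- names(genelist)  # gene column names, not the column values
formula <- vector('list', length(names_vec))
model <- vector('list', length(names_vec))
for (i in seq_along(names_vec)) {
  # paste the column NAME into the formula, then convert it to a formula object
  formula[[i]] <- as.formula(paste0(names_vec[i], ' ~ age + sex'))
  model[[i]] <- lm(formula[[i]], data = df)
}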
Here is a working example:
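library(dplyr)  # provides %>% and select()
set.seed(2053)
# toy data: 6 samples with age, sex and 10 simulated genes
df <- data.frame(age = sample(18:80, 6, replace=FALSE),
                 sex = sample(0:1, 6, replace=TRUE))
for(i in 1:10){
  df[[paste0("gene_", i)]] <- runif(6,0,1)
}
genelist <- df %>% select(3:12) #select only genes
pred <- df %>% select(age, sex) #covariates; fitted values are added below
for (i in seq_along(genelist)) {
  # build "gene_i ~ age + sex" from the column name
  formula <- reformulate(c("age", "sex"), response=names(genelist)[i])
  model <- lm(formula, data = df)
  # store the fitted values for this gene as a new column
  pred[[names(genelist)[i]]] <- predict(model, newdata=pred)
}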
pred
# age sex gene_1 gene_2 gene_3 gene_4 gene_5 gene_6
# 1 54 0 0.6460394 0.7975062 0.542963150 0.5766314 0.43716321 0.3731399
# 2 65 0 0.4969311 0.7557411 0.499976012 0.7201710 -0.02954846 0.3392473
# 3 49 0 0.7138160 0.8164903 0.562502758 0.5113862 0.64930488 0.3885457
# 4 62 0 0.5375970 0.7671316 0.511699777 0.6810238 0.09773654 0.3484907
# 5 44 0 0.7815925 0.8354744 0.582042366 0.4461409 0.86144655 0.4039515
# 6 40 1 0.3976764 0.3673542 0.009429805 0.2500409 0.38185899 0.5017752
# gene_7 gene_8 gene_9 gene_10
# 1 0.6990817 0.6336038 0.36330413 0.3146205
# 2 0.6414371 0.8336259 0.58575121 0.2651734
# 3 0.7252838 0.5426847 0.26219181 0.3370964
# 4 0.6571584 0.7790744 0.52508383 0.2786590
# 5 0.7514859 0.4517656 0.16107950 0.3595723
# 6 0.1903702 0.9501972 0.09472406 0.6118369
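With many thousands of genes, a per-gene loop can be slow. As a possible alternative (a minimal sketch, not part of the answer above), base R's `lm()` accepts a matrix as the response and fits one regression per column in a single call. The toy data, the seed, and the names `gene_cols`, `expr`, `fit`, and `fitted_df` below are illustrative assumptions.

```r
# Sketch: fit every gene in one lm() call using a matrix response (base R only).
# The data here are simulated purely for illustration.
set.seed(1)
df <- data.frame(age = c(54, 65, 49, 62, 44, 40),
                 sex = c(0, 0, 0, 0, 0, 1))
for (i in 1:10) {
  df[[paste0("gene_", i)]] <- runif(6)
}

# names of the gene columns (assumed here to share the "gene_" prefix)
gene_cols <- grep("^gene_", names(df), value = TRUE)

# samples-by-genes expression matrix used as the multi-column response
expr <- as.matrix(df[gene_cols])

# one model object holding a separate regression for each gene column
fit <- lm(expr ~ age + sex, data = df)

# fitted() returns a samples-by-genes matrix; attach it to the covariates
fitted_df <- cbind(df[c("age", "sex")], as.data.frame(fitted(fit)))
```

`fitted_df` has the same layout as `pred` in the loop version: samples in rows, `age` and `sex` first, then one column of fitted values per gene.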