Variable length error in R ANOVA loop
I'm currently trying to run an ANOVA on my data frame, which has the following format:
ethnicity sampleID batch gender gene1 gene2 gene3 ...
... with up to a few thousand genes; the table is filled with gene expression values.
Below is the code I'm using to try to run an ANOVA on each gene to find differences between ethnicities:
# here, 'merge' is the dataframe as described above
# set ethnicity to categorical
merge$ethnicity <- factor(merge$ethnicity, levels=c("Chinese","Malay","Indian"))
# parametric anova for each gene
baseformula <- " ~ ethnicity"
for (i in 5:ncol(merge))
{
p <- anova(lm(colnames(merge)[i] ~ ethnicity, data=merge)) # variable lengths differ??
}
When I try to run this code, I get the following error:
Error in model.frame.default(formula = colnames(merge)[i] ~ ethnicity, : variable lengths differ (found for 'ethnicity')
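I have checked the length of my ethnicity column, and it is the same as the length of my gene1 column. I have also tried using na.omit() on merge$ethnicity, but it still gives the same error.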
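The column lengths are fine; the mismatch is inside the formula. In lm(colnames(merge)[i] ~ ethnicity, data=merge) the left-hand side is evaluated as an expression, so it becomes a single character string such as "gene1" rather than the gene1 column. model.frame() then sees a response of length 1 next to an ethnicity vector with one value per sample, hence "variable lengths differ (found for 'ethnicity')". Below is a minimal sketch on made-up data (merge_demo is a hypothetical stand-in for merge, with random values) that reproduces the error and then builds the formula from the column name with reformulate(); as.formula(paste(colnames(merge)[i], "~ ethnicity")) should work the same way for syntactic column names.

```r
# Hypothetical stand-in for 'merge': same column layout, random expression values
set.seed(42)
merge_demo <- data.frame(
  ethnicity = factor(rep(c("Chinese", "Malay", "Indian"), each = 4),
                     levels = c("Chinese", "Malay", "Indian")),
  sampleID = paste0("S", 1:12),
  batch = rep(c("B1", "B2"), times = 6),
  gender = rep(1:2, times = 6),
  gene1 = rnorm(12),
  gene2 = rnorm(12)
)

# Reproduce the problem: the left-hand side evaluates to the single string
# "gene1", so model.frame() compares a length-1 response with 12 ethnicity values
err <- tryCatch(
  lm(colnames(merge_demo)[5] ~ ethnicity, data = merge_demo),
  error = function(e) conditionMessage(e)
)
print(err)

# Fix: turn the column name into a real formula, e.g. gene1 ~ ethnicity
for (i in 5:ncol(merge_demo)) {
  f <- reformulate("ethnicity", response = colnames(merge_demo)[i])
  p <- anova(lm(f, data = merge_demo))
  print(f)
  print(p)
}
```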
Does anyone have any suggestions as to what the problem might be?
Thanks!
EDIT: Here are the first five rows and first five columns of my data frame:
ethnicity sample.id Batch Gender X7896759
1 1 H60903 B6 1 6.19649
2 1 H61603 B2 1 6.74464
3 1 H61608 B7 2 6.20268
4 1 H62204 B4 1 6.71395
5 1 H62901 B7 2 6.59963
Using the code:
for (i in 5:ncol(merge))
{
print(colnames(merge)[i])
print(summary(aov(merge[,i] ~ merge$ethnicity)))
}
seems to give me the following error:
Error in levels(x)[x] : only 0's may be mixed with negative subscripts
In addition: Warning messages:
1: In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors
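Both warnings say the response is a factor ("factor response", "'-' not meaningful for factors"), which suggests that at least one gene column in merge is stored as a factor rather than as numbers, for example because it was read from a file containing some non-numeric entries. That is an inference from the warnings, not something visible in the five rows shown. A minimal sketch of checking and converting such columns, on hypothetical data (demo and its values are made up for illustration):

```r
# Hypothetical data: the gene column is stored as a factor, as if read in as text
demo <- data.frame(
  ethnicity = factor(c("Chinese", "Malay", "Indian", "Chinese", "Malay", "Indian"),
                     levels = c("Chinese", "Malay", "Indian")),
  X7896759 = factor(c("6.1", "6.7", "6.2", "6.7", "6.6", "6.4"))
)
str(demo)

# Convert every gene column (here only column 2; in the question, 5:ncol(merge)).
# as.numeric() on a factor returns the internal level codes, not the printed
# values, so go through as.character() first. Entries that are not numbers
# become NA with a warning.
gene_cols <- 2:ncol(demo)
demo[gene_cols] <- lapply(demo[gene_cols], function(x) as.numeric(as.character(x)))
str(demo)

for (i in gene_cols) {
  print(colnames(demo)[i])
  print(summary(aov(demo[, i] ~ demo$ethnicity)))
}
```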
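I generated an example. df contains a variable etnicity with three groups, plus two genes. etnicity is your predictor variable. The loop prints the aov summary for each gene as a function of etnicity.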
set.seed(1); df <- data.frame(etnicity=c('A', 'B', 'C','A', 'B', 'C','A', 'B', 'C'), gene1=rnorm(9), gene2=rnorm(9))
for(i in 2:ncol(df)){
print(colnames(df)[i])
print( summary( aov(df[,i] ~ df$etnicity) ) )
}
[1] "gene1"
Df Sum Sq Mean Sq F value Pr(>F)
df$etnicity 2 1.324 0.6619 1.006 0.42
Residuals 6 3.947 0.6579
[1] "gene2"
Df Sum Sq Mean Sq F value Pr(>F)
df$etnicity 2 2.436 1.218 0.977 0.429
Residuals 6 7.478 1.246
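With a few thousand genes, printing every summary gets unwieldy. One option (a sketch, not part of the original answer) is to keep only the etnicity p-value from each fit in a named vector. summary() of a single-stratum aov fit is a list whose first element is the ANOVA table as a data frame, and its "Pr(>F)" column holds the p-values.

```r
set.seed(1)
df <- data.frame(etnicity = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'),
                 gene1 = rnorm(9), gene2 = rnorm(9))

gene_cols <- 2:ncol(df)
# The first row of each ANOVA table is the etnicity term; keep its p-value
pvals <- sapply(gene_cols, function(i) {
  tab <- summary(aov(df[, i] ~ df$etnicity))[[1]]
  tab[["Pr(>F)"]][1]
})
names(pvals) <- colnames(df)[gene_cols]
print(pvals)
```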
Applying it to data like the OP's (with the ethnicity values changed so that there is more than one group):
df <- read.table(text="ethnicity sample.id Batch Gender X7896759
1 1 H60903 B6 1 6.19649
2 1 H61603 B2 1 6.74464
3 2 H61608 B7 2 6.20268
4 2 H62204 B4 1 6.71395
5 3 H62901 B7 2 6.59963", header=TRUE, stringsAsFactors=FALSE)
for(i in 5:ncol(df)){
print(colnames(df)[i])
print(summary(aov(df[,i]~df$ethnicity)))
}
[1] "X7896759"
Df Sum Sq Mean Sq F value Pr(>F)
df$ethnicity 1 0.00803 0.00803 0.084 0.791
Residuals 3 0.28767 0.09589
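One caveat about this run: ethnicity is read as an integer here, so aov() fits it as a single numeric slope, which is why the table shows 1 Df for df$ethnicity. If the codes are group labels, as in the question where ethnicity is turned into a factor, convert the column with factor() first; with three groups the ethnicity term then has 3 - 1 = 2 Df. A sketch on the same five rows, using reformulate() to build each gene's formula from its column name:

```r
# Same five rows as above; the first column of each row becomes the row name
df <- read.table(text = "ethnicity sample.id Batch Gender X7896759
1 1 H60903 B6 1 6.19649
2 1 H61603 B2 1 6.74464
3 2 H61608 B7 2 6.20268
4 2 H62204 B4 1 6.71395
5 3 H62901 B7 2 6.59963", header = TRUE, stringsAsFactors = FALSE)

# Treat the ethnicity codes as group labels, not as a numeric covariate
df$ethnicity <- factor(df$ethnicity)

for (i in 5:ncol(df)) {
  print(colnames(df)[i])
  # Builds e.g. X7896759 ~ ethnicity
  fit <- aov(reformulate("ethnicity", response = colnames(df)[i]), data = df)
  print(summary(fit))
}
```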