When using sapply, I get Error in str2lang(x): <text>:1:31: unexpected symbol

When I run this code, I get an error:

genes<-colnames(survdata)[-c(1:3)]
univ_formulas<-sapply(genes,function(x)as.formula(paste('Surv(OS,status)~',x)))
Error in str2lang(x) : <text>:1:31: unexpected symbol
1: Surv(OS,status)~ ABC7-42389800N19.1
                                  ^

If I remove that element and run the code again, a similar error appears for another gene:

univ_formulas<-sapply(genes,function(x)as.formula(paste('Surv(OS,status)~',x)))
Error in str2lang(x) : <text>:1:26: unexpected symbol
1: Surv(OS,status)~ CITF22-1A6.3
                             ^

I don't know what is wrong.

Sample data:

head(genes,n = 50)
 [1] "A1BG"               "A1BG-AS1"           "A2M"               
 [4] "A2M-AS1"            "A2ML1"              "A2MP1"             
 [7] "A3GALT2"            "A4GALT"             "AAAS"              
[10] "AACS"               "AACSP1"             "AADAT"             
[13] "AAED1"              "AAGAB"              "AAK1"              
[16] "AAMDC"              "AAMP"               "AANAT"             
[19] "AAR2"               "AARD"               "AARS"              
[22] "AARS2"              "AARSD1"             "AASDH"             
[25] "AASDHPPT"           "AASS"               "AATF"              
[28] "AATK"               "AATK-AS1"           "ABAT"              
[31] "ABC7-42389800N19.1" "ABCA1"              "ABCA10"            
[34] "ABCA11P"            "ABCA12"             "ABCA13"            
[37] "ABCA17P"            "ABCA2"              "ABCA3"             
[40] "ABCA4"              "ABCA5"              "ABCA6"             
[43] "ABCA7"              "ABCA8"              "ABCA9"             
[46] "ABCB1"              "ABCB10"             "ABCB4"             
[49] "ABCB6"              "ABCB7"   

      

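This happens because some gene names contain "-", and base::str2lang (which as.formula calls on the string) parses it as the subtraction operator. "ABC7-42389800N19.1" is read as ABC7 - 42389800N19.1: the parser takes 42389800 as a number and then hits N19.1 directly after it, which is the unexpected symbol at column 31. Names such as "A1BG-AS1" do not raise an error, but they are silently parsed as A1BG - AS1, which is just as wrong. We can fix this as follows: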

  • “Clean” the gene names by converting every - to _, and record the mapping somewhere.

Then we have:

genes <- c("ABC7-42389800N19.1", "AATK-AS1")
# gsub() replaces every "-", not only the first one as sub() would
sapply(genes, function(x) as.formula(paste('Surv(OS,status)~',
                                           gsub("-", "_", x))))
$`ABC7-42389800N19.1`
Surv(OS, status) ~ ABC7_42389800N19.1
<environment: 0x000002ad508b58e8>

$`AATK-AS1`
Surv(OS, status) ~ AATK_AS1
<environment: 0x000002ad508b3c30>
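A minimal sketch of the same idea applied to a whole vector of names. It assumes gsub() rather than sub(): sub() only replaces the first hyphen, so a name with two hyphens (the hypothetical "XYZ-1-AS1" below) would still turn into a subtraction. It uses lapply() so the result is always a list, and the final check counts the variables on the right-hand side of each formula, which should be exactly one gene each.

```r
# Example gene names; "XYZ-1-AS1" is a hypothetical name with two hyphens.
genes <- c("A1BG-AS1", "ABC7-42389800N19.1", "CITF22-1A6.3", "XYZ-1-AS1")

# sub() replaces only the first match, so the second hyphen survives.
sub("-", "_", "XYZ-1-AS1")                # "XYZ_1-AS1"

# gsub() replaces every hyphen; fixed = TRUE treats "-" as a literal string.
clean <- gsub("-", "_", genes, fixed = TRUE)

univ_formulas <- lapply(clean, function(x) {
  as.formula(paste("Surv(OS,status) ~", x))
})
names(univ_formulas) <- genes

# f[[3]] is the right-hand side of a formula; each should contain one gene.
n_rhs <- vapply(univ_formulas, function(f) length(all.vars(f[[3]])), integer(1))
n_rhs
```

Building the formulas does not require the survival package to be loaded, because Surv(OS,status) is only stored, not evaluated, until a model function uses it.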

Here is an illustration of why this happens:

A <- 4; B<- 20
str2lang("A-B")
A - B
eval(str2lang("A-B"))
[1] -16
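The same mechanism explains both errors in the question. A small sketch, using a hypothetical helper parses() that only reports whether a string is valid R syntax: hyphenated names either fail to parse or parse into a subtraction of two variables.

```r
# Hypothetical helper: TRUE if the string is valid R syntax, FALSE otherwise.
parses <- function(s) {
  tryCatch({
    str2lang(s)
    TRUE
  }, error = function(e) FALSE)
}

# After "-", "42389800" is read as a number and "N19.1" is an unexpected symbol.
parses("Surv(OS,status)~ ABC7-42389800N19.1")   # FALSE

# Same problem here: "1" is a number, then "A6.3" is an unexpected symbol.
parses("Surv(OS,status)~ CITF22-1A6.3")         # FALSE

# This one parses, but as "A1BG minus AS1": two variables instead of one gene.
f <- str2lang("Surv(OS,status)~ A1BG-AS1")
all.vars(f)                                     # "OS" "status" "A1BG" "AS1"
```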

str2lang is essentially the parse step of the dreaded eval(parse(text = ...)) pattern. From the documentation, this is what it does:

str2expression(s) and str2lang(s) return special versions of parse(text=s, keep.source=FALSE) and can therefore be regarded as transforming character strings s to expressions, calls, etc.
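A quick check of that documentation statement: str2lang() returns the same call as parse(text = , keep.source = FALSE), and it only parses. Nothing is evaluated until eval() is called; in this sketch the values come from a list rather than from the global environment.

```r
s <- "A-B"

# str2lang(s) is the single call that parse() returns for s.
identical(str2lang(s), parse(text = s, keep.source = FALSE)[[1]])  # TRUE

# The result is an unevaluated call; A and B need not exist yet.
e <- str2lang(s)
class(e)                          # "call"

# Evaluation is a separate step; here A and B are looked up in a list.
eval(e, list(A = 4, B = 20))      # -16
```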

Notes

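  1. Since these formulas will be used for modelling, it is better to do the substitution on the colnames themselves, so that the model's input data has the names the formulas expect: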
# not tested, but you get the idea: replace every "-" in the gene columns (all but the first 3) with "_"
colnames(survdata)[-c(1:3)] <- gsub("-", "_", colnames(survdata)[-c(1:3)])
  2. For biological/research purposes, it is important to record why the gene names were cleaned as suggested in this answer.
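A minimal sketch of both notes, on a hypothetical toy survdata whose first three columns (id, OS, status) stand in for the clinical columns that colnames(survdata)[-c(1:3)] skips. It renames the gene columns once, checks that cleaning did not make two names collide, and keeps an original-to-cleaned lookup table (the hypothetical gene_map) as the record.

```r
# Hypothetical toy data: three clinical columns, then two gene columns.
survdata <- data.frame(
  id = 1:3,
  OS = c(100, 250, 400),
  status = c(1, 0, 1),
  `ABC7-42389800N19.1` = c(0.1, 0.5, 0.9),
  `AATK-AS1` = c(2.0, 1.5, 0.3),
  check.names = FALSE  # keep the hyphenated names as they are
)

old <- colnames(survdata)[-c(1:3)]
new <- gsub("-", "_", old, fixed = TRUE)

# Cleaning must not map two columns to the same name (e.g. "X-1" and "X_1").
stopifnot(anyDuplicated(c(colnames(survdata)[1:3], new)) == 0)

colnames(survdata)[-c(1:3)] <- new

# Lookup table recording how each gene name was changed.
gene_map <- data.frame(original = old, cleaned = new, stringsAsFactors = FALSE)

# The formulas can now be built directly from the column names.
genes <- colnames(survdata)[-c(1:3)]
univ_formulas <- lapply(genes, function(x) as.formula(paste("Surv(OS,status) ~", x)))
```

The collision check matters because the renaming is irreversible without the lookup table: if two original names differed only in "-" versus "_", they would silently become one.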