Loop over a vector in data.table: filter and summarize iterating with by

I am writing a for loop to summarize a data.table (data.table version 1.14.0).

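I want to iterate over a vector because I need to group the data by a different variable each time. The only way I know to do this is eval( parse( text = x)). That may be a bad idea, but I don't know of any other solution.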

Here is an example vector and data.table:

library(data.table)

catvars = c( "cbmi", "anl_cHBA1C")

a = c( rep( "1",3 ), rep( "2", 3 ), rep( "3", 3 ))
b = rep( c( "2012", "2013","2014" ), each = 1 )
z = rep( c( TRUE, FALSE, TRUE ), each = 1 )
m = rep( c( NA ,"1", "1" ), each = 1)
d = rep( c( "No ttm", "Mono", "Poli" ), each = 1 )
e = rep( c( "BMI > 30", "BMI > 30", "BMI < 25" ), each = 1 )
f = rep( c( ">7", "<=7", "<=7" ), each = 1 )


DT = data.table(id = a,
                year = b,
                dm2 = z,
                exp_th = m,
                exp_th2 = d, 
                cbmi = e,
                anl_cHBA1C = f)

The code without a loop:

cbmi_ttm = DT [ dm2 == TRUE , total := .N , by = .( year )
               ][ dm2 == TRUE , .( .N, total = max( total ) ),
                  by = .( year, ttm2 =  fcase( exp_th2 != "No ttm" , exp_th2 ,
                                               exp_th2 == "No ttm", "Untreated"),
                         cbmi )
                  ][, `:=` ( per = round(N/total*100, 2 ), total = NULL )]

I then wrote the same code inside a loop that iterates over catvars:

for ( x in catvars ) { 
  assign (paste0( x, "DT_aut_ttm2"),
          DT [ dm2 == TRUE , total := .N, by = .( year )
               ][ dm2 == TRUE , .( .N, total = max( total )),
                  by = .( year, ttm2 =  fcase( exp_th2 != "No ttm" , exp_th2,
                                               exp_th2 == "No ttm", "Untreated" ),
                    eval( parse( text = x )))
                  ][ , ( x ) := parse
                    ][ , parse := NULL
                      ][ , `:=` (per = round( N/total*100, 2 ), total = NULL)]
          )
}

But I get this error:

"Error in eval(parse(text = x)) : object 'cbmi' not found"

My expected output is:

cbmiDT_aut_ttm2

   year      ttm2 N     cbmi per
1: 2012 Untreated 3 BMI > 30 100
2: 2014      Poli 3 BMI < 25 100

anl_cHBA1CDT_aut_ttm2

   year      ttm2 N anl_cHBA1C per
1: 2012 Untreated 3         >7 100
2: 2014      Poli 3        <=7 100

After trying different approaches, I removed the i part (the filter) and created a new, already filtered data.table:

DT_filter = DT [ dm2 == TRUE,]

for (x in catvars) {
  assign (paste0( x, "DT_aut_ttm2_nofilter"),
          DT_filter [, total := .N , by = .( year )
                     ][ , .( .N, total = max( total )),
                        by = .( year, ttm2 =  fcase( exp_th2 != "No ttm" , exp_th2,
                                                     exp_th2 == "No ttm", "Untreated" ),
                                eval( parse( text = x )))
                        ][ , ( x ) := parse
                           ][ , parse := NULL
                              ][ , `:=` (per = round( N/total*100, 2 ), total = NULL )]
          )
}

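And it worked.

Is there an explanation for this behavior in data.table? Maybe I have not read the documentation closely enough, but I have not found any mention of this "problem". Is it a real known issue, or is it just my mistake?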

Do you know a clean way to get the same result without eval( parse( text = ))?

Do you have any alternative approach?

Thank you very much

One possibility is to melt your catvars, like this:

DT_long <- melt(DT, id.vars = setdiff(colnames(DT),catvars), measure.vars = catvars, variable.name = "catvar")
DT_long[dm2 == TRUE][
  ,total:=.N, by=.(year, catvar)][
    ,.(.N, total = max(total)), 
    by = .(year, ttm2 = fifelse(exp_th2 != "No ttm", exp_th2,"Untreated"), value, catvar)][
      ,`:=`(per = round(N/total*100, 2), total=NULL)][]

Output:

   year      ttm2    value     catvar N per
1: 2012 Untreated BMI > 30       cbmi 3 100
2: 2014      Poli BMI < 25       cbmi 3 100
3: 2012 Untreated       >7 anl_cHBA1C 3 100
4: 2014      Poli      <=7 anl_cHBA1C 3 100
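
If you would rather keep one table per variable, here is a minimal sketch that avoids eval( parse( text = )) by passing `by` as a character vector. The helper name `summarise_by` and the list name `results` are my own, not from the question, and the example data is rebuilt without the unused exp_th column so the block runs on its own. On the error itself: my reading, which I have not confirmed in the documentation, is that when `i` is present data.table subsets only the columns it can detect by name inside the `by` expression, and a name hidden behind eval( parse( text = x)) is not detected. Treat that as a likely explanation rather than a confirmed one.

```r
library(data.table)

# Same example data as in the question (exp_th left out because it is unused)
DT <- data.table(
  id         = rep(c("1", "2", "3"), each = 3),
  year       = rep(c("2012", "2013", "2014"), times = 3),
  dm2        = rep(c(TRUE, FALSE, TRUE), times = 3),
  exp_th2    = rep(c("No ttm", "Mono", "Poli"), times = 3),
  cbmi       = rep(c("BMI > 30", "BMI > 30", "BMI < 25"), times = 3),
  anl_cHBA1C = rep(c(">7", "<=7", "<=7"), times = 3)
)
catvars <- c("cbmi", "anl_cHBA1C")

# Hypothetical helper: summarise the filtered rows by year, ttm2 and one
# extra grouping column whose name is given as a string in `v`
summarise_by <- function(dt, v) {
  sub <- dt[dm2 == TRUE]                       # filtered copy, DT is left untouched
  sub[, ttm2 := fifelse(exp_th2 == "No ttm", "Untreated", exp_th2)]
  sub[, total := .N, by = year]                # denominator per year
  res <- sub[, .(N = .N, total = max(total)),
             by = c("year", "ttm2", v)]        # character vector in by
  res[, `:=`(per = round(N / total * 100, 2), total = NULL)]
  res[]
}

# A named list instead of assign(): one summary table per variable
results <- lapply(setNames(catvars, catvars),
                  function(v) summarise_by(DT, v))
```

You can then access each table as results$cbmi or results$anl_cHBA1C. Because the grouping columns are given by name, the result column already carries the variable's name, so the rename step from the loop is not needed. Filtering into a copy first also means the total column is never added to DT by reference.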