Loop over a vector in data.table: filter and summarize, iterating over the by variable
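I am writing a for loop to summarize a data.table (version 1.14.0).
I want to loop over a vector because I need to group the data by a different variable on each iteration. The only way I know to do this is eval( parse( text = x)). It may be a bad idea, but I don't know any other solution.
Here is an example vector and data.table: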
library(data.table)
catvars = c( "cbmi", "anl_cHBA1C")
a = c( rep( "1",3 ), rep( "2", 3 ), rep( "3", 3 ))
b = rep( c( "2012", "2013","2014" ), each = 1 )
z = rep( c( TRUE, FALSE, TRUE ), each = 1 )
m = rep( c( NA ,"1", "1" ), each = 1)
d = rep( c( "No ttm", "Mono", "Poli" ), each = 1 )
e = rep( c( "BMI > 30", "BMI > 30", "BMI < 25" ), each = 1 )
f = rep( c( ">7", "<=7", "<=7" ), each = 1 )
DT = data.table(id = a,
year = b,
dm2 = z,
exp_th = m,
exp_th2 = d,
cbmi = e,
anl_cHBA1C = f)
The code without a loop:
cbmi_ttm = DT [ dm2 == TRUE , total := .N , by = .( year )
][ dm2 == TRUE , .( .N, total = max( total ) ),
by = .( year, ttm2 = fcase( exp_th2 != "No ttm" , exp_th2 ,
exp_th2 == "No ttm", "Untreated"),
cbmi )
][, `:=` ( per = round(N/total*100, 2 ), total = NULL )]
I then put the code in a loop that iterates over catvars:
for ( x in catvars ) {
assign (paste0( x, "DT_aut_ttm2"),
DT [ dm2 == TRUE , total := .N, by = .( year )
][ dm2 == TRUE , .( .N, total = max( total )),
by = .( year, ttm2 = fcase( exp_th2 != "No ttm" , exp_th2,
exp_th2 == "No ttm", "Untreated" ),
eval( parse( text = x )))
][ , ( x ) := parse
][ , parse := NULL
][ , `:=` (per = round( N/total*100, 2 ), total = NULL)]
)
}
But I get this error:
"Error in eval(parse(text = x)) : object 'cbmi' not found"
My expected output is:
cbmiDT_aut_ttm2
year ttm2 N cbmi per
1: 2012 Untreated 3 BMI > 30 100
2: 2014 Poli 3 BMI < 25 100
and
anl_cHBA1CDT_aut_ttm2
year ttm2 N anl_cHBA1C per
1: 2012 Untreated 3 >7 100
2: 2014 Poli 3 <=7 100
After trying different approaches, I removed the i part (the filter) and created a new, pre-filtered data.table instead:
DT_filter = DT [ dm2 == TRUE,]
for (x in catvars) {
assign (paste0( x, "DT_aut_ttm2_nofilter"),
DT_filter [, total := .N , by = .( year )
][ , .( .N, total = max( total )),
by = .( year, ttm2 = fcase( exp_th2 != "No ttm" , exp_th2,
exp_th2 == "No ttm", "Untreated" ),
eval( parse( text = x )))
][ , ( x ) := parse
][ , parse := NULL
][ , `:=` (per = round( N/total*100, 2 ), total = NULL )]
)
}
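And it worked.
Is there an explanation for this behavior in data.table? Maybe I have not read the documentation closely enough, but I have not found any mention of this "problem". Is it a known issue, or just my mistake?
Do you know a clean way to get the same result without eval( parse( text = ))?
Do you have any alternative approach?
Thanks a lot.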
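One way to avoid eval( parse( text = )) altogether is to pass by a character vector of column names. The sketch below is only an illustration under a few assumptions: the example table is rebuilt with just the columns the summary needs, the filter is applied once up front (so i and by are never combined in the same call), and the results are collected in a named list called results (a name invented here) instead of being created with assign. A plausible explanation for the original error, not confirmed anywhere in this thread, is that when i is supplied data.table evaluates by against only the columns it can detect by name in the by expression, and a column name hidden inside a string is not detected.

```r
library(data.table)

catvars <- c("cbmi", "anl_cHBA1C")

# Reduced copy of the example data (only the columns used below).
DT <- data.table(
  year       = rep(c("2012", "2013", "2014"), 3),
  dm2        = rep(c(TRUE, FALSE, TRUE), 3),
  exp_th2    = rep(c("No ttm", "Mono", "Poli"), 3),
  cbmi       = rep(c("BMI > 30", "BMI > 30", "BMI < 25"), 3),
  anl_cHBA1C = rep(c(">7", "<=7", "<=7"), 3)
)

# Filter once; subsetting returns a new table, so DT itself is not modified.
sub <- DT[dm2 == TRUE]
sub[, ttm2 := fifelse(exp_th2 != "No ttm", exp_th2, "Untreated")]
sub[, total := .N, by = year]

# One summary per grouping variable, stored in a named list.
results <- lapply(setNames(catvars, catvars), function(x) {
  grp <- c("year", "ttm2", x)  # grouping columns as a character vector
  out <- sub[, .(N = .N, total = max(total)), by = grp]
  out[, `:=`(per = round(N / total * 100, 2), total = NULL)]
  out[]
})
```

Each table is then available as results[["cbmi"]] or results[["anl_cHBA1C"]]. The column order is year, ttm2, the grouping variable, N, per, which differs slightly from the expected output in the question.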
Perhaps one possibility is to melt your catvars, like this:
DT_long <- melt(DT, id.vars = setdiff(colnames(DT),catvars), measure.vars = catvars, variable.name = "catvar")
DT_long[dm2 == TRUE][
,total:=.N, by=.(year, catvar)][
,.(.N, total = max(total)),
by = .(year, ttm2 = fifelse(exp_th2 != "No ttm", exp_th2,"Untreated"), value, catvar)][
,`:=`(per = round(N/total*100, 2), total=NULL)][]
Output:
year ttm2 value catvar N per
1: 2012 Untreated BMI > 30 cbmi 3 100
2: 2014 Poli BMI < 25 cbmi 3 100
3: 2012 Untreated >7 anl_cHBA1C 3 100
4: 2014 Poli <=7 anl_cHBA1C 3 100
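If separate tables per variable are still wanted, as in the question's expected output, the long result can be split afterwards. This is a minimal sketch, assuming a reduced copy of the example data; res and per_var are names introduced here for illustration. It uses split() with its by argument for data.tables and then renames the generic value column back to the variable name.

```r
library(data.table)

catvars <- c("cbmi", "anl_cHBA1C")

# Reduced copy of the example data (only the columns used below).
DT <- data.table(
  year       = rep(c("2012", "2013", "2014"), 3),
  dm2        = rep(c(TRUE, FALSE, TRUE), 3),
  exp_th2    = rep(c("No ttm", "Mono", "Poli"), 3),
  cbmi       = rep(c("BMI > 30", "BMI > 30", "BMI < 25"), 3),
  anl_cHBA1C = rep(c(">7", "<=7", "<=7"), 3)
)

# Same melt-and-summarize steps as in the answer.
DT_long <- melt(DT, id.vars = setdiff(names(DT), catvars),
                measure.vars = catvars, variable.name = "catvar")
res <- DT_long[dm2 == TRUE][
  , total := .N, by = .(year, catvar)][
  , .(.N, total = max(total)),
  by = .(year, ttm2 = fifelse(exp_th2 != "No ttm", exp_th2, "Untreated"), value, catvar)][
  , `:=`(per = round(N / total * 100, 2), total = NULL)]

# One table per original variable; drop the catvar column from each piece.
per_var <- split(res, by = "catvar", keep.by = FALSE)

# Rename the generic "value" column to the variable it came from.
for (nm in names(per_var)) setnames(per_var[[nm]], "value", nm)
```

per_var[["cbmi"]] and per_var[["anl_cHBA1C"]] then hold one summary each, so no assign call or dynamically built object name is needed.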