Questions on rxDataStep when using 'transformFunc'

Question

The R code below adds a column to the dataset and returns a data.frame.

xdfAirDemo <- RxXdfData(file.path(rxGetOption("sampleDataDir"),  "AirlineDemoSmall.xdf"))

I added a print call to check the length of the vector.

f.append <- function(lst){
  lst$mod_val_test <- rep(1, length(lst[[1]]))
  print(length(lst$mod_val_test))
  return(lst)
}

df.Airline <- rxDataStep(inData = xdfAirDemo, transformFunc = f.append)

When I run the rxDataStep call above, the print call inside 'f.append' executes twice and outputs two values. Can someone help me understand how rxDataStep works?

The output is below.

[1] 10

[1] 600000

Rows Read: 600000, Total Rows Processed: 600000, Total Chunk Time: 0.651 seconds

Answer 1

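When you call rxDataStep, it first runs your code on the first 10 rows of the data as a test. If that succeeds, it then processes the entire dataset one chunk at a time. That is why you see two values: 10 comes from the test run, and 600000 comes from the real pass, where all 600000 rows were read as a single chunk.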
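The test-then-chunk behaviour can be imitated in base R to make the two print calls easier to follow. This is only a rough sketch: `simulate_data_step`, its `rowsPerChunk` and `testRows` arguments, and the 25-row `demo` data frame are all made up for illustration, and nothing here is RevoScaleR's actual implementation.

```r
# Hypothetical stand-in for the control flow described above.
# It is NOT how rxDataStep is implemented; it only mimics
# "test on the first rows, then process chunk by chunk".
simulate_data_step <- function(data, transformFunc,
                               rowsPerChunk = 50000, testRows = 10) {
  data <- as.list(data)
  n <- length(data[[1]])
  stopifnot(n > 0)

  # Pull the given row indices out of every column.
  take <- function(idx) lapply(data, function(col) col[idx])

  # 1. Trial run on the first few rows; an error here stops everything early.
  transformFunc(take(seq_len(min(testRows, n))))

  # 2. Real pass, one chunk at a time.
  starts <- seq(1, n, by = rowsPerChunk)
  chunks <- lapply(starts, function(s) {
    transformFunc(take(s:min(s + rowsPerChunk - 1, n)))
  })

  # Stitch the transformed chunks back into one data.frame.
  do.call(rbind, lapply(chunks, as.data.frame))
}

f.append <- function(lst) {
  lst$mod_val_test <- rep(1, length(lst[[1]]))
  print(length(lst$mod_val_test))
  lst
}

demo <- data.frame(ArrDelay = seq_len(25))   # made-up data, 25 rows
out <- simulate_data_step(demo, f.append, rowsPerChunk = 25)
# [1] 10   (trial run on the first 10 rows)
# [1] 25   (the single real chunk)
```

With a smaller `rowsPerChunk`, the transform would be called once per chunk after the trial run, so the function would print more than two values.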

If you don't want your code to be executed during the test run, you can check the value of the built-in variable .rxIsTestChunk:

f.append <- function(lst)
{
    # don't print anything if this is the test chunk
    if(.rxIsTestChunk)
        return(NULL)

    lst$mod_val_test <- rep(1, length(lst[[1]]))
    print(length(lst$mod_val_test))
    return(lst)
}
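A possible variant, if the goal is only to silence the print, is to build the new column on every chunk and guard just the print call. The sketch below runs without RevoScaleR: `run_chunk` is a hypothetical helper that makes a `.rxIsTestChunk` flag visible inside the function, and `f.append.quiet` is an illustrative name. How RevoScaleR itself supplies the flag is not shown here.

```r
# Hypothetical helper: makes a .rxIsTestChunk flag visible to the transform
# so this sketch can run on its own. This is an assumption for illustration,
# not RevoScaleR code.
run_chunk <- function(transformFunc, chunk, isTest) {
  env <- new.env(parent = environment(transformFunc))
  assign(".rxIsTestChunk", isTest, envir = env)
  environment(transformFunc) <- env
  transformFunc(chunk)
}

f.append.quiet <- function(lst) {
  # Always build the new column, so the test chunk has the same shape
  # as the real chunks.
  lst$mod_val_test <- rep(1, length(lst[[1]]))

  # Only report the length for real chunks.
  if (!.rxIsTestChunk)
    print(length(lst$mod_val_test))

  lst
}

test_out <- run_chunk(f.append.quiet, list(ArrDelay = 1:10), isTest = TRUE)   # prints nothing
real_out <- run_chunk(f.append.quiet, list(ArrDelay = 1:25), isTest = FALSE)  # prints [1] 25
```

With RevoScaleR, `f.append.quiet` would presumably be passed directly as `transformFunc`, with no helper involved.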


Tags: r, microsoft-r