How to access accumulators in object outside the place where they were defined?

I defined my helper map function as a separate def inside a helper object, and it does not "see" the accumulator defined in the preceding code. The Spark documentation seems to recommend keeping "remote" functions inside objects, but how do I make them work with these accumulators?

object mainlogic{
    val counter = sc.accumulator(0)
    val data = sc.textFile(...)// load logic here
    val myrdd = data.mapPartitionsWithIndex(mapFunction)
}

object helper{
  def mapFunction(...)={
      counter += 1 // does not compile: counter is not in scope here
  }
}

As with any other code, something like that needs to be passed in as a parameter:

object mainlogic{
    val counter = sc.accumulator(0)
    val data = sc.textFile(...)// load logic here
    val myrdd = data.mapPartitionsWithIndex(mapFunction(counter, _, _))
}

object helper{
  def mapFunction(counter: Accumulator[Int], ...)={
      counter += 1 // compiles now: counter is in scope as a parameter
  }
}
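The same pattern can be exercised without a Spark cluster. In the sketch below, `SimpleCounter` is a hypothetical stand-in for Spark's `Accumulator[Int]` (it is not Spark API), used only to show how partially applying the helper function with `(counter, _, _)` wires the shared counter into a function defined in another object:

```scala
// SimpleCounter is an illustrative stand-in for Spark's Accumulator[Int].
class SimpleCounter {
  private var n = 0
  def +=(delta: Int): Unit = { n += delta }
  def value: Int = n
}

object Helper {
  // The counter arrives as an explicit argument, so Helper needs no
  // reference to the object that created it.
  def mapFunction(counter: SimpleCounter, index: Int, it: Iterator[String]): Iterator[String] =
    it.map { line => counter += 1; line.toUpperCase }
}

object MainLogic {
  def run(): Int = {
    val counter = new SimpleCounter
    val data = Seq("a", "b", "c")
    // Partial application fixes the counter argument, mirroring
    // data.mapPartitionsWithIndex(Helper.mapFunction(counter, _, _))
    val f = Helper.mapFunction(counter, _: Int, _: Iterator[String])
    f(0, data.iterator).toList // force the lazy iterator
    counter.value
  }
}
```

Calling `MainLogic.run()` processes three lines and returns a counter value of 3; with real Spark the only differences are that the counter comes from `sc.accumulator(0)` and the partially applied function is handed to `mapPartitionsWithIndex`.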

Be sure, though, to keep in mind this note from the documentation:

For accumulator updates performed inside actions only, Spark guarantees that each task’s update to the accumulator will only be applied once, i.e. restarted tasks will not update the value. In transformations, users should be aware of that each task’s update may be applied more than once if tasks or job stages are re-executed.