Configuring the number of reducers for a particular DoFn in Apache Crunch
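I know there are properties such as CRUNCH_BYTES_PER_REDUCE_TASK or mapred.reduce.tasks that set the number of reducers.
Can anyone suggest how to configure or override the default number of reducers for one particular DoFn that takes longer to execute?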
You can configure the reducers for a particular DoFn by building a ParallelDoOptions:
ParallelDoOptions opts = ParallelDoOptions.builder().conf("mapred.reduce.tasks", "64").build();
and passing it as the 4th argument to parallelDo, after the name, the DoFn, and the PType.
Crunch's DoFn class (and therefore MapFn, which extends it) has a scaleFactor method:
You can override the scaleFactor method in your custom DoFns in order to provide a hint to the Crunch planner about how much larger (or smaller) an input data set will become after passing through the process method. If the groupByKey method is called without an explicit number of reducers provided, the planner will try to guess how many reduce tasks should be used for the job based on the size of the input data, which is determined in part by using the result of calling the scaleFactor method on the DoFns in the processing path.
Source: http://crunch.apache.org/user-guide.html#doplan
Javadocs link: http://crunch.apache.org/apidocs/0.15.0/org/apache/crunch/DoFn.html#scaleFactor--
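To make the effect of the hint concrete, here is a rough sketch of the arithmetic. It is plain Java with no Crunch dependency. The class ReducerEstimateSketch, the method estimateReducers, and the formula itself are assumptions made for illustration, not Crunch's actual planner code. In a real pipeline you would supply the hint by overriding public float scaleFactor() in your DoFn subclass, and per the quote above it only matters when groupByKey is called without an explicit number of reducers.

```java
public class ReducerEstimateSketch {

    // Hypothetical helper, not part of Crunch. It models a size-based guess:
    // scale the input size by each DoFn's scaleFactor along the processing
    // path, then divide by a bytes-per-reduce-task budget.
    static int estimateReducers(long inputBytes, float[] scaleFactors, long bytesPerReduceTask) {
        double estimatedBytes = inputBytes;
        for (float factor : scaleFactors) {
            estimatedBytes *= factor;
        }
        // Assumed rounding rule: always at least one reducer.
        return 1 + (int) (estimatedBytes / bytesPerReduceTask);
    }

    public static void main(String[] args) {
        long oneGb = 1L << 30;
        long inputBytes = 10 * oneGb;

        // A DoFn that leaves the data size unchanged (scale factor 1.0).
        System.out.println(estimateReducers(inputBytes, new float[] {1.0f}, oneGb));

        // A DoFn that reports a 4x blow-up, so the guess asks for more reducers.
        System.out.println(estimateReducers(inputBytes, new float[] {4.0f}, oneGb));
    }
}
```

Under this model, a larger scaleFactor on the slow DoFn raises the estimated size and so the guessed reducer count, without hard-coding mapred.reduce.tasks for the whole job.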