What does "vcore-seconds" in a Hadoop job log mean?

Job Counters
    Launched map tasks=3
    Launched reduce tasks=45
    Data-local map tasks=1
    Rack-local map tasks=2
    Total time spent by all maps in occupied slots (ms)=29338
    Total time spent by all reduces in occupied slots (ms)=200225
    Total time spent by all map tasks (ms)=29338
    Total time spent by all reduce tasks (ms)=200225
    Total vcore-seconds taken by all map tasks=29338
    Total vcore-seconds taken by all reduce tasks=200225
    Total megabyte-seconds taken by all map tasks=30042112
    Total megabyte-seconds taken by all reduce tasks=205030400

What does "vcore-seconds" mean, and how does "vcore-seconds" differ from "Total time spent"?

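There is no official documentation for this metric, but if I understand correctly (see here), vcore-seconds is the total number of seconds for which Hadoop allocated vcores to the tasks, summed over all tasks.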
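A minimal Python sketch of that definition, assuming each counter is a sum over task attempts of (duration × allocation). The `TaskAttempt` record and `aggregate` function are hypothetical, and so are the per-task durations: the three map durations are made up so that they add up to the 29338 ms in the log. It also illustrates the difference from "Total time spent": the two totals only coincide when every task holds exactly one vcore.

```python
from dataclasses import dataclass


@dataclass
class TaskAttempt:
    # Hypothetical record of one task attempt; the field names are
    # illustrative only and are not a Hadoop API.
    duration_ms: int  # how long the task held its container, in ms
    vcores: int       # vcores allocated to the container
    memory_mb: int    # memory allocated to the container, in MB


def aggregate(tasks):
    """Return (time_ms, vcore_ms, mb_ms) summed over all task attempts."""
    time_ms = sum(t.duration_ms for t in tasks)
    # Each task contributes its duration weighted by what it was allocated.
    vcore_ms = sum(t.duration_ms * t.vcores for t in tasks)
    mb_ms = sum(t.duration_ms * t.memory_mb for t in tasks)
    return time_ms, vcore_ms, mb_ms


# Three hypothetical map tasks, each given 1 vcore and 1024 MB.
maps = [
    TaskAttempt(9000, 1, 1024),
    TaskAttempt(10338, 1, 1024),
    TaskAttempt(10000, 1, 1024),
]
print(aggregate(maps))  # (29338, 29338, 30042112)
```

Giving the same tasks 2 vcores each would double the vcore total while leaving the time total unchanged.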

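I think this metric shows how much time your MapReduce application spent in the Map and Reduce phases, excluding other work (scheduling tasks, shuffling and sorting intermediate keys, etc.).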

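Looking at the job counters above, it is clear that vcore-seconds is actually measured in milliseconds. The output explicitly labels the "Total time spent" lines with the unit "(ms)" but omits the unit on the vcore-seconds and megabyte-seconds lines. Yet this example obviously uses one vcore per task, and the map vcore-seconds value is shown as 29338 rather than 29.338!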
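The unit can be checked with plain arithmetic on the counters above: dividing each vcore and megabyte counter by the matching "Total time spent" (ms) value gives the time-weighted average allocation per task. A minimal Python sketch, assuming the millisecond interpretation; `COUNTERS` and `implied_allocation` are illustrative names, not anything Hadoop provides:

```python
# Counter values copied from the job log in the question.
COUNTERS = {
    "map": {"time_ms": 29338, "vcore": 29338, "mb": 30042112},
    "reduce": {"time_ms": 200225, "vcore": 200225, "mb": 205030400},
}


def implied_allocation(c):
    """Average vcores and MB per task, assuming the vcore and MB
    counters use the same millisecond unit as time_ms."""
    return c["vcore"] / c["time_ms"], c["mb"] / c["time_ms"]


for phase, c in COUNTERS.items():
    vcores, mb = implied_allocation(c)
    print(f"{phase}: {vcores:g} vcore(s), {mb:g} MB per task")
```

Both phases come out at exactly 1 vcore and 1024 MB per task, a plausible container size, which only works if the vcore and megabyte counters share the milliseconds unit. Read as true seconds, the same numbers would imply 1000 vcores per task.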

I have verified this "clarity defect" by testing many real jobs, and it is always present.

I combed through Cloudera's documentation and was shocked to find that they never document the fact that vcore-seconds should really be named vcore-ms!