MapReduce mapping

Question

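I am learning the MapReduce framework and have the following questions about it:

The MapReduce paradigm essentially has map() and reduce() (and a few others as well). Can all programming logic be effectively represented as either a map() or a reduce()?

For example, suppose I want to do an in-order traversal of a tree. Can this task be effectively divided into only map() and reduce() tasks? If yes, how? If no, then how do I leverage the MapReduce framework for this task?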

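// Iterative solution
// Assumes a TreeNode class with an int field `data` and `left`/`right` child references.
import java.util.Stack;

public void inOrderIter(TreeNode root) {

    if (root == null)
        return;

    Stack<TreeNode> s = new Stack<TreeNode>();
    TreeNode currentNode = root;

    // Continue while there are saved ancestors or an unexplored subtree.
    while (!s.empty() || currentNode != null) {

        if (currentNode != null) {
            // Descend to the left, remembering the current node.
            s.push(currentNode);
            currentNode = currentNode.left;
        } else {
            // Left subtree is done: visit the node, then move to its right subtree.
            TreeNode n = s.pop();
            System.out.printf("%d ", n.data);
            currentNode = n.right;
        }
    }
}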

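Can we have only a map() without a corresponding reduce()?
As per this and this, the reduce() function generates the final output. Is it necessary that it generates only a single value?
How do you decide whether a certain task should be part of map() or reduce()?
Any general pointers about how to map-ify and reduc-ify a given task?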

Answer 1

To answer your questions:

The MapReduce paradigm essentially has map() and reduce() (and a few others as well). Can all the programming logic be effectively represented as either a map() or a reduce()?

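MapReduce is a design pattern, so it applies only to problems that fit a big-data context. You may be able to solve a problem with an algorithm built from a series of map-reduce steps, but in terms of execution cost (resources and time required) that may not be the most efficient code. At the same time, a conventional algorithm may not work at all simply because your data is too large, and that is where MapReduce might help.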

Can we have only a map() without a corresponding reduce() and vice versa?

In the Java API, you can have a MapReduce job without a reduce phase, but not the other way around. You can, however, choose to use the default IdentityMapper.
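As a rough illustration of what "no reduce phase" means, here is a plain-Java sketch with no Hadoop dependency. The class and method names are hypothetical stand-ins, not Hadoop APIs. Each record is handled independently and whatever map emits is the final output, because nothing is grouped by key. In Hadoop itself, a map-only job is typically configured by setting the number of reduce tasks to zero, e.g. job.setNumReduceTasks(0).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java stand-in for a map-only job: no shuffle, no reduce.
class MapOnlySketch {

    // "map": called once per input record; may emit zero or more output records.
    static void map(String line, List<String> output) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return; // filter out blank records
        }
        output.add(trimmed.toUpperCase());
    }

    // Driver: with no reduce phase, the map output is written out as-is.
    static List<String> run(List<String> input) {
        List<String> output = new ArrayList<>();
        for (String line : input) {
            map(line, output);
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("  hello ", "", "map only")));
    }
}
```

This shape suits tasks such as filtering or record-by-record conversion, where no record needs to see any other record.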

As per this and this, the reduce() function generates the final output - is it necessary that it should generate only a single value?

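No. You can write as many values as you like from a mapper or reducer via the context.write() method, as long as you adhere to the output types declared through the API.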
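To make this concrete, the following plain-Java sketch (hypothetical names, not the Hadoop Reducer class) shows a reduce step that emits two records for a single key. The output list plays the role that context.write() plays in Hadoop, and every emitted record has the same type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reducer; `output.add` mimics context.write().
class MultiOutputReducerSketch {

    // "reduce": receives one key with all of its values and may emit
    // any number of output records, not just one.
    // Assumes `values` is non-empty, as a reducer is only invoked for keys that have values.
    static void reduce(String key, List<Integer> values, List<String> output) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        output.add(key + "\tmin=" + min); // first record for this key
        output.add(key + "\tmax=" + max); // second record for the same key
    }

    public static void main(String[] args) {
        List<String> output = new ArrayList<>();
        reduce("temp", List.of(3, 9, 1), output);
        output.forEach(System.out::println);
    }
}
```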

How do you decide if a certain task should be a part of map() or reduce()?

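Most problems solved with MapReduce fall under aggregation, joining two data sets, or some other way of pooling data together to infer a result. If you understand the concepts and processing steps of MapReduce, you should be able to decide what to write in map() and/or reduce().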

Any general pointers about how to map-ify and reduc-ify a given task?

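Again, it depends on what you are trying to achieve. In general, map() is about reading the data set, filtering it (if there may be unwanted records or parts of records), and deciding which data needs to be grouped together under which key. The reducer is about processing the collection of data for each key (as written by the mapper).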
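The division of labor described above can be sketched in plain Java. This is a minimal single-process simulation under stated assumptions, not Hadoop code: the class name, the "customer,amount" input format, and the in-memory shuffle are all made up for illustration. The map step filters malformed records and picks the grouping key; the reduce step processes all values collected for one key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of map -> group by key -> reduce.
class MapShuffleReduceSketch {

    // "map": parse one "customer,amount" line, drop malformed records,
    // and emit (customer, amount) so that amounts are grouped by customer.
    static List<Map.Entry<String, Integer>> map(String line) {
        String[] parts = line.split(",");
        if (parts.length != 2) {
            return List.of(); // wrong number of fields: filter out
        }
        try {
            int amount = Integer.parseInt(parts[1].trim());
            return List.of(Map.entry(parts[0].trim(), amount));
        } catch (NumberFormatException e) {
            return List.of(); // unparsable amount: filter out
        }
    }

    // "reduce": process every value that was emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle: group mapper output by key (a real framework does this for you).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Reduce: one call per distinct key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("alice,10", "bob,5", "garbage", "alice,7", "bob,x")));
    }
}
```

A useful rule of thumb that follows from this sketch: logic that needs only one record belongs in map(), and logic that needs every record sharing a key belongs in reduce().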

java

mapreduce