What exactly is the output of the mapper and reducer functions

Question

This is a follow-up to an earlier question.

Mapper function

public static class MapForWordCount extends Mapper<Object, Text, Text, IntWritable>{

private IntWritable saleValue = new IntWritable();
private Text rangeValue = new Text();

public void map(Object key, Text value, Context con) throws IOException, InterruptedException
{
    String line = value.toString();
    String[] words = line.split(",");
    for(String word: words )
    {
        if(words[3].equals("40")){  
            saleValue.set(Integer.parseInt(words[0]));
            rangeValue.set(words[3]);
            con.write( rangeValue , saleValue );
        }
    }
}   
}

Reducer function

public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>  
{  
    private IntWritable result = new IntWritable();  
    public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException  
    {  
        for(IntWritable value : values)  
        {  
            result.set(value.get());  
            con.write(word, result);  
        }  
    }  
}

The output obtained is

Edit 1: But the expected output is

40 102  
40 104  
40 105

What am I doing wrong?

What exactly is happening in the mapper and reducer functions?

Answer 1

What exactly is happening

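You are consuming comma-separated lines of text, splitting each line on the commas, and filtering out some values. If all you are doing is extracting those values, con.write() should be called only once per line.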

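After the map phase, the framework groups all the "40" keys that you output and forms a list of all the values that were written with that key. That list is what the reducer reads.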
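To make the grouping step concrete, here is a minimal plain-Java sketch of what happens between map and reduce. It does not use the Hadoop API; the class name ShuffleSketch, the method groupByKey, and the String[]{key, value} pair representation are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the grouping done between map and reduce.
// Not Hadoop code: names and the pair representation are hypothetical.
class ShuffleSketch {
    // Collects every value emitted under the same key into one list.
    // Each pair is a String[]{key, value}.
    static Map<String, List<Integer>> groupByKey(List<String[]> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(pair[1]));
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Pairs as a mapper might have written them with con.write(key, value).
        List<String[]> emitted = new ArrayList<>();
        emitted.add(new String[] {"40", "102"});
        emitted.add(new String[] {"40", "104"});
        emitted.add(new String[] {"40", "105"});
        System.out.println(groupByKey(emitted)); // {40=[102, 104, 105]}
    }
}
```

The reducer is then called once per key, with that key's list as its `values` argument.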

You should probably try this for your map function.

// Set the values to write 
saleValue.set(Integer.parseInt(words[0]));
rangeValue.set(words[3]);

// Filter out only the 40s
if(words[3].equals("40")) {
    // Write out "(40, saleValue)" words.length times 
    for(String word: words )
    {
        con.write( rangeValue , saleValue );
    }
}

If you don't want one duplicate per field of the split string, get rid of the for loop.

All your reducer does is print out what it receives from the mapper.
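The effect of the loop can be checked without a cluster. The following is a rough plain-Java sketch, not Hadoop code: the class LoopDuplicationSketch, its two methods, and the sample row are hypothetical, and each emitted record is modelled as a "key value" string. The length check in mapOnce is an extra guard that the original mapper does not have.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java comparison of the mapper with and without the for loop.
// Names and the sample row are made up; this is not the Hadoop API.
class LoopDuplicationSketch {
    // Mirrors the question's mapper: the write sits inside a loop over the
    // fields, so a matching row is emitted once per field.
    static List<String> mapWithLoop(String line) {
        List<String> out = new ArrayList<>();
        String[] words = line.split(",");
        for (String word : words) {
            if (words[3].equals("40")) {
                out.add(words[3] + " " + Integer.parseInt(words[0]));
            }
        }
        return out;
    }

    // Same filter without the loop: at most one record per row.
    static List<String> mapOnce(String line) {
        List<String> out = new ArrayList<>();
        String[] words = line.split(",");
        if (words.length > 3 && words[3].equals("40")) {
            out.add(words[3] + " " + Integer.parseInt(words[0]));
        }
        return out;
    }

    public static void main(String[] args) {
        String row = "102,x,y,40"; // hypothetical row, not from the question's data
        System.out.println(mapWithLoop(row).size()); // 4
        System.out.println(mapOnce(row));            // [40 102]
    }
}
```

Since the reducer in the question writes every value it receives, each of those duplicates ends up in the final output.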

Answer 2

The mapper output will look like this:

<word,count>

The reducer output will look like this:

<unique word, its total count>

For example, a line is read and every word in it is counted and put into a <key,value> pair:

<40,1>
<140,1>
<50,1>
<40,1> ..

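Here 40, 50, 140, ... are the keys, and each value is a count of 1 for one occurrence of that key in the line. This happens in the mapper.

These key-value pairs are then sent to the reducer, where identical keys are reduced to a single key and all the values associated with that key are summed, giving one key-value pair per key. So the reducer's result looks something like: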

<40,10>
<50,5>
...

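In your case, the reducer isn't doing anything. The values/words found by the mapper are passed through as the output unchanged.

Ideally, you should reduce and get an output along the lines of: "40,150" was found 5 times on the same line.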
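As a rough illustration of the summing reduce described above, here is a plain-Java sketch. It is not Hadoop code: the class SumReduceSketch and its methods are hypothetical, and an in-memory map stands in for the grouping the framework does between map and reduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java word count: emit (token, 1), group by key, sum per key.
// Names are made up; this is not the Hadoop API.
class SumReduceSketch {
    // Reduce step: collapse all values seen for one key into a single total.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(String line) {
        // Map step: emit (token, 1); grouping by key stands in for the shuffle.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String token : line.split(",")) {
            grouped.computeIfAbsent(token, k -> new ArrayList<>()).add(1);
        }
        // Reduce step, applied once per key.
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            totals.put(e.getKey(), reduce(e.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(wordCount("40,140,50,40")); // {140=1, 40=2, 50=1}
    }
}
```

The keys print in string order because a TreeMap is used here; that ordering is a property of this sketch only.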

Answer 3

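In the context of the earlier question: since the loops only duplicate entries, you don't need a loop in either the mapper or the reducer: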

public static class MapForWordCount extends Mapper<Object, Text, Text, IntWritable>{

private IntWritable saleValue = new IntWritable();
private Text rangeValue = new Text();

public void map(Object key, Text value, Context con) throws IOException, InterruptedException
{
    String line = value.toString();
    String[] words = line.split(",");
    if(words[3].equals("40")){  
       saleValue.set(Integer.parseInt(words[0]));
       rangeValue.set(words[3]);
       con.write(rangeValue , saleValue );
    }
}   
}

And in the reducer, as @Serhiy suggested on the original question, you only need one line of code:

public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>  
{  
    private IntWritable result = new IntWritable();  
    public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException  
    {  
        // Emit each key once; the grouped values are ignored here
        con.write(word, null);  
    }
}

Regarding "Edit 1" - I'll leave that as a trivial exercise :)

hadoop

mapreduce

feature-extraction

mapper

hadoop2