Hadoop Mapper parameters meaning

Question

I'm new to Hadoop and have a question about the Mapper parameters. For the word count example, see the code snippet below:

public static class TokenizerMapper
   extends Mapper<LongWritable, Text, Text, IntWritable> {

   .....

   public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException 
   {
       .......
   }
}

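I know the "value" parameter is the line read from the file, but what does the "key" parameter mean? What does it correspond to?

Why is its type LongWritable?

I've spent hours searching the documentation without finding an answer. Can anyone help?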

Answer 1

The key type is LongWritable because the word count program reads its input with TextInputFormat, which is the default input format.

According to the Javadoc for TextInputFormat:

An InputFormat for plain text files. Files are broken into lines. Either linefeed or carriage-return are used to signal end of line. Keys are the position in the file, and values are the line of text.

Going by that definition, suppose your text file contains:

We are fine.
How are you?
All are fine.

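Then the input to the mapper is:

Key: 0 Value: We are fine.

Key: 13 Value: How are you? (the first line is 13 bytes long including the newline, so the second line starts at byte offset 13)

Key: 26 Value: All are fine. (the second line is also 13 bytes long including the newline, so the third line starts at byte offset 26 from the beginning of the file)

The key is a byte offset counted from 0, not a line number, and the figures above assume each line ends with a single newline character. It is a LongWritable because a file offset can exceed the range of an int.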
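To check the arithmetic without a cluster, here is a minimal sketch in plain Java that mimics how the keys are derived. It is not Hadoop's actual record reader: the class name LineOffsets and the method lineOffsets are made up for illustration, and it assumes UTF-8 text with a single '\n' at the end of each line.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): pairs each line with the byte
// offset at which it starts, which is what the mapper receives as its key.
public class LineOffsets {

    // Assumes UTF-8 content where every line ends with a single '\n'.
    public static Map<Long, String> lineOffsets(String content) {
        Map<Long, String> records = new LinkedHashMap<>();
        if (content.isEmpty()) {
            return records;
        }
        long offset = 0;
        for (String line : content.split("\n")) {
            records.put(offset, line);
            // Advance by the line's byte length plus one byte for the newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String content = "We are fine.\nHow are you?\nAll are fine.\n";
        lineOffsets(content).forEach(
            (key, value) -> System.out.println("Key: " + key + " Value: " + value));
    }
}
```

Running main prints the keys 0, 13 and 26 for the three lines. In a word count mapper the key is normally ignored; only the value (the line text) is tokenized.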
