Write multiple outputs from Mapper
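Below is the sample data, input.txt, which has 2 columns: key and value. For each record processed by the Mapper, the map output should be written to
1) HDFS => a new file needs to be created based on the key column
2) the Context object
Below is the code. It should create 4 files based on the key column, but no files are created. The output is also incorrect: I am expecting wordcount output, but I am getting character count output.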
input.txt
------------
key value
HelloWorld1|ID1
HelloWorld2|ID2
HelloWorld3|ID3
HelloWorld4|ID4
public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException {
        String line = value.toString();
        String[] fileContent = line.split("|");
        Path hdfsPath = new Path("/filelocation/" + fileContent[0]);
        System.out.println("FilePath : " + hdfsPath);
        Configuration configuration = con.getConfiguration();
        writeFile(fileContent[1], hdfsPath, configuration);
        for (String word : fileContent) {
            Text outputKey = new Text(word.toUpperCase().trim());
            IntWritable outputValue = new IntWritable(1);
            con.write(outputKey, outputValue);
        }
    }

    static void writeFile(String fileContent, Path hdfsPath, Configuration configuration) throws IOException {
        FileSystem fs = FileSystem.get(configuration);
        FSDataOutputStream fin = fs.create(hdfsPath);
        fin.writeUTF(fileContent);
        fin.close();
    }
}
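split takes a regular expression, and an unescaped '|' is the alternation operator, which matches the empty string between every pair of characters. That is why you get character counts instead of word counts. You need to escape the '|', like this: .split("\\|");
See the documentation here: http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html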
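To see the difference outside of Hadoop, here is a minimal standalone sketch that runs the unescaped and the escaped split on one line from input.txt. The class name SplitDemo and its helper methods are made up for illustration and are not part of the original code. Pattern.quote is shown as an alternative way to treat the delimiter literally.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original Mapper.
public class SplitDemo {

    // Unescaped: the regex "|" is an alternation of two empty patterns,
    // so it matches the empty string and the line is cut between characters.
    static String[] splitUnescaped(String line) {
        return line.split("|");
    }

    // Escaped: "\\|" in Java source is the regex \| which matches a literal pipe.
    static String[] splitEscaped(String line) {
        return line.split("\\|");
    }

    // Alternative: let Pattern.quote build a literal pattern for the delimiter.
    static String[] splitQuoted(String line) {
        return line.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String line = "HelloWorld1|ID1";
        System.out.println(Arrays.toString(splitUnescaped(line)));
        System.out.println(Arrays.toString(splitEscaped(line)));
        System.out.println(Arrays.toString(splitQuoted(line)));
    }
}
```

With the escaped form, fileContent[0] is the key column and fileContent[1] is the value column, so the path and the words passed to con.write are built from whole fields rather than single characters.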