Hadoop MapReduce Java
public static class TokenizerMapper extends Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Note: "\," is not a valid Java escape sequence, so the comma is listed unescaped.
        StringTokenizer itr = new StringTokenizer(value.toString(), " *$&#/\t\n\f\"',.:;?![](){}<>~-_");
        while (itr.hasMoreTokens()) {
            String term = itr.nextToken().toLowerCase();
            List<Pair<String, Pair<Integer, Integer>>> map = new ArrayList<Pair<String, Pair<Integer, Integer>>>();
            /* here I am performing some operations */
            for (Pair<String, Pair<Integer, Integer>> i : map) {
                String w1 = i.getKey();
                Text word = new Text(w1);
                Pair<Integer, Integer> newValue = i.getValue();
                String merge = String.valueOf(newValue.getKey()) + " " + String.valueOf(newValue.getValue());
                Text val = new Text(merge);
                /* sending both arguments as Text into my context */
                context.write(word, val);
            }
        }
    }
}
public static class Reducer1 extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Text values, Context context) throws IOException, InterruptedException {
        /* Here I want to extract the values. I tried using a for loop, but it says it cannot iterate; it expects something iterable. */
        for (Text t : values)
        {
            /* This is not working. I know we can use Iterable<IntWritable> for integers, but in my case it is Text. */
        }
        //context.write(key, values);
    }
}
Please see the comment lines to better understand my problem. Is there a way to extract the Text values in the reducer? The for loop expects something iterable.
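In your reducer you get an Iterable<Text>, not a single Text, so the method should be declared as reduce(Text key, Iterable<Text> values, Context context). Loop over it and split each element on the space to recover the two values from each merge string the mapper wrote.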
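A minimal sketch of that loop-and-split step, written as plain Java so it runs without a Hadoop cluster. The class and method names (MergedValueParser, parse, parseAll) are made up for illustration, and Iterable<String> stands in for the Iterable<Text> the reducer would receive; inside a real reduce method you would call t.toString() on each Text before parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergedValueParser {

    // Parses one "first second" string, the shape built by the mapper's `merge` variable.
    static int[] parse(String merged) {
        String[] parts = merged.trim().split(" ");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected two space-separated fields: " + merged);
        }
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    // Stand-in for the body of reduce(Text key, Iterable<Text> values, Context context).
    // Assumption: in the real reducer each element is a Text, converted with toString().
    static List<int[]> parseAll(Iterable<String> values) {
        List<int[]> pairs = new ArrayList<>();
        for (String v : values) {
            pairs.add(parse(v));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Two values that would have been grouped under the same key.
        for (int[] p : parseAll(Arrays.asList("1 4", "2 9"))) {
            System.out.println(p[0] + " -> " + p[1]);
        }
    }
}
```

Because both fields are written by String.valueOf on an Integer, a single-space split is enough here.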
Note: splitting on a space will not be reliable if newValue.getKey() in the mapper is itself a string that can contain spaces.
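If the first field ever did become free text, one possible workaround (not something from the original code, just a sketch) is to split at the last space only, which still works as long as the second field contains no spaces. The class and method names below are hypothetical.

```java
class LastSpaceSplit {

    // Splits at the LAST space, so the first field may itself contain spaces.
    // Assumption: the second field never contains a space.
    static String[] splitAtLastSpace(String merged) {
        int idx = merged.lastIndexOf(' ');
        if (idx < 0) {
            throw new IllegalArgumentException("No space separator found: " + merged);
        }
        return new String[] { merged.substring(0, idx), merged.substring(idx + 1) };
    }

    public static void main(String[] args) {
        String[] fields = splitAtLastSpace("new york 5");
        System.out.println("[" + fields[0] + "] [" + fields[1] + "]");
    }
}
```

Another option would be to have the mapper join the fields with a character that cannot occur in either of them, such as a tab, and split on that instead.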