how to split string with (|) from a text file in mapreduce?
I am trying to write a MapReduce program that counts the number of occurrences of each TV brand sold.
Sample input:
Samsung|Optima|14|Madhya Pradesh|132401|14200
Onida|Lucid|18|Uttar Pradesh|232401|16200
Akai|Decent|16|Kerala|922401|12200
Lava|Attention|20|Assam|454601|24200
Zen|Super|14|Maharashtra|619082|9200
Below is the MapReduce code I wrote.
Mapper:
public class TotalUnitMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    Text tvname;
    //IntWritable unit;

    public void setup(Context context) {
        tvname = new Text();
        // unit = new IntWritable();
    }

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] lineArray2 = value.toString().split("|");
        if (!lineArray2[0].contains("NA") || !lineArray2[1].contains("NA")) {
            tvname.set(lineArray2[0]);
            IntWritable unit = new IntWritable(1);
            context.write(tvname, unit);
        }
    }
}
Reducer:
public class TotalUnitReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text tvname, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(tvname, new IntWritable(sum));
    }
}
Driver:
public class TotalUnit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Assignment 3.3-2");
        job.setJarByClass(TotalUnit.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(TotalUnitMapper.class);
        job.setReducerClass(TotalUnitReducer.class);
        job.setNumReduceTasks(2);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
But I get output like this:
A 1
O 4
S 7
L 3
N 1
Z 2
Only the first letter of each TV name is printed, and I can't figure out why. Is something wrong with the split?
Please help, I am a beginner with Hadoop.
Thanks in advance.
Escape that argument. String.split takes a regular expression, and | is the regex alternation metacharacter, so split("|") matches the empty string between every character and you get back single-character tokens. In a Java string literal the backslash itself also has to be escaped, so the pattern becomes "\\|":
String d = "Samsung|Optima|14|Madhya Pradesh|132401|14200 Onida|Lucid|18|Uttar Pradesh|232401|16200 Akai|Decent|16|Kerala|922401|12200 Lava|Attention|20|Assam|454601|24200 Zen|Super|14|Maharashtra|619082|9200";
String[] lineArray2 = d.split("\\|");
System.out.println(Arrays.toString(lineArray2));
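Applied to the mapper from the question, the only change needed is the split pattern. A minimal sketch of the corrected map method (reusing the class fields and names from the question) might look like this:

public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // "\\|" escapes the pipe so it is matched literally instead of as regex alternation
    String[] lineArray2 = value.toString().split("\\|");
    if (!lineArray2[0].contains("NA") || !lineArray2[1].contains("NA")) {
        tvname.set(lineArray2[0]);                // full brand name, e.g. "Samsung"
        context.write(tvname, new IntWritable(1));
    }
}

Pattern.quote("|") or the character class "[|]" would work equally well as the argument to split.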