how to split string with (|) from a text file in mapreduce?
I am trying to write a MapReduce program that counts the number of occurrences of each TV brand sold.
Sample input:
Samsung|Optima|14|Madhya Pradesh|132401|14200
Onida|Lucid|18|Uttar Pradesh|232401|16200
Akai|Decent|16|Kerala|922401|12200
Lava|Attention|20|Assam|454601|24200
Zen|Super|14|Maharashtra|619082|9200
Below is the MapReduce code I wrote.
Mapper:
public class TotalUnitMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    Text tvname;
    //IntWritable unit;

    public void setup(Context context) {
        tvname = new Text();
        // unit = new IntWritable();
    }

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] lineArray2 = value.toString().split("|");
        if (!lineArray2[0].contains("NA") || !lineArray2[1].contains("NA")) {
            tvname.set(lineArray2[0]);
            IntWritable unit = new IntWritable(1);
            context.write(tvname, unit);
        }
    }
}
Reducer:
public class TotalUnitReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text tvname, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(tvname, new IntWritable(sum));
    }
}
Driver:
public class TotalUnit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Assignment 3.3-2");
        job.setJarByClass(TotalUnit.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(TotalUnitMapper.class);
        job.setReducerClass(TotalUnitReducer.class);
        job.setNumReduceTasks(2);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
But I get output like this:
A 1
O 4
S 7
L 3
N 1
Z 2
Only the first letter of each TV name is printed, and I can't figure out why. Is something wrong with the split?
Please help, I am a beginner with Hadoop.
Thanks in advance.
Escape that argument. String.split takes a regular expression, and | is the regex alternation metacharacter, so split("|") matches the empty string between every character and you get back single-character tokens. In a Java string literal the backslash itself also has to be escaped, so the pattern becomes "\\|":
String d = "Samsung|Optima|14|Madhya Pradesh|132401|14200 Onida|Lucid|18|Uttar Pradesh|232401|16200 Akai|Decent|16|Kerala|922401|12200 Lava|Attention|20|Assam|454601|24200 Zen|Super|14|Maharashtra|619082|9200";
String[] lineArray2 = d.split("\\|");
System.out.println(Arrays.toString(lineArray2));
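Applied to the mapper from the question, the only change needed is the split pattern. A minimal sketch of the corrected map method (reusing the class fields and names from the question) might look like this:

public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // "\\|" escapes the pipe so it is matched literally instead of as regex alternation
    String[] lineArray2 = value.toString().split("\\|");
    if (!lineArray2[0].contains("NA") || !lineArray2[1].contains("NA")) {
        tvname.set(lineArray2[0]);                // full brand name, e.g. "Samsung"
        context.write(tvname, new IntWritable(1));
    }
}

Pattern.quote("|") or the character class "[|]" would work equally well as the argument to split.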