Get a percentage in Hadoop

Question

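I have a project where I need to take a comma-separated file with many columns and extract the company name, the outcome of each customer interaction, and how many times each outcome occurred.
I then need to calculate the percentage of bad interactions relative to good ones. I am using Hadoop and Java.
I have a working Map and Reduce that give me the company name along with how many good and bad interactions it had.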

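My problem is that I can't find a way to get Hadoop to divide the bad count by the good count and give me a percentage.
Most companies don't have any bad interactions.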

Here is my Map

public class TermProjectMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        String[] columb = value.toString().split(",");
        String companyName = columb[5];
        String companyResponseToConsumer = columb[12];
        String lookfor = "closed without relief";

        if (companyResponseToConsumer.toLowerCase().contains(lookfor)) {
            companyResponseToConsumer = "Bad";
        } else {
            companyResponseToConsumer = "Good";
        }
        //System.out.println(companyResponseToConsumer);
        // isEmpty() checks the string contents; != "" only compares references
        if (!companyName.isEmpty() && !companyResponseToConsumer.isEmpty()) {
            word.set(companyName + " " + companyResponseToConsumer);
            context.write(word, one);
        }
    }
}

Here is my Reduce

public class TermProjectReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        if (sum > 0) {
            result.set(sum);
            context.write(key, result);
        }
    }
}

Here is an example of what I get now.

AMERICAN EAGLE MORTGAGE COMPANY,Good,   4
AMERICAN EQUITY MORTGAGE,Good,  26 
AMERICAN EXPRESS COMPANY,Bad,   250 
AMERICAN EXPRESS COMPANY,Good,  9094 
AMERICAN FEDERAL MORTGAGE CORPORATION,Bad,  1 
AMERICAN FEDERAL MORTGAGE CORPORATION,Good, 3 
AMERICAN FINANCE HOUSE LARIBA,Good, 3 
AMERICAN FINANCIAL MORTGAGE COMPANY,Good,   3

Answer 1

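To aggregate by company, you just need to emit the company name alone as the key so that all of its records are combined in the reducer. In other words, you want the good and bad values to arrive on the same key, rather than on separate keys as they do now.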

I initially thought you could emit [1, 0] or [0, 1], but emitting just 1 or -1 instead of ("GOOD", 1) and ("BAD", 1) is easier to deal with (and makes for more efficient Hadoop data transfer).

So, for example,

private final static IntWritable ONE = new IntWritable(1);
private final static IntWritable NEG_ONE = new IntWritable(-1);

...

    IntWritable status;
    if (companyResponseToConsumer.toLowerCase().contains(lookfor)) {
        status = NEG_ONE; // bad interaction
    } else {
        status = ONE; // good interaction
    }

    if (!companyName.isEmpty()) {
        word.set(companyName);
        context.write(word, status); // write the Text key, not the raw String
    }

Now, in the reducer, count the values and compute the percentage.

// The output value is Text because the percentage is written as a string.
// Since the map output value (IntWritable) now differs from the final output value,
// the driver needs job.setMapOutputValueClass(IntWritable.class) and job.setOutputValueClass(Text.class).
public class TermProjectReducer extends Reducer<Text, IntWritable, Text, Text> {
    private Text result = new Text();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        int good_sum = 0;
        for (IntWritable val : values) {
            good_sum += (val.get() == 1 ? 1 : 0); // count only the good (+1) records
            total += 1;
        }
        if (total > 0) { // Prevent division by zero
            double percent = 1.0 * good_sum / total; // fraction of good interactions, 0.0 to 1.0
            // Round it to however many decimal places, if you want
            result.set(String.valueOf(percent)); // convert the floating-point number to a string
        } else {
            result.set("0.00");
        }
        context.write(key, result);
    }
}

I only counted the good values because, in your downstream processing, you can compute bad = (1 - good) yourself.
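If it helps to sanity-check the arithmetic outside of Hadoop, here is a minimal plain-Java sketch of the same counting logic. The class and method names (`GoodFractionSketch`, `goodFraction`, `asPercent`) are made up for illustration and are not part of any Hadoop API; the input is a list of +1/-1 statuses like the ones the mapper would emit for a single company.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only (no Hadoop dependency).
public class GoodFractionSketch {

    // Mirrors the reducer loop: +1 marks a good interaction, -1 a bad one.
    public static double goodFraction(List<Integer> statuses) {
        int total = 0;
        int good = 0;
        for (int status : statuses) {
            if (status == 1) {
                good++;
            }
            total++;
        }
        // Guard against an empty input to avoid dividing by zero.
        return total > 0 ? (double) good / total : 0.0;
    }

    // Formats a fraction in [0, 1] as a percentage with two decimal places.
    public static String asPercent(double fraction) {
        return String.format(Locale.ROOT, "%.2f%%", fraction * 100.0);
    }

    public static void main(String[] args) {
        // AMERICAN FEDERAL MORTGAGE CORPORATION in the question: 3 good, 1 bad.
        List<Integer> statuses = Arrays.asList(1, 1, 1, -1);
        double good = goodFraction(statuses);
        double bad = 1.0 - good; // the downstream (1 - good) step
        System.out.println("good=" + asPercent(good) + " bad=" + asPercent(bad));
        // prints: good=75.00% bad=25.00%
    }
}
```

The same `String.format` call could be used inside the reducer if you want a rounded value rather than the raw output of `String.valueOf(percent)`.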

Also, I would suggest using DoubleWritable rather than Text as the Reducer output value.
