Get a percentage in Hadoop
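I have a project where I need to take a comma-separated file with many columns and extract the company name, the outcome of each customer interaction, and how many times each outcome occurred.
I then need to calculate the percentage of bad interactions relative to good interactions.
I am using Hadoop and Java.
I have a working Map and Reduce that give me each company name along with its counts of good and bad interactions.
My problem is that I can't find a way to get Hadoop to divide the good and bad counts to give me a percentage.
Most companies have no bad interactions at all.
Here is my Map: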
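import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TermProjectMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] columb = value.toString().split(",");
        String companyName = columb[5];
        String companyResponseToConsumer = columb[12];
        String lookfor = "closed without relief";
        // Classify the response: "closed without relief" counts as Bad, anything else as Good
        if (companyResponseToConsumer.toLowerCase().contains(lookfor)) {
            companyResponseToConsumer = "Bad";
        } else {
            companyResponseToConsumer = "Good";
        }
        //System.out.println(companyResponseToConsumer);
        // Compare string contents with isEmpty(); != "" only compares references
        if (!companyName.isEmpty() && !companyResponseToConsumer.isEmpty()) {
            // The key is "company + outcome", so Good and Bad reach the reducer under separate keys
            word.set(companyName + " " + companyResponseToConsumer);
            context.write(word, one);
        }
    }
}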
Here is my Reduce:
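import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TermProjectReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper for this "company + outcome" key
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        if (sum > 0) {
            result.set(sum);
            context.write(key, result);
        }
    }
}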
Here is an example of what I get now:
AMERICAN EAGLE MORTGAGE COMPANY,Good, 4
AMERICAN EQUITY MORTGAGE,Good, 26
AMERICAN EXPRESS COMPANY,Bad, 250
AMERICAN EXPRESS COMPANY,Good, 9094
AMERICAN FEDERAL MORTGAGE CORPORATION,Bad, 1
AMERICAN FEDERAL MORTGAGE CORPORATION,Good, 3
AMERICAN FINANCE HOUSE LARIBA,Good, 3
AMERICAN FINANCIAL MORTGAGE COMPANY,Good, 3
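To aggregate by company, you just need to emit the company name alone as the key, so that all of its records are combined in the reducer. In other words, you want the good and bad values under the same key, not split across separate keys as they are now.
I initially thought you could emit [1, 0] or [0, 1], but emitting just 1 or -1 instead of ("GOOD", 1) and ("BAD", 1) is easier to handle (and transfers less data through Hadoop).
So, for example,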
private final static IntWritable ONE = new IntWritable(1);
private final static IntWritable NEG_ONE = new IntWritable(-1);
...
IntWritable status;
// -1 marks a bad interaction, +1 a good one
if (companyResponseToConsumer.toLowerCase().contains(lookfor)) {
    status = NEG_ONE;
} else {
    status = ONE;
}
if (!companyName.isEmpty()) {
    // The key is the company name only; write the Text object, not the raw String
    word.set(companyName);
    context.write(word, status);
}
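To make the encoding concrete without a cluster, here is a minimal plain-Java sketch of the same per-line decision. The column indexes (5 and 12) and the search string come from the question; the class name `StatusEncodingSketch`, the method `encode`, and the length check are assumptions added for illustration, not part of the original mapper.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper: mirrors the mapper's logic outside Hadoop.
final class StatusEncodingSketch {
    private static final String LOOK_FOR = "closed without relief";

    // Returns (companyName, +1 or -1), or null when the line is too short
    // or the company name is empty (such lines would simply be skipped).
    static Map.Entry<String, Integer> encode(String csvLine) {
        String[] columns = csvLine.split(",", -1); // -1 keeps trailing empty fields
        if (columns.length <= 12) {
            return null;
        }
        String companyName = columns[5].trim();
        if (companyName.isEmpty()) {
            return null;
        }
        boolean bad = columns[12].toLowerCase().contains(LOOK_FOR);
        return new SimpleEntry<>(companyName, bad ? -1 : 1);
    }
}
```

Note that a plain `split(",")` assumes no field contains a quoted comma; if the real file does, a proper CSV parser would be needed.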
Now, in the reducer, count the values and compute the percentage.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The output value is Text here, so the job's output value class must be Text as well
public class TermProjectReducer extends Reducer<Text, IntWritable, Text, Text> {
    private Text result = new Text();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        int good_sum = 0;
        for (IntWritable val : values) {
            good_sum += (val.get() == 1 ? 1 : 0); // count only the good (+1) records
            total += 1;
        }
        if (total > 0) { // Prevent division by zero
            double percent = 1.0 * good_sum / total;
            // Round it to however many decimal places you want
            result.set(String.valueOf(percent)); // convert the floating-point number to a string
        } else {
            result.set("0.00");
        }
        context.write(key, result);
    }
}
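I only counted the good values because, in your downstream processing, you can compute bad = 1 - good yourself.
Also, I'd suggest using DoubleWritable rather than Text as the reducer's output value.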
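As a quick sanity check of the arithmetic, the sketch below runs the same counting logic in plain Java on the AMERICAN EXPRESS COMPANY figures from the question (9094 good, 250 bad). The class `GoodFractionSketch` and its helper methods are hypothetical names for illustration. The `double` it returns is the kind of value you would hand to a `DoubleWritable` in the real reducer instead of converting it to a string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the reducer's loop, without any Hadoop types.
final class GoodFractionSketch {
    // Fraction of +1 values among all values; 0.0 when there are none.
    static double goodFraction(Iterable<Integer> statuses) {
        long total = 0;
        long good = 0;
        for (int status : statuses) {
            if (status == 1) {
                good++;
            }
            total++;
        }
        return total == 0 ? 0.0 : (double) good / total;
    }

    // Builds the stream of +1 / -1 values the mapper would emit for one company.
    static List<Integer> statuses(int goodCount, int badCount) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < goodCount; i++) values.add(1);
        for (int i = 0; i < badCount; i++) values.add(-1);
        return values;
    }

    public static void main(String[] args) {
        double good = goodFraction(statuses(9094, 250));
        // 9094 / 9344 good, and bad is the complement
        System.out.println(String.format(Locale.ROOT, "good=%.4f bad=%.4f", good, 1.0 - good));
        // prints "good=0.9732 bad=0.0268"
    }
}
```

Multiply by 100 if you want a percentage rather than a fraction between 0 and 1.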