Do Reducer values in MapReduce need to be copied, or are they otherwise subject to modification?
In a MapReduce application, I have an arbitrary WritableComparable implementation called AnonymousPair, and I noticed the following:
import com.google.common.collect.MinMaxPriorityQueue;

public static class MyReducer extends Reducer<LongWritable, AnonymousPair, LongWritable, Text> {
    @Override
    protected void reduce(LongWritable key, Iterable<AnonymousPair> values, Context context) throws IOException, InterruptedException {
        // ...
        MinMaxPriorityQueue<AnonymousPair> pQueue = MinMaxPriorityQueue
                .orderedBy(new AnonymousPair().comparator())
                .maximumSize(Constants.MaxKeywords)
                .create();
        for (AnonymousPair val : values) {
            pQueue.add(new AnonymousPair(val)); // No problem with copy constructor
            // pQueue.add(val); // Wrong! Every element in pQueue will be the same
        }
    }
}
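If I don't use the copy constructor, every element in pQueue ends up the same. Can anyone help me understand this? Thanks!
My guesses are:
- The Reducer values are references to elements that are subject to modification. This is somewhere in the documentation, but I missed it.
- I am using Google Guava's MinMaxPriorityQueue incorrectly
- Or something is wrong with my WritableComparable implementation
My AnonymousPair implementation: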
public static class AnonymousPair implements WritableComparable<AnonymousPair> {
    private String a = "";
    private Float b = 0f;

    public AnonymousPair() {}
    public AnonymousPair(String a, Float b) {this.a = a; this.b = b;}
    public AnonymousPair(AnonymousPair o) {this.a = o.a; this.b = o.b;}

    public Comparator<AnonymousPair> comparator() {return new AnonymousPairComparator();}

    class AnonymousPairComparator implements Comparator<AnonymousPair> {
        @Override
        public int compare(AnonymousPair o1, AnonymousPair o2) {
            Float diff = o1.b - o2.b;
            if (diff == 0) {
                return 0;
            }
            if (diff < 0) {
                return 1; // Reverse order
            } else {
                return -1;
            }
        }
    }

    @Override
    public int compareTo(AnonymousPair o) {
        int temp = this.a.compareTo(o.a);
        if (temp == 0) {
            return -this.b.compareTo(o.b);
        } else {
            return temp;
        }
    }
    // More overriding...
}
See the Javadoc:
The framework will reuse the key and value objects that are passed
into the reduce, therefore the application should clone the objects
they want to keep a copy of. In many cases, all values are combined
into zero or one value.
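
To make the quoted behavior concrete, here is a minimal sketch that runs without Hadoop. It is a simulation, not the framework's actual code: ReuseDemo, Pair, reusingValues and collect are hypothetical names invented for this illustration, and Pair only stands in for a Writable such as AnonymousPair. The iterator hands back the same instance on every step and just overwrites its fields, which is the reuse pattern the Javadoc describes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {
    /** Hypothetical mutable value type standing in for a Writable like AnonymousPair. */
    static final class Pair {
        String a = "";
        float b = 0f;

        Pair() {}

        /** Copy constructor: takes a snapshot of the current field values. */
        Pair(Pair o) { this.a = o.a; this.b = o.b; }

        @Override
        public String toString() { return a + "=" + b; }
    }

    /** Simulated values iterable: a single Pair instance is refilled on every next(). */
    static Iterable<Pair> reusingValues(String[] names, float[] scores) {
        return () -> new Iterator<Pair>() {
            private final Pair shared = new Pair(); // the one object that gets reused
            private int i = 0;

            @Override
            public boolean hasNext() { return i < names.length; }

            @Override
            public Pair next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.a = names[i];
                shared.b = scores[i];
                i++;
                return shared; // same reference every time, new contents
            }
        };
    }

    /** Collects the values, either storing the raw reference or a copy of it. */
    static List<String> collect(boolean copy) {
        List<Pair> kept = new ArrayList<>();
        String[] names = {"x", "y", "z"};
        float[] scores = {1f, 2f, 3f};
        for (Pair val : reusingValues(names, scores)) {
            kept.add(copy ? new Pair(val) : val);
        }
        List<String> out = new ArrayList<>();
        for (Pair p : kept) out.add(p.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(false)); // [z=3.0, z=3.0, z=3.0]
        System.out.println(collect(true));  // [x=1.0, y=2.0, z=3.0]
    }
}
```

Without the copy, the list holds three references to one object, so every entry shows whatever was written last. That matches the symptom in the question, and it suggests the copy constructor is the right fix rather than a workaround for a Guava or WritableComparable bug.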