Java stream - find most frequent element based on a specific field
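I have a list of Person objects, and I want to find the most common name in the list together with its frequency, using only Java streams. (In case of a tie, return any of the results.)
Currently, my solution uses groupingBy and counting, and then streams over the resulting map a second time to find the max element.
The current solution therefore makes 2 passes: one over the input list and one over the map.
Is it possible to make this more efficient and more readable?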
Person p1 = Person.builder().id("p1").name("Alice").age(1).build();
Person p2 = Person.builder().id("p2").name("Bob").age(2).build();
Person p3 = Person.builder().id("p3").name("Charlie").age(3).build();
Person p4 = Person.builder().id("p4").name("Alice").age(4).build();
List<Person> people = ImmutableList.of(p1, p2, p3, p4);
// Assumes: import static java.util.stream.Collectors.*;
Map.Entry<String, Long> mostCommonName = people
        .stream()
        .collect(collectingAndThen(groupingBy(Person::getName, counting()),
                map -> map.entrySet().stream().max(Map.Entry.comparingByValue()).orElse(null)
        ));
System.out.println(mostCommonName); // Alice=2
With a plain loop and Map::merge, which returns the updated frequency immediately, the two passes collapse into one:
String mostCommonName = null;
int maxFreq = 0;
Map<String, Integer> freq = new HashMap<>();
for (Person p : people) {
    // merge returns the new count, so no second lookup is needed
    int count = freq.merge(p.getName(), 1, Integer::sum);
    if (count > maxFreq) {
        maxFreq = count;
        mostCommonName = p.getName();
    }
}
System.out.printf("Most common name '%s' occurred %d times.%n", mostCommonName, maxFreq);
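The loop above can also be wrapped in a small reusable helper. The following is only a sketch: the class name ModeLoop and the method mostFrequent are hypothetical, and plain strings stand in for Person objects so that the example is self-contained. It returns an empty Optional for empty input instead of null.

```java
import java.util.AbstractMap;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class ModeLoop {
    // Hypothetical helper: single pass over items, returns the most frequent
    // key with its count, or Optional.empty() when there are no items.
    static <T, K> Optional<Map.Entry<K, Integer>> mostFrequent(
            Collection<T> items, Function<T, K> keyOf) {
        Map<K, Integer> freq = new HashMap<>();
        K best = null;
        int bestCount = 0;
        for (T item : items) {
            K key = keyOf.apply(item);
            // merge returns the updated count for this key
            int count = freq.merge(key, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = key;
            }
        }
        if (bestCount == 0) {
            return Optional.empty();
        }
        return Optional.of(new AbstractMap.SimpleImmutableEntry<>(best, bestCount));
    }

    public static void main(String[] args) {
        // Strings stand in for Person::getName values
        List<String> names = List.of("Alice", "Bob", "Charlie", "Alice");
        Optional<Map.Entry<String, Integer>> mode = mostFrequent(names, (String s) -> s);
        System.out.println(mode); // Optional[Alice=2]
    }
}
```

With the Person class from the question, the call would presumably look like ModeLoop.mostFrequent(people, Person::getName).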
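If you insist on using only streams, your best option is probably a custom collector that carries the information needed to aggregate everything in a single pass: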
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

class MaxNameFinder implements Collector<Person, MaxNameFinder.Accumulator, String> {
    // Mutable container: per-name counts plus the running mode.
    public static class Accumulator {
        private final Map<String, Integer> frequency = new HashMap<>();
        private int modeFrequency = 0;
        private String modeName = null;

        public String getModeName() {
            return modeName;
        }

        public void accept(Person person) {
            add(person.getName(), 1);
        }

        // Merging can promote a name that was the mode in neither half,
        // so every merged count is re-checked against the running maximum.
        public Accumulator combine(Accumulator other) {
            other.frequency.forEach(this::add);
            return this;
        }

        private void add(String name, int count) {
            int currentFrequency = frequency.merge(name, count, Integer::sum);
            if (currentFrequency > modeFrequency) {
                modeName = name;
                modeFrequency = currentFrequency;
            }
        }
    }

    public BiConsumer<Accumulator, Person> accumulator() {
        return Accumulator::accept;
    }

    // HashMap is not thread-safe, so CONCURRENT must not be declared here.
    // UNORDERED is safe because any result may be returned on a tie.
    public Set<Collector.Characteristics> characteristics() {
        return Set.of(Collector.Characteristics.UNORDERED);
    }

    public BinaryOperator<Accumulator> combiner() {
        return Accumulator::combine;
    }

    public Function<Accumulator, String> finisher() {
        return Accumulator::getModeName;
    }

    public Supplier<Accumulator> supplier() {
        return Accumulator::new;
    }
}
Usage:
people.stream().collect(new MaxNameFinder())
This returns a String holding the most common name (or null if the stream is empty).
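Since the question asks for the frequency as well as the name, the same idea can be expressed with Collector.of so that the finisher returns both. This is a sketch under assumptions, not a definitive implementation: the names ModeCollector, State and mostFrequent are hypothetical, and strings again stand in for Person objects.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collector;

public class ModeCollector {
    // Mutable accumulation state: per-key counts plus the running mode.
    private static final class State<K> {
        final Map<K, Integer> counts = new HashMap<>();
        K mode;
        int modeCount;

        void add(K key, int n) {
            int c = counts.merge(key, n, Integer::sum);
            if (c > modeCount) {
                modeCount = c;
                mode = key;
            }
        }

        // Re-checks every merged count, since the combined mode may differ
        // from the mode of either partial result.
        State<K> merge(State<K> other) {
            other.counts.forEach(this::add);
            return this;
        }

        Optional<Map.Entry<K, Integer>> result() {
            if (modeCount == 0) {
                return Optional.empty();
            }
            return Optional.of(new AbstractMap.SimpleImmutableEntry<>(mode, modeCount));
        }
    }

    // Hypothetical factory: collects to the most frequent key and its count.
    static <T, K> Collector<T, ?, Optional<Map.Entry<K, Integer>>> mostFrequent(
            Function<? super T, ? extends K> keyOf) {
        return Collector.of(
                State<K>::new,
                (State<K> s, T t) -> s.add(keyOf.apply(t), 1),
                State<K>::merge,
                State<K>::result,
                Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie", "Alice");
        Optional<Map.Entry<String, Integer>> mode =
                names.stream().collect(mostFrequent((String s) -> s));
        System.out.println(mode); // Optional[Alice=2]
    }
}
```

With the Person class from the question, the call would presumably be people.stream().collect(ModeCollector.mostFrequent(Person::getName)). Because the combiner re-checks merged counts, the collector should also give a correct result on a parallel stream.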