Java 8 Stream，如何获取Top N count？

Question

我需要您的建议来简化下面的代码。我有一个玩家列表，其中包含获胜游戏的 ID。我想从这个列表中提取 2 名最佳球员（这 2 名球员拥有更多比赛 ID）一旦提取出来，我就得对return初始列表做其他操作。我认为可以在优化或阅读方面改进这段代码。如果你能帮助我。

public class PlayerStatistics {
    int id
    String name;
    int idMatchWon; // key from Match

    // getter , setter
}

public static void main(String[] args) throws Exception {

    List<PlayerStatistics> _players = new ArrayList<PlayerStatistics>();

    _players.add(initialize(1,'John',4));
    _players.add(initialize(2,'Teddy',2));
    _players.add(initialize(3,'Kevin',3));

    // How to get Top 2
    List<PlayerStatistics> _top2Players = extractTop2Players(_players);
}

private List<PlayerStatistics> extractTop2Players (List<PlayerStatistics> _list) {

    List<PlayerStatistics> _topPlayers = new ArrayList<PlayerStatistics>();

    // 1. Group and count 
    Map<String, Long> _players = _list
            .stream()
            .filter(x -> (!"".equals(x.getName()) && x.getName()!= null) )
            .collect(
                    Collectors.groupingBy(
                            PlayerStatistics::getName, Collectors.counting()
                    )
            );
    ;

    // 2 Best Palyers
    Set<String> _sortedPlayers = _players.entrySet().stream()
            .sorted(Map.Entry.comparingByValue(Collections.reverseOrder()))
            .limit(2)
            .map(Entry::getKey)
            .collect(Collectors.toSet())
    ;

    // 3. Rebuild list 
    _topPlayers = _list
            .stream()
            .filter(x -> _sortedPlayers.contains(x.getName()))
            .collect(Collectors.toList())
    ;

    return _topPlayers;
}


private PlayerStatistics initialize (int id, String name, int year, int month, int won, int lost) {
    return 
        new PlayerStatistics()
            .withId(id)
            .withName(name)
            .withIdMatchWon(won)
        );
}

Answer 1

首先，让我们声明您的代码是绝对正确的。它完成了需要完成的工作，甚至通过使用集进行了优化。不过，它可以通过两种方式进一步改进：

时间复杂度： 您正在对整个数据集进行排序，其时间复杂度为 O(mlogm)，m 是初始玩家列表的大小。立即，您将使用列表的前 N 个元素，其中 N << m.

下面我展示了一种将算法的时间复杂度提高到 O(mlogN) 的方法，这意味着在您的特定情况下它将变为 O(m)（这是因为 N=2 , 所以 logN=log2=1).
您正在遍历数据集 3 次：首先您正在迭代玩家列表以创建计数图，然后您正在迭代此图以获得顶部 N 玩家，最后你再次迭代玩家列表以检查每个玩家是否属于前 N 玩家的集合。

这可以改进为仅对数据集执行 2 遍：第一个创建计数图（类似于您已经完成的），另一个创建一个仅保留top N 个元素，按计数降序排序，遍历完成后返回结果。

重要提示：下面的解决方案要求您的 PlayerStatistics class 始终如一地实施 hashCode 和 equals 方法。

首先，我们有一个通用方法 topN，它（毫不奇怪）从任何给定的地图中提取顶部 N 元素。它通过按值比较条目来做到这一点，降序（在这个版本中，值 V 必须是 Comparable<V>，但是这个算法可以很容易地扩展到支持不实现 Comparable<V> 的值通过提供自定义 Comparator<V>):

public static 
<K, V extends Comparable<? super V>, T extends Comparable<? super T>>
Collection<K> 
topN(
        Map<K, V> map, 
        int N,
        Function<? super K, ? extends T> tieBreaker) {

    TreeMap<Map.Entry<K, V>, K> topN = new TreeMap<>(
        Map.Entry.<K, V>comparingByValue()      // by value descending, then by key
            .reversed()                         // to allow entries with duplicate values
            .thenComparing(e -> tieBreaker.apply(e.getKey())));

    map.entrySet().forEach(e -> {
      topN.put(e, e.getKey());
      if (topN.size() > N) topN.pollLastEntry();
    });

    return topN.values();
}

这里是topN TreeMap behaves as a priority queue of size N (though we add up to N+1 elements). First we put the entry into the topN map, then, if the map has more than N entries, we immediately invoke the pollLastEntry 方法，删除优先级最低的条目（根据TreeMap 的键的顺序）。这保证在遍历时，topN 映射将仅包含已排序的顶部 N 条目。

请注意，我使用的比较器首先按值 V 对 TreeMap<Map.Entry<K, V>, K> 进行降序排序，然后按键 K 进行排序。这是在 Function<? super K, ? extends T> tieBreaker 函数的帮助下实现的，该函数将每个键 K 转换为必须为 Comparable<T> 的值 T。所有这些都允许映射包含具有重复值 V 的条目，而不需要键 K 也为 Comparable<K>.

最后，您将使用上述方法如下：

Map<PlayerStatistics, Long> counts = yourInitialListOfPlayers.stream()
    .filter(x -> !"".equals(x.getName()) && x.getName() != null)
    .collect(Collectors.groupingBy(x -> x, Collectors.counting()));

Collection<PlayerStatistics> top2 = topN(counts, 2, PlayerStatistics::getName);

Java 8 Stream，如何获取Top N count？

Java 8 Stream, How to get Top N count?

java

sorting

collect

java-8

java-stream