Count duplicate values in Multimap

I have a Multimap. Example:

00254=[00255, 2074E, 2074E, 2074E, 2074E, 2074E, 2074E, 00010, 00010, 00010, 0006, 0006, 0006, 00010, R01018, R01018, 0006, 0006, R01018, R01018, R01018, 12062, S2202962, S2202962, R01018, 12062, 20466, 12062, 20466, 22636, 20466, 20466, 22636, 22636, 22636, 22636, 22636, 22636, 22636, 22636, 00255, 2074E, 2074E, 2074E, 2074E, 2074E]
00256=[00257, 2074E, 2074E, 2074E, 2074E, 00010, 2074E, 2074E, 0006, 00010, 00010, 00010, 0006, 0006, 0006, R01018, R01018, 0006, R01018, R01018, R01018, 12062, S2202962, S2202962, R01018, 12062, 20466, 12062, 20466, 20466, 20466, 22636, 22636, 22636, 22636, 22636, 22636, 22636, 22636, 22636, 00257, 2074E, 2074E, 2074E, 2074E, 00010]

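I want to get the number of occurrences of each value, duplicates included.

Is it possible to get the count of the duplicates?

Thanks.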

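Solution using Java 8 Streams: to get the occurrences of each value, take the Collection of values for a key, then group the values and count them (using the Collectors functions) to get a Map<String, Long>: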

Multimap<Integer, String> maps =  ArrayListMultimap.create();
maps.put(1, "foo");
maps.put(1, "bar");
maps.put(1, "foo");
maps.put(2, "Hello");
maps.put(2, "foo");
maps.put(2, "World");
maps.put(2, "World");

Here is the idea, printing the occurrences per value for each key:

maps.keySet().stream() //Iterate the `keys`
            .map(i -> i + " : " +  //For each key
                        maps.get(i).stream() //stream the values.
                            .collect( //Group and count
                                    Collectors.groupingBy(
                                            Function.identity(), 
                                            Collectors.counting()
                                    )
                            )
            )
            .forEach(System.out::println);

1 : {bar=1, foo=2}

2 : {Hello=1, foo=1, World=2}

This produces a String; I'll let you adapt it to your needs.
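One possible adaptation is to keep the counts as data instead of concatenating them into a String. The following is only a sketch under stated assumptions: it uses a plain JDK `Map<K, List<V>>` as a stand-in for the Guava Multimap so that it runs without extra dependencies, and the class and method names (`ValueCounts`, `countPerKey`) are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper; Map<K, List<V>> stands in for Multimap<K, V>.
class ValueCounts {

    // For each key, count how many times each value occurs under that key.
    static <K, V> Map<K, Map<V, Long>> countPerKey(Map<K, List<V>> valuesByKey) {
        Map<K, Map<V, Long>> result = new LinkedHashMap<>();
        for (Map.Entry<K, List<V>> entry : valuesByKey.entrySet()) {
            Map<V, Long> counts = entry.getValue().stream()
                    .collect(Collectors.groupingBy(
                            Function.identity(),
                            LinkedHashMap::new,      // keep first-seen order of values
                            Collectors.counting()));
            result.put(entry.getKey(), counts);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> maps = new LinkedHashMap<>();
        maps.put(1, Arrays.asList("foo", "bar", "foo"));
        maps.put(2, Arrays.asList("Hello", "foo", "World", "World"));

        Map<Integer, Map<String, Long>> counts = countPerKey(maps);
        System.out.println(counts.get(1).get("foo"));   // prints 2
        System.out.println(counts.get(2).get("World")); // prints 2
    }
}
```

With a real Multimap, the same collector could presumably be applied to `maps.get(key).stream()` for each key, as in the snippet above.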

Counting occurrences is a perfect use case for Multiset, which is

A collection that supports order-independent equality, like Set, but may have duplicate elements. A multiset is also sometimes called a bag.

Elements of a multiset that are equal to one another are referred to as occurrences of the same single element.

There are a few different Multiset implementations you can choose from, and a couple of handy set-like methods in the Multisets helper class.
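To make the "bag" idea concrete, here is a deliberately simplified sketch of the counting behavior using only the JDK. It is not Guava's implementation: `SimpleBag` is a hypothetical class that merely borrows the `add`, `count` and `size` method names from Guava's Multiset interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a multiset; not Guava's implementation.
class SimpleBag<E> {
    private final Map<E, Integer> counts = new HashMap<>();

    // Record one more occurrence of the element.
    void add(E element) {
        counts.merge(element, 1, Integer::sum);
    }

    // Number of occurrences recorded for the element, 0 if it was never added.
    int count(E element) {
        return counts.getOrDefault(element, 0);
    }

    // Total number of occurrences across all elements.
    int size() {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        SimpleBag<String> bag = new SimpleBag<>();
        for (String s : new String[] {"foo", "bar", "foo"}) {
            bag.add(s);
        }
        System.out.println(bag.count("foo")); // prints 2
        System.out.println(bag.count("baz")); // prints 0
        System.out.println(bag.size());       // prints 3
    }
}
```

Equal elements are stored once with a counter, which is why a multiset answers "how many times does X occur?" directly.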

Here you can either 1) collect to a multiset or 2) use a custom multimap with multiset values.

  1. You can collect to an immutable multiset instead of a grouping map:

    ImmutableMultiset<String> qtyMultiset = multimap.get(key).stream()
        .collect(ImmutableMultiset.toImmutableMultiset());
    

    or to a mutable one:

    HashMultiset<String> qtyMultiset = multimap.get(key).stream()
        .collect(Multisets.toMultiset(Function.identity(), e -> 1, HashMultiset::create));
    
  2. Or you could use a custom multimap in the first place (unfortunately, there is no MultisetMultimap interface or implementation, so a custom instance is needed):

    Multimap<String, String> countingMultimap
        = Multimaps.newMultimap(new LinkedHashMap<>(), LinkedHashMultiset::create);
    

    If you don't need to preserve order, you can drop the Linked part. For your data:

    countingMultimap.putAll("00254", ImmutableList.of("00255", "2074E", "2074E", "2074E", "2074E", "2074E", "2074E", "00010", "00010", "00010", "0006", "0006", "0006", "00010", "R01018", "R01018", "0006", "0006", "R01018", "R01018", "R01018", "12062", "S2202962", "S2202962", "R01018", "12062", "20466", "12062", "20466", "22636", "20466", "20466", "22636", "22636", "22636", "22636", "22636", "22636", "22636", "22636", "00255", "2074E", "2074E", "2074E", "2074E", "2074E"));
    countingMultimap.putAll("00256", ImmutableList.of("00257", "2074E", "2074E", "2074E", "2074E", "00010", "2074E", "2074E", "0006", "00010", "00010", "00010", "0006", "0006", "0006", "R01018", "R01018", "0006", "R01018", "R01018", "R01018", "12062", "S2202962", "S2202962", "R01018", "12062", "20466", "12062", "20466", "20466", "20466", "22636", "22636", "22636", "22636", "22636", "22636", "22636", "22636", "22636", "00257", "2074E", "2074E", "2074E", "2074E", "00010"));
    

    the resulting multimap will be:

    {00254=[00255 x 2, 2074E x 11, 00010 x 4, 0006 x 5, R01018 x 6, 12062 x 3, S2202962 x 2, 20466 x 4, 22636 x 9], 00256=[00257 x 2, 2074E x 10, 00010 x 5, 0006 x 5, R01018 x 6, 12062 x 3, S2202962 x 2, 20466 x 4, 22636 x 9]}
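If only the values that actually repeat are of interest, the per-value counts can be filtered to those greater than one. This is a JDK-only sketch, not part of the answers above: `DuplicateCounts` and `duplicatesOnly` are hypothetical names, and the input list in `main` is a short made-up sample rather than the full data from the question.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
class DuplicateCounts {

    // Count occurrences of each value, then keep only values seen more than once.
    static Map<String, Integer> duplicatesOnly(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        // Drop every value that occurred exactly once.
        counts.values().removeIf(n -> n < 2);
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input, shaped like the values of one key.
        List<String> values = Arrays.asList(
                "00255", "2074E", "2074E", "00010", "2074E", "00255", "12062");
        System.out.println(duplicatesOnly(values)); // prints {00255=2, 2074E=3}
    }
}
```

The number of distinct duplicated values is then simply the size of the returned map.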
    

For more details, read the Guava Wiki page about Multiset (which uses word counting as its example).