Count duplicate values in a Multimap
I have a Multimap. Example:
00254=[00255, 2074E, 2074E, 2074E, 2074E, 2074E, 2074E, 00010, 00010, 00010, 0006, 0006, 0006, 00010, R01018, R01018, 0006, 0006, R01018, R01018, R01018, 12062, S2202962, S2202962, R01018, 12062, 20466, 12062, 20466, 22636, 20466, 20466, 22636, 22636, 22636, 22636, 22636, 22636, 22636, 22636, 00255, 2074E, 2074E, 2074E, 2074E, 2074E]
00256=[00257, 2074E, 2074E, 2074E, 2074E, 00010, 2074E, 2074E, 0006, 00010, 00010, 00010, 0006, 0006, 0006, R01018, R01018, 0006, R01018, R01018, R01018, 12062, S2202962, S2202962, R01018, 12062, 20466, 12062, 20466, 20466, 20466, 22636, 22636, 22636, 22636, 22636, 22636, 22636, 22636, 22636, 00257, 2074E, 2074E, 2074E, 2074E, 00010]
I want to get the number of occurrences of each value, duplicates included:
- 00254=[00255:2, 2074E:11, 00010:4, 0006:5, R01018:6, ...]
- 00256=[00257:2, 2074E:10, 00010:5, 0006:5, R01018:6, ...]
Is it possible to get the number of duplicates?
Thanks.
Solution using Java 8 Streams: to get the occurrences of each value, just take the Collection of values, then group and count them (using the Collectors functions) to get a Map<String, Long>:
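import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;
import java.util.function.Function;
import java.util.stream.Collectors;

Multimap<Integer, String> maps = ArrayListMultimap.create();
maps.put(1, "foo");
maps.put(1, "bar");
maps.put(1, "foo");
maps.put(2, "Hello");
maps.put(2, "foo");
maps.put(2, "World");
maps.put(2, "World");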
Here is the idea for printing the occurrences per value:
maps.keySet().stream()                  // Iterate the `keys`
    .map(i -> i + " : " +               // For each key,
        maps.get(i).stream()            // stream its values
            .collect(                   // and group and count them
                Collectors.groupingBy(
                    Function.identity(),
                    Collectors.counting()
                )
            )
    )
    .forEach(System.out::println);
1 : {bar=1, foo=2}
2 : {Hello=1, foo=1, World=2}
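As a sanity check, the same groupingBy/counting idea can be run on the asker's 00254 values. This is a minimal plain-JDK sketch (no Guava needed): the class OccurrenceCount, the constant VALUES_00254 and the ", "-separated input string are illustrative assumptions. groupingBy makes no guarantee about the Map type it returns, so the value order in the output above is not guaranteed; passing LinkedHashMap::new here keeps the values in first-occurrence order.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class OccurrenceCount {

    // Values of key 00254 copied from the question, separated by ", ".
    static final String VALUES_00254 =
            "00255, 2074E, 2074E, 2074E, 2074E, 2074E, 2074E, 00010, 00010, 00010, "
            + "0006, 0006, 0006, 00010, R01018, R01018, 0006, 0006, R01018, R01018, "
            + "R01018, 12062, S2202962, S2202962, R01018, 12062, 20466, 12062, 20466, "
            + "22636, 20466, 20466, 22636, 22636, 22636, 22636, 22636, 22636, 22636, "
            + "22636, 00255, 2074E, 2074E, 2074E, 2074E, 2074E";

    // Group equal values and count them; LinkedHashMap::new keeps first-occurrence order.
    static Map<String, Long> countOccurrences(Collection<String> values) {
        return values.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        LinkedHashMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countOccurrences(Arrays.asList(VALUES_00254.split(", ")));
        // Prints 00254 : {00255=2, 2074E=11, 00010=4, 0006=5, R01018=6, 12062=3, S2202962=2, 20466=4, 22636=9}
        System.out.println("00254 : " + counts);
    }
}
```

The counts match the asker's expected 00255:2, 2074E:11, 00010:4, 0006:5, R01018:6.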
This produces a String; I'll leave it to you to adapt it to your needs.
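One possible adaptation, sketched here as an assumption rather than part of the original answer, is to return the counts as a nested map instead of printing them. The helper countPerKey is a hypothetical name; it accepts any Map<K, ? extends Collection<V>>, so with Guava you could pass multimap.asMap(), which returns a Map<K, Collection<V>>. To keep the example runnable without Guava, main uses a plain LinkedHashMap<Integer, List<String>> holding the same foo/bar data.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class PerKeyCounts {

    // Hypothetical helper: key -> (value -> occurrences), keys and values in encounter order.
    // Accepts any key -> values map, e.g. the Map<K, Collection<V>> from Guava's Multimap.asMap().
    static <K, V> Map<K, Map<V, Long>> countPerKey(Map<K, ? extends Collection<V>> groups) {
        Map<K, Map<V, Long>> result = new LinkedHashMap<>();
        groups.forEach((key, values) -> {
            Map<V, Long> counts = values.stream()
                    .collect(Collectors.groupingBy(
                            Function.identity(), LinkedHashMap::new, Collectors.counting()));
            result.put(key, counts);
        });
        return result;
    }

    public static void main(String[] args) {
        // Plain-JDK stand-in for the ArrayListMultimap<Integer, String> built above.
        Map<Integer, List<String>> maps = new LinkedHashMap<>();
        maps.put(1, Arrays.asList("foo", "bar", "foo"));
        maps.put(2, Arrays.asList("Hello", "foo", "World", "World"));
        System.out.println(countPerKey(maps)); // {1={foo=2, bar=1}, 2={Hello=1, foo=1, World=2}}
    }
}
```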
Counting occurrences is a perfect use case for Multiset, which is
A collection that supports order-independent equality, like Set, but may have duplicate elements. A multiset is also sometimes called a bag.
Elements of a multiset that are equal to one another are referred to as occurrences of the same single element.
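To make that definition concrete without Guava, here is a plain-JDK approximation of a bag: a HashMap from element to occurrence count. It is not Guava's Multiset, and the class BagSketch with its bagOf/size helpers are hypothetical names. In Guava the corresponding calls would be Multiset.count(e), Multiset.size() (which includes duplicates) and elementSet().size() (distinct elements).

```java
import java.util.HashMap;
import java.util.Map;

class BagSketch {

    // Hypothetical helper: element -> number of occurrences (a plain-JDK "bag").
    static Map<String, Integer> bagOf(String... elements) {
        Map<String, Integer> bag = new HashMap<>();
        for (String e : elements) {
            bag.merge(e, 1, Integer::sum); // each equal element is one more occurrence
        }
        return bag;
    }

    // Total size including duplicates, analogous to Multiset.size().
    static int size(Map<String, Integer> bag) {
        return bag.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        Map<String, Integer> a = bagOf("foo", "bar", "foo");
        Map<String, Integer> b = bagOf("foo", "foo", "bar");
        System.out.println(a.equals(b));                 // true: order-independent equality
        System.out.println(a.get("foo"));                // 2 occurrences of the same element
        System.out.println(size(a) + " vs " + a.size()); // 3 vs 2: all occurrences vs distinct
    }
}
```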
There are a few different Multiset implementations you can choose from, and a couple of handy set-like methods in the Multisets helper class.
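One example from the Multisets helper is copyHighestCountFirst, which returns an ImmutableMultiset iterating from the highest count down (its Javadoc says ties are broken by the original multiset's iteration order). The sketch below imitates that ordering with plain JDK streams so it runs without Guava; highestCountFirst is a hypothetical name, and the input list is made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class HighestCountFirst {

    // Count values (first-occurrence order), then re-collect entries by descending count.
    // Ties keep their first-occurrence order because sorted() is stable on ordered streams.
    static Map<String, Long> highestCountFirst(List<String> values) {
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue, (x, y) -> x, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("0006", "2074E", "0006", "R01018", "2074E", "2074E");
        System.out.println(highestCountFirst(values)); // {2074E=3, 0006=2, R01018=1}
    }
}
```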
Here you can either 1) collect into a multiset, or 2) use a custom multimap with multiset values.
Instead of a grouping map, you can collect into an immutable multiset:
ImmutableMultiset<String> qtyMultiset = multimap.get(key).stream()
.collect(ImmutableMultiset.toImmutableMultiset());
or into a mutable one:
HashMultiset<String> qtyMultiset = multimap.get(key).stream()
.collect(Multisets.toMultiset(Function.identity(), e -> 1, HashMultiset::create));
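In the toMultiset call above, the second argument (e -> 1) is the count function: it says how many occurrences each stream element contributes. If Guava is not on the classpath, Collectors.toMap with a summing merge function produces a similar element-to-count map. This is a JDK substitute, not the Multiset API, and toCounts is a hypothetical name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class CountingCollector {

    // Plain-JDK analogue of Multisets.toMultiset(Function.identity(), e -> 1, ...):
    // every element contributes a count of 1, and equal elements are merged by summing.
    static Map<String, Integer> toCounts(List<String> values) {
        return values.stream()
                .collect(Collectors.toMap(
                        Function.identity(), // element function: the value itself
                        e -> 1,              // count function: one occurrence per element
                        Integer::sum,        // merge equal elements by adding their counts
                        LinkedHashMap::new)); // keep first-occurrence order
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("foo", "bar", "foo", "World", "World");
        System.out.println(toCounts(values)); // {foo=2, bar=1, World=2}
    }
}
```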
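Or you could use a custom multimap in the first place (unfortunately, there is no MultisetMultimap interface or implementation, hence the need for a custom instance):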
Multimap<String, String> countingMultimap
= Multimaps.newMultimap(new LinkedHashMap<>(), LinkedHashMultiset::create);
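If you don't need to preserve order, you can drop the Linked part (i.e. use HashMap and HashMultiset::create instead). For your data: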
countingMultimap.putAll("00254", ImmutableList.of("00255", "2074E", "2074E", "2074E", "2074E", "2074E", "2074E", "00010", "00010", "00010", "0006", "0006", "0006", "00010", "R01018", "R01018", "0006", "0006", "R01018", "R01018", "R01018", "12062", "S2202962", "S2202962", "R01018", "12062", "20466", "12062", "20466", "22636", "20466", "20466", "22636", "22636", "22636", "22636", "22636", "22636", "22636", "22636", "00255", "2074E", "2074E", "2074E", "2074E", "2074E"));
countingMultimap.putAll("00256", ImmutableList.of("00257", "2074E", "2074E", "2074E", "2074E", "00010", "2074E", "2074E", "0006", "00010", "00010", "00010", "0006", "0006", "0006", "R01018", "R01018", "0006", "R01018", "R01018", "R01018", "12062", "S2202962", "S2202962", "R01018", "12062", "20466", "12062", "20466", "20466", "20466", "22636", "22636", "22636", "22636", "22636", "22636", "22636", "22636", "22636", "00257", "2074E", "2074E", "2074E", "2074E", "00010"));
the resulting multimap will be:
{00254=[00255 x 2, 2074E x 11, 00010 x 4, 0006 x 5, R01018 x 6, 12062 x 3, S2202962 x 2, 20466 x 4, 22636 x 9], 00256=[00257 x 2, 2074E x 10, 00010 x 5, 0006 x 5, R01018 x 6, 12062 x 3, S2202962 x 2, 20466 x 4, 22636 x 9]}
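Without Guava, the same multiset-valued multimap can be approximated with a Map<String, Map<String, Integer>>: computeIfAbsent creates the per-key "bag" and merge increments the count. This is a hypothetical plain-JDK substitute (CountingMap, putAll, count and format are illustrative names, not Guava API), shown with the asker's 00256 values; its formatted output reproduces the 00256 entry above, including R01018 x 6.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CountingMap {

    // Values of key 00256 copied from the question, separated by ", ".
    static final String VALUES_00256 =
            "00257, 2074E, 2074E, 2074E, 2074E, 00010, 2074E, 2074E, 0006, 00010, "
            + "00010, 00010, 0006, 0006, 0006, R01018, R01018, 0006, R01018, R01018, "
            + "R01018, 12062, S2202962, S2202962, R01018, 12062, 20466, 12062, 20466, "
            + "20466, 20466, 22636, 22636, 22636, 22636, 22636, 22636, 22636, 22636, "
            + "22636, 00257, 2074E, 2074E, 2074E, 2074E, 00010";

    // key -> (value -> count); each inner LinkedHashMap stands in for a LinkedHashMultiset.
    private final Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();

    // Counterpart of countingMultimap.putAll(key, values): every value adds one occurrence.
    void putAll(String key, List<String> values) {
        Map<String, Integer> bag = counts.computeIfAbsent(key, k -> new LinkedHashMap<>());
        for (String value : values) {
            bag.merge(value, 1, Integer::sum);
        }
    }

    // Occurrences of value under key, 0 if either is absent.
    int count(String key, String value) {
        return counts.getOrDefault(key, Collections.<String, Integer>emptyMap())
                .getOrDefault(value, 0);
    }

    // Renders one key as "[element x count, ...]"; elements seen once are printed bare.
    String format(String key) {
        return counts.getOrDefault(key, Collections.<String, Integer>emptyMap()).entrySet().stream()
                .map(e -> e.getValue() == 1 ? e.getKey() : e.getKey() + " x " + e.getValue())
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        CountingMap countingMap = new CountingMap();
        countingMap.putAll("00256", Arrays.asList(VALUES_00256.split(", ")));
        System.out.println("00256=" + countingMap.format("00256"));
        System.out.println("R01018 occurs " + countingMap.count("00256", "R01018") + " times");
    }
}
```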
For more details, read the Guava Wiki page about Multiset (it uses word counting as an example).