Java 中使用 Guava 库的布隆过滤器的交集和并集
Intersections and unions of Bloom filters in Java using the Guava library
布隆过滤器似乎很有希望解决我正在处理的现实问题。一个流行的 java 实现似乎是 Google 的 guava 库。
我需要使用 bloom filter 的并集运算,如果有交集运算就好了,但我可以解决它。
在 java 文档中,我找不到执行交集的方法,而 putAll 方法似乎像并集操作一样工作。
所以我的问题是:
- putAll() 是获取两个布隆过滤器并集的正确方法吗?
- 是否有非反射的方式来获取两个或多个布隆过滤器的交集?
- 万一反射没问题,我们可以在 'data' 字段上安全地执行双向运算(或,和)吗?
如果有人可以推荐任何其他流行且经过良好测试的库,该库在 maven 的存储库中可用,并且具有交集和联合,那也可以解决我的需求。
- Is putAll() the correct way to get the union of two bloom filters?
是的,BloomFilter#putAll(BloomFilter)
JavaDoc 说:
Combines this Bloom filter with another Bloom filter by performing a bitwise OR of the underlying data. The mutations happen to this instance. Callers must ensure the Bloom filters are appropriately sized to avoid saturating them.
问题是它会抛出 IllegalArgumentException - if isCompatible(that) == false
,所以如果你想使用这个方法,你必须确保 isCompatible
returns true:
Determines whether a given Bloom filter is compatible with this Bloom filter. For two Bloom filters to be compatible, they must:
- not be the same instance
- have the same number of hash functions
- have the same bit size
- have the same strategy
- have equal funnels
也就是说,让我们尝试回答第二个问题并找到更简单的解决方案。
- Is there a non-reflective way to get the intersection of two or more bloom filters?
是的,您可以使用 BloomFilter
s 实现 Predicate
并使用其 .and()
和 .or()
方法这一事实,记住 mightContain
行为取决于关于 BloomFilter 的参数。
示例:
// bloomFilter1: [0, 100], expected size 200, FPP 1%
BloomFilter<Integer> bloomFilter1 = IntStream.rangeClosed(0, 100)
.boxed()
.collect(toBloomFilter(Funnels.integerFunnel(), 200L, 0.01d));
// bloomFilter2: [80, 200], expected size 200, FPP 1%
BloomFilter<Integer> bloomFilter2 = IntStream.rangeClosed(80, 200)
.boxed()
.collect(toBloomFilter(Funnels.integerFunnel(), 200L, 0.01d));
// bloomFilter2: [80, 200], expected size 1000, FPP 0.1%
BloomFilter<Integer> bloomFilter3 = IntStream.rangeClosed(80, 200)
.boxed()
.collect(toBloomFilter(Funnels.integerFunnel(), 1000L, 0.001d));
assertThat(bloomFilter1.isCompatible(bloomFilter2)).isTrue(); // same parameters
assertThat(bloomFilter1.isCompatible(bloomFilter3)).isFalse(); // different parameters
BloomFilter<Integer> bloomFilterUnion = bloomFilter1.copy(); // preserves BF parameters
bloomFilterUnion.putAll(bloomFilter2);
// note that bloomFilterUnion.putAll(bloomFilter3) would throw IAE
// using AssertJ Predicate asserts https://joel-costigliola.github.io/assertj/core-8/api/org/assertj/core/api/PredicateAssert.html
// or == union == putAll
assertThat(bloomFilterUnion).accepts(90);
assertThat(bloomFilter1.or(bloomFilter2)).accepts(90);
assertThat(bloomFilter1.or(bloomFilter3)).accepts(90);
assertThat(bloomFilterUnion).accepts(42);
assertThat(bloomFilter1.or(bloomFilter2)).accepts(42);
assertThat(bloomFilter1.or(bloomFilter3)).accepts(42);
assertThat(bloomFilterUnion).accepts(180);
assertThat(bloomFilter1.or(bloomFilter2)).accepts(180);
assertThat(bloomFilter1.or(bloomFilter3)).accepts(180);
assertThat(bloomFilterUnion).rejects(300);
assertThat(bloomFilter1.or(bloomFilter2)).rejects(300);
assertThat(bloomFilter1.or(bloomFilter3)).rejects(300);
// and == intersection
assertThat(bloomFilter1.and(bloomFilter2)).accepts(90);
assertThat(bloomFilter1.and(bloomFilter3)).accepts(90);
assertThat(bloomFilter1.and(bloomFilter2)).rejects(42);
assertThat(bloomFilter1.and(bloomFilter3)).rejects(42);
assertThat(bloomFilter1.and(bloomFilter2)).rejects(180);
assertThat(bloomFilter1.and(bloomFilter3)).rejects(180);
assertThat(bloomFilter1.and(bloomFilter2)).rejects(300);
assertThat(bloomFilter1.and(bloomFilter3)).rejects(300);
- In case one is okay with reflection, can we safely perform biwise operations (or, and) on the 'data' field?
我个人不会弄乱任何 class 的内部结构,但在我看来,这里没有必要这样做。
布隆过滤器似乎很有希望解决我正在处理的现实问题。一个流行的 java 实现似乎是 Google 的 guava 库。
我需要使用 bloom filter 的并集运算,如果有交集运算就好了,但我可以解决它。
在 java 文档中,我找不到执行交集的方法,而 putAll 方法似乎像并集操作一样工作。
所以我的问题是:
- putAll() 是获取两个布隆过滤器并集的正确方法吗?
- 是否有非反射的方式来获取两个或多个布隆过滤器的交集?
- 万一反射没问题,我们可以在 'data' 字段上安全地执行双向运算(或,和)吗?
如果有人可以推荐任何其他流行且经过良好测试的库,该库在 maven 的存储库中可用,并且具有交集和联合,那也可以解决我的需求。
- Is putAll() the correct way to get the union of two bloom filters?
是的,BloomFilter#putAll(BloomFilter)
JavaDoc 说:
Combines this Bloom filter with another Bloom filter by performing a bitwise OR of the underlying data. The mutations happen to this instance. Callers must ensure the Bloom filters are appropriately sized to avoid saturating them.
问题是它会抛出 IllegalArgumentException - if isCompatible(that) == false
,所以如果你想使用这个方法,你必须确保 isCompatible
returns true:
Determines whether a given Bloom filter is compatible with this Bloom filter. For two Bloom filters to be compatible, they must:
- not be the same instance
- have the same number of hash functions
- have the same bit size
- have the same strategy
- have equal funnels
也就是说,让我们尝试回答第二个问题并找到更简单的解决方案。
- Is there a non-reflective way to get the intersection of two or more bloom filters?
是的,您可以使用 BloomFilter
s 实现 Predicate
并使用其 .and()
和 .or()
方法这一事实,记住 mightContain
行为取决于关于 BloomFilter 的参数。
示例:
// bloomFilter1: [0, 100], expected size 200, FPP 1%
BloomFilter<Integer> bloomFilter1 = IntStream.rangeClosed(0, 100)
.boxed()
.collect(toBloomFilter(Funnels.integerFunnel(), 200L, 0.01d));
// bloomFilter2: [80, 200], expected size 200, FPP 1%
BloomFilter<Integer> bloomFilter2 = IntStream.rangeClosed(80, 200)
.boxed()
.collect(toBloomFilter(Funnels.integerFunnel(), 200L, 0.01d));
// bloomFilter2: [80, 200], expected size 1000, FPP 0.1%
BloomFilter<Integer> bloomFilter3 = IntStream.rangeClosed(80, 200)
.boxed()
.collect(toBloomFilter(Funnels.integerFunnel(), 1000L, 0.001d));
assertThat(bloomFilter1.isCompatible(bloomFilter2)).isTrue(); // same parameters
assertThat(bloomFilter1.isCompatible(bloomFilter3)).isFalse(); // different parameters
BloomFilter<Integer> bloomFilterUnion = bloomFilter1.copy(); // preserves BF parameters
bloomFilterUnion.putAll(bloomFilter2);
// note that bloomFilterUnion.putAll(bloomFilter3) would throw IAE
// using AssertJ Predicate asserts https://joel-costigliola.github.io/assertj/core-8/api/org/assertj/core/api/PredicateAssert.html
// or == union == putAll
assertThat(bloomFilterUnion).accepts(90);
assertThat(bloomFilter1.or(bloomFilter2)).accepts(90);
assertThat(bloomFilter1.or(bloomFilter3)).accepts(90);
assertThat(bloomFilterUnion).accepts(42);
assertThat(bloomFilter1.or(bloomFilter2)).accepts(42);
assertThat(bloomFilter1.or(bloomFilter3)).accepts(42);
assertThat(bloomFilterUnion).accepts(180);
assertThat(bloomFilter1.or(bloomFilter2)).accepts(180);
assertThat(bloomFilter1.or(bloomFilter3)).accepts(180);
assertThat(bloomFilterUnion).rejects(300);
assertThat(bloomFilter1.or(bloomFilter2)).rejects(300);
assertThat(bloomFilter1.or(bloomFilter3)).rejects(300);
// and == intersection
assertThat(bloomFilter1.and(bloomFilter2)).accepts(90);
assertThat(bloomFilter1.and(bloomFilter3)).accepts(90);
assertThat(bloomFilter1.and(bloomFilter2)).rejects(42);
assertThat(bloomFilter1.and(bloomFilter3)).rejects(42);
assertThat(bloomFilter1.and(bloomFilter2)).rejects(180);
assertThat(bloomFilter1.and(bloomFilter3)).rejects(180);
assertThat(bloomFilter1.and(bloomFilter2)).rejects(300);
assertThat(bloomFilter1.and(bloomFilter3)).rejects(300);
- In case one is okay with reflection, can we safely perform biwise operations (or, and) on the 'data' field?
我个人不会弄乱任何 class 的内部结构,但在我看来,这里没有必要这样做。