Java: Filter collection and retrieve data by multiple fields
I have a class:
public class Address {
private String country;
private String state;
private String city;
}
I also have a list of Person objects. The Person class looks like this:
public class Person {
private String country;
private String state;
private String city;
//other fields
}
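I need to filter the Person objects and find the one that best matches an Address. The Address object has at least one non-null field. A Person object may have none, some, or all of the mentioned fields initialized.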
Here is one possible input example:
Three Person objects:
a. PersonA: country = 'A'
b. PersonB: country = 'A', state = 'B'
c. PersonC: country = 'A', state = 'B', city = 'C'
Address object:
a. Address: country = 'A', state = 'B'
The expected result after filtering is PersonB. If there were only PersonA and PersonC, PersonA would be preferable.
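I would show how I tried to do this, but it is really a pure brute-force algorithm and I don't like it. Its complexity grows with every newly added field. I also considered using a Guava filter with a predicate, but I don't know what the predicate should be.
What is the preferred algorithm for this kind of filtering, other than brute force?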
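For reference, here is a minimal sketch of the brute-force baseline. It assumes one reading of the matching rule implied by the example: a person qualifies when each of its non-null fields equals the corresponding address field, so PersonC is excluded because the address has no city. Among the qualifying persons, the one with the most matching fields wins. The record `Loc` and the method `bestMatch` are hypothetical names, and records require Java 16 or later:

```java
import java.util.List;

// Hypothetical stand-in for Person/Address holding only the address fields.
record Loc(String country, String state, String city) {}

class BruteForceMatch {
    // Returns the most specific person whose every non-null field equals the
    // corresponding address field, or null if none qualifies. O(n * fields).
    static Loc bestMatch(List<Loc> persons, Loc address) {
        String[] af = {address.country(), address.state(), address.city()};
        Loc best = null;
        int bestScore = 0;
        for (Loc p : persons) {
            String[] pf = {p.country(), p.state(), p.city()};
            int score = 0;
            boolean ok = true;
            for (int i = 0; i < pf.length && ok; i++) {
                if (pf[i] == null) {
                    continue;                 // unset on the person: no constraint
                }
                if (pf[i].equals(af[i])) {
                    score++;                  // agrees with the address
                } else {
                    ok = false;               // conflicts, or the address lacks it
                }
            }
            if (ok && score > bestScore) {    // persons with no fields set never win
                best = p;
                bestScore = score;
            }
        }
        return best;
    }
}
```

Every query touches every person and every field. That per-query cost is what the answers try to avoid.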
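As I understand it, brute force means checking all fields of all entities. That is unavoidable if you don't refactor your classes, but there is a simple trick that can help. It uses the state pattern.
You can add a notNulls marker to the classes: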
public class Address {
private int notNulls = 0;
private String country;
private String state;
private String city;
}
public class Person {
private int notNulls = 0;
private String country;
private String state;
private String city;
//other fields
}
Here is one possible implementation of a setter. The others are similar:
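public void setCountry(String s) {
if (country == null) {
if (s != null) {
//null -> non-null: one more field is set
country = s;
notNulls++;
}
} else {
if (s == null) {
//non-null -> null: one field fewer is set
country = null;
notNulls--;
} else {
//non-null -> non-null: the count does not change
country = s;
}
}
}
//true if at least one address field is set
public boolean isValid() {
return notNulls != 0;
}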
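Now you can simply iterate over the objects. isValid() tells you in O(1) whether an object has any address field set, without inspecting the fields.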
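The same bookkeeping can be written once in a helper instead of repeating the null checks in every setter. This is a minimal sketch that assumes only the three address fields. `TrackedAddress`, `delta` and the `notNulls()` accessor are hypothetical names:

```java
// Hypothetical variant of the notNulls trick: every setter keeps notNulls
// equal to the number of non-null address fields.
class TrackedAddress {
    private int notNulls = 0;
    private String country;
    private String state;
    private String city;

    // +1 when a null field becomes non-null, -1 for the reverse, 0 otherwise.
    private static int delta(String oldValue, String newValue) {
        if (oldValue == null && newValue != null) return 1;
        if (oldValue != null && newValue == null) return -1;
        return 0;
    }

    public void setCountry(String s) { notNulls += delta(country, s); country = s; }
    public void setState(String s)   { notNulls += delta(state, s);   state = s; }
    public void setCity(String s)    { notNulls += delta(city, s);    city = s; }

    // Number of address fields currently set, available in O(1).
    public int notNulls() { return notNulls; }

    // Objects with no address field set can be skipped without inspecting fields.
    public boolean isValid() { return notNulls != 0; }
}
```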
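To avoid brute force, you need to index your persons by address. For a good search you definitely need a country (guess it or default it somehow; otherwise the results will be too inaccurate anyway).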
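The index is a single number: the leading digits (up to 3) encode the country, the next 3 digits the state, and the last 4 digits the city. With this layout an int can hold 213 countries (only 206 as of 2016), each with up to 999 states and 9999 cities.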
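To make the layout concrete, here is a minimal sketch of packing and unpacking such a code. It assumes country ids start at 1 and that 0 marks an unspecified state or city, which matches how the code below assigns ids. `AddressCode` and its methods are hypothetical:

```java
// Hypothetical helper for the layout described above:
// code = countryId * 10_000_000 + stateId * 10_000 + cityId,
// where a state or city id of 0 means "not specified".
final class AddressCode {
    static final int COUNTRY_FACTOR = 10_000_000;
    static final int STATE_FACTOR = 10_000;
    // Largest country id for which every state/city combination still fits in an int.
    static final int MAX_COUNTRY = (Integer.MAX_VALUE - (COUNTRY_FACTOR - 1)) / COUNTRY_FACTOR;

    static int encode(int countryId, int stateId, int cityId) {
        if (countryId < 1 || countryId > MAX_COUNTRY) {
            throw new IllegalArgumentException("country id out of range: " + countryId);
        }
        if (stateId < 0 || stateId > 999 || cityId < 0 || cityId > 9999) {
            throw new IllegalArgumentException("state or city id out of range");
        }
        return countryId * COUNTRY_FACTOR + stateId * STATE_FACTOR + cityId;
    }

    static int countryId(int code) { return code / COUNTRY_FACTOR; }
    static int stateId(int code)   { return code / STATE_FACTOR % 1000; }
    static int cityId(int code)    { return code % STATE_FACTOR; }
}
```

MAX_COUNTRY evaluates to 213, which matches the limit stated above. For example, encode(1, 1, 0) is 10010000, the code PersonB receives in the first example.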
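This lets us use hashCode and a TreeSet to index your Person instances and look them up by partial address in O(log(n)) without touching their fields. The fields are touched only while the TreeSet is being built. You will need some extra logic whenever a Person is modified, so that the index stays intact.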
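Keep one caveat in mind. A TreeSet ordered by this code treats two persons with the same address as duplicates because compareTo returns 0, so the second one is silently not added. The sketch below applies the same idea with a TreeMap keyed by the code and a list per key. This is a swapped-in variant, not the answer's code. `PersonIndex` is hypothetical, and persons are plain strings here:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index: address code -> all persons registered under that code.
class PersonIndex {
    private final TreeMap<Integer, List<String>> byCode = new TreeMap<>();

    void add(int code, String person) {
        // computeIfAbsent creates the bucket the first time a code is seen.
        byCode.computeIfAbsent(code, k -> new ArrayList<>()).add(person);
    }

    // Persons stored under the greatest code <= the given one, in O(log n);
    // null when there is no such code.
    List<String> floor(int code) {
        Map.Entry<Integer, List<String>> e = byCode.floorEntry(code);
        return e == null ? null : e.getValue();
    }
}
```

Like TreeSet.floor, this plain floor does not stop at country boundaries.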
The index is computed part by part, starting with the country:
import java.util.HashMap;
import java.util.Map;
public class PartialAddressSearch {
//ids are assigned in order of first appearance, so a code depends on insertion order
private final static Map<String, AddressPartHolder> COUNTRY_MAP = new HashMap<>(200);
private static class AddressPartHolder {
int id;
Map<String, AddressPartHolder> subPartMap;
public AddressPartHolder(int id, Map<String, AddressPartHolder> subPartMap) {
this.id = id;
this.subPartMap = subPartMap;
}
}
public static int getCountryStateCityHashCode(String country, String state, String city) {
if (country != null && country.length() != 0) {
int result = 0;
AddressPartHolder countryHolder = COUNTRY_MAP.get(country);
if (countryHolder == null) {
countryHolder = new AddressPartHolder(COUNTRY_MAP.size() + 1, new HashMap<>());
COUNTRY_MAP.put(country, countryHolder);
}
//the country id occupies the digits from 10^7 upwards
result += countryHolder.id * 10000000;
if (state != null && state.length() != 0) {
AddressPartHolder stateHolder = countryHolder.subPartMap.get(state);
if (stateHolder == null) {
stateHolder = new AddressPartHolder(countryHolder.subPartMap.size() + 1, new HashMap<>());
countryHolder.subPartMap.put(state, stateHolder);
}
//the state id occupies the 3 digits from 10^4 to 10^6
result += stateHolder.id * 10000;
if (city != null && city.length() != 0) {
AddressPartHolder cityHolder = stateHolder.subPartMap.get(city);
if (cityHolder == null) {
cityHolder = new AddressPartHolder(stateHolder.subPartMap.size() + 1, null);
stateHolder.subPartMap.put(city, cityHolder);
}
//the city id occupies the last 4 digits
result += cityHolder.id;
}
}
return result;
} else {
throw new IllegalArgumentException("Non-empty country is expected");
}
}
}
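Because ids are handed out in order of first appearance, the code for a given country depends on which country was registered first. This is why country D gets 10000000 in the edit below. Here is a compact sketch of the same registration step using computeIfAbsent. `IdRegistry` is hypothetical, and you would need one instance for countries and one per country for states, mirroring the nested maps above:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: assigns 1, 2, 3, ... to names in first-seen order.
class IdRegistry {
    private final Map<String, Integer> ids = new HashMap<>();

    int idOf(String name) {
        // The mapping function runs only for a new name; size() is read
        // before the new entry is inserted, so the first name gets 1.
        return ids.computeIfAbsent(name, k -> ids.size() + 1);
    }
}
```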
For your Person and Address classes, define hashCode and compareTo based on the natural ordering of int:
public class Person implements Comparable {
private String country;
private String state;
private String city;
//other fields, setters and toString() omitted
@Override
public boolean equals(Object o) {
//it's important, but I removed it for readability
}
@Override
public int hashCode() {
return PartialAddressSearch.getCountryStateCityHashCode(country, state, city);
}
@Override
public int compareTo(Object o) {
//could be further improved by storing the hash code in a field to avoid recalculating it while sorting
return Integer.compare(hashCode(), o.hashCode());
}
}
public class Address implements Comparable {
private String country;
private String state;
private String city;
//setters omitted
@Override
public boolean equals(Object o) {
//removed for readability
}
@Override
public int hashCode() {
return PartialAddressSearch.getCountryStateCityHashCode(country, state, city);
}
@Override
public int compareTo(Object o) {
//could be further improved by storing the hash code in a field to avoid recalculating it while sorting
return Integer.compare(hashCode(), o.hashCode());
}
}
public class AddressPersonAdapter extends Person {
private final Address delegate;
public AddressPersonAdapter(Address delegate) {
this.delegate = delegate;
}
@Override
public boolean equals(Object o) {
return delegate.equals(o);
}
@Override
public int hashCode() {
return delegate.hashCode();
}
}
After that, your filtering code shrinks to populating the index and computing the floor of the partial address:
TreeSet<Person> personSetByAddress = new TreeSet<>();
Person personA = new Person();
personA.setCountry("A");
personSetByAddress.add(personA);
Person personB = new Person();
personB.setCountry("A");
personB.setState("B");
personSetByAddress.add(personB);
Person personC = new Person();
personC.setCountry("A");
personC.setState("B");
personC.setCity("C");
personSetByAddress.add(personC);
Address addressAB = new Address();
addressAB.setCountry("A");
addressAB.setState("B");
System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));
Yields:
Person{hashCode=10010000, country='A', state='B', city='null'}
If you don't have PersonB:
TreeSet<Person> personSetByAddress = new TreeSet<>();
Person personA = new Person();
personA.setCountry("A");
personSetByAddress.add(personA);
Person personC = new Person();
personC.setCountry("A");
personC.setState("B");
personC.setCity("C");
personSetByAddress.add(personC);
Address addressAB = new Address();
addressAB.setCountry("A");
addressAB.setState("B");
System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));
Yields:
Person{hashCode=10000000, country='A', state='null', city='null'}
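Edit:
A corner case that needs additional validation is when there is no element less than or equal to the address within the same country (or greater than or equal, if we need the ceiling). For example: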
TreeSet<Person> personSetByAddress = new TreeSet<>();
Person personA = new Person();
personA.setCountry("D");
personSetByAddress.add(personA);
Person personC = new Person();
personC.setCountry("A");
personC.setState("B");
personC.setCity("C");
personSetByAddress.add(personC);
Address addressAB = new Address();
addressAB.setCountry("A");
addressAB.setState("B");
System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));
Yields:
Person{hashCode=10000000, country='D', state='null', city='null'}
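That is, we floored to the nearest other country. To fix this, we need to check that the country digits are still the same. We can do that by subclassing TreeSet and adding the check there: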
//we need this class to allow flooring just by id
public class IntegerPersonAdapter extends Person {
private Integer id;
public IntegerPersonAdapter(Integer id) {
this.id = id;
}
@Override
public boolean equals(Object o) {
return id.equals(o);
}
@Override
public int hashCode() {
return id.hashCode();
}
@Override
public int compareTo(Object o) {
return id.hashCode() - o.hashCode();
}
@Override
public String toString() {
return id.toString();
}
}
import java.util.TreeSet;
public class StrictCountryTreeSet extends TreeSet<Person> {
@Override
public Person floor(Person e) {
Person candidate = super.floor(e);
if (candidate != null) {
int candidateCode = candidate.hashCode();
int eCode = e.hashCode();
if (candidateCode == eCode) {
return candidate;
} else {
//we check if the country is the same
int countryCandidate = candidateCode / 10000000;
if (countryCandidate == (eCode / 10000000)) {
//we check if the state is the same
int stateCandidate = candidateCode / 10000;
if (stateCandidate == (eCode / 10000)) {
//we check if the candidate is state-level (the 4 city digits are zero)
if (candidateCode % 10000 == 0) {
return candidate;
} else { //not an exact match and the candidate is a different city, so look for someone from just the state
return this.floor(new IntegerPersonAdapter(stateCandidate * 10000));
}
} else if (stateCandidate % 1000 == 0) { //we check if the candidate is country-level already (the 3 state digits are zero)
return candidate;
} else {
return this.floor(new IntegerPersonAdapter(countryCandidate * 10000000));
}
}
}
}
return null;
}
}
Now our test case yields null once we initialize the set as a StrictCountryTreeSet:
TreeSet<Person> personSetByAddress = new StrictCountryTreeSet();
Person personA = new Person();
personA.setCountry("D");
personSetByAddress.add(personA);
Person personC = new Person();
personC.setCountry("A");
personC.setState("B");
personC.setCity("C");
personSetByAddress.add(personC);
Address addressAB = new Address();
addressAB.setCountry("A");
addressAB.setState("B");
System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));
Yields:
null
A test with a different state also yields null:
TreeSet<Person> personSetByAddress = new StrictCountryTreeSet();
Person personD = new Person();
personD.setCountry("D");
personSetByAddress.add(personD);
Person personE = new Person();
personE.setCountry("A");
personE.setState("E");
personSetByAddress.add(personE);
Person personC = new Person();
personC.setCountry("A");
personC.setState("B");
personC.setCity("C");
personSetByAddress.add(personC);
Address addressA = new Address();
addressA.setCountry("A");
Address addressAB = new Address();
addressAB.setCountry("A");
addressAB.setState("B");
Address addressABC = new Address();
addressABC.setCountry("A");
addressABC.setState("B");
addressABC.setCity("C");
System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));
Yields:
null
Note that in this case you should store the hashCode result in the Address and Person classes to avoid recalculating it.
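Because the code is a strict prefix hierarchy, only three codes can possibly match an address: the address code itself, the same code with the city digits zeroed, and the same code with the state and city digits zeroed. The sketch below uses that fact as an alternative to the floor-and-validate approach. It probes those three codes directly, in order of specificity. It assumes the digit layout above. `PrefixLookup` is hypothetical, and bare codes stand in for persons:

```java
import java.util.Set;

// Hypothetical alternative to StrictCountryTreeSet.floor: instead of flooring
// and then validating, probe the only codes that can match an address.
final class PrefixLookup {
    private static final int COUNTRY_FACTOR = 10_000_000;
    private static final int STATE_FACTOR = 10_000;

    // Most specific stored code that is a prefix of addressCode, or null.
    static Integer bestMatch(Set<Integer> codes, int addressCode) {
        int stateLevel = addressCode - addressCode % STATE_FACTOR;     // drop the city
        int countryLevel = addressCode - addressCode % COUNTRY_FACTOR; // drop state and city
        for (int candidate : new int[] {addressCode, stateLevel, countryLevel}) {
            if (codes.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```

This works with any Set. A TreeSet gives O(log n) per probe and a HashSet gives O(1). Like the TreeSet approach, it keeps one entry per code, so persons sharing an address still need a list per code.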