Java 中用于客户搜索的 Radix(Trie) 树实现
Radix(Trie) Tree implementation for Cutomer search in Java
我正在做一个项目,需要在数百万客户的数据中进行搜索。我想实现基数(特里)搜索算法。我已经阅读并实现了一个简单的字符串集合的基数。但是这里我有一个客户集合,想按姓名或手机号码搜索。
客户Class:
public class Customer {
String name;
String mobileNumer;
public Customer (String name, String phoneNumer) {
this.name = name;
this.mobileNumer = phoneNumer;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getPhoneNumer() {
return mobileNumer;
}
public void setPhoneNumer(String phoneNumer) {
this.mobileNumer = phoneNumer;
}
}
RadixNode Class:
import java.util.HashMap;
import java.util.Map;
class RadixNode {
private final Map<Character, RadixNode> child = new HashMap<>();
private final Map<Customer, RadixNode> mobileNum = new HashMap<>();
private boolean endOfWord;
Map<Character, RadixNode> getChild() {
return child;
}
Map<Customer, RadixNode> getChildPhoneDir() {
return mobileNum;
}
boolean isEndOfWord() {
return endOfWord;
}
void setEndOfWord(boolean endOfWord) {
this.endOfWord = endOfWord;
}
}
基数Class:
class Radix {
private RadixNode root;
Radix() {
root = new RadixNode();
}
void insert(String word) {
RadixNode current = root;
for (int i = 0; i < word.length(); i++) {
current = current.getChild().computeIfAbsent(word.charAt(i), c -> new RadixNode());
}
current.setEndOfWord(true);
}
void insert(Customer word) {
RadixNode current = root;
System.out.println("==========================================");
System.out.println(word.mobileNumer.length());
for (int i = 0; i < word.mobileNumer.length(); i++) {
current = current.getChildPhoneDir().computeIfAbsent(word.mobileNumer.charAt(i), c -> new RadixNode());
System.out.println(current);
}
current.setEndOfWord(true);
}
boolean delete(String word) {
return delete(root, word, 0);
}
boolean containsNode(String word) {
RadixNode current = root;
for (int i = 0; i < word.length(); i++) {
char ch = word.charAt(i);
RadixNode node = current.getChild().get(ch);
if (node == null) {
return false;
}
current = node;
}
return current.isEndOfWord();
}
boolean isEmpty() {
return root == null;
}
private boolean delete(RadixNode current, String word, int index) {
if (index == word.length()) {
if (!current.isEndOfWord()) {
return false;
}
current.setEndOfWord(false);
return current.getChild().isEmpty();
}
char ch = word.charAt(index);
RadixNode node = current.getChild().get(ch);
if (node == null) {
return false;
}
boolean shouldDeleteCurrentNode = delete(node, word, index + 1) && !node.isEndOfWord();
if (shouldDeleteCurrentNode) {
current.getChild().remove(ch);
return current.getChild().isEmpty();
}
return false;
}
public void displayContactsUtil(RadixNode curNode, String prefix)
{
// Check if the string 'prefix' ends at this Node
// If yes then display the string found so far
if (curNode.isEndOfWord())
System.out.println(prefix);
// Find all the adjacent Nodes to the current
// Node and then call the function recursively
// This is similar to performing DFS on a graph
for (char i = 'a'; i <= 'z'; i++)
{
RadixNode nextNode = curNode.getChild().get(i);
if (nextNode != null)
{
displayContactsUtil(nextNode, prefix + i);
}
}
}
public boolean displayContacts(String str)
{
RadixNode prevNode = root;
// 'flag' denotes whether the string entered
// so far is present in the Contact List
String prefix = "";
int len = str.length();
// Display the contact List for string formed
// after entering every character
int i;
for (i = 0; i < len; i++)
{
// 'str' stores the string entered so far
prefix += str.charAt(i);
// Get the last character entered
char lastChar = prefix.charAt(i);
// Find the Node corresponding to the last
// character of 'str' which is pointed by
// prevNode of the Trie
RadixNode curNode = prevNode.getChild().get(lastChar);
// If nothing found, then break the loop as
// no more prefixes are going to be present.
if (curNode == null)
{
System.out.println("No Results Found for \"" + prefix + "\"");
i++;
break;
}
// If present in trie then display all
// the contacts with given prefix.
System.out.println("Suggestions based on \"" + prefix + "\" are");
displayContactsUtil(curNode, prefix);
// Change prevNode for next prefix
prevNode = curNode;
}
for ( ; i < len; i++)
{
prefix += str.charAt(i);
System.out.println("No Results Found for \"" + prefix + "\"");
}
return true;
}
public void displayContactsUtil(RadixNode curNode, String prefix, boolean isPhoneNumber)
{
// Check if the string 'prefix' ends at this Node
// If yes then display the string found so far
if (curNode.isEndOfWord())
System.out.println(prefix);
// Find all the adjacent Nodes to the current
// Node and then call the function recursively
// This is similar to performing DFS on a graph
for (char i = '0'; i <= '9'; i++)
{
RadixNode nextNode = curNode.getChildPhoneDir().get(i);
if (nextNode != null)
{
displayContactsUtil(nextNode, prefix + i);
}
}
}
public boolean displayContacts(String str, boolean isPhoneNumber)
{
RadixNode prevNode = root;
// 'flag' denotes whether the string entered
// so far is present in the Contact List
String prefix = "";
int len = str.length();
// Display the contact List for string formed
// after entering every character
int i;
for (i = 0; i < len; i++)
{
// 'str' stores the string entered so far
prefix += str.charAt(i);
// Get the last character entered
char lastChar = prefix.charAt(i);
// Find the Node corresponding to the last
// character of 'str' which is pointed by
// prevNode of the Trie
RadixNode curNode = prevNode.getChildPhoneDir().get(lastChar);
// If nothing found, then break the loop as
// no more prefixes are going to be present.
if (curNode == null)
{
System.out.println("No Results Found for \"" + prefix + "\"");
i++;
break;
}
// If present in trie then display all
// the contacts with given prefix.
System.out.println("Suggestions based on \"" + prefix + "\" are");
displayContactsUtil(curNode, prefix, isPhoneNumber);
// Change prevNode for next prefix
prevNode = curNode;
}
for ( ; i < len; i++)
{
prefix += str.charAt(i);
System.out.println("No Results Found for \"" + prefix + "\"");
}
return true;
}
}
我尝试在集合中搜索但卡住了。任何帮助/建议将不胜感激。
我建议您使用两种方法。
第一种方式:使用单个 trie。
可以在单个 trie 中存储您需要的所有内容。您的客户 class 很好,这是一个可能的 RadixNode
实现。
我认为不能有两个同名的客户,或者同一个 phone 号码。如果不是这种情况(例如,可能会有同名和不同 phone nb 的人)在评论中告诉我,我会编辑。
需要了解的重要一点是,如果您想要使用两种不同的方式来查找客户,并且您使用一个 trie,则每个客户将在您的 trie 中出现两次。一次在其名称对应的路径末尾,一次在其phone号对应的路径末尾后。
import java.util.HashMap;
import java.util.Map;
class RadixNode {
private Map<Character, RadixNode> children;
private Customer customer;
public RadixNode(){
this.children = new Map<Character, RadixNode>();
this.Customer = NULL;
}
Map<Character, RadixNode> getChildren() {
return children;
}
boolean hasCustomer() {
return this.customer != NULL;
}
Customer getCustomer() {
return customer;
}
void setCustomer(Customer customer) {
this.customer = customer;
}
}
如您所见,只有一张地图存储节点的子节点。那是因为我们可以看到一个 phone 数字作为一串数字,所以这个 trie 将存储所有的客户......两次。每个名字一次,每个 phone 个号码一次。
现在让我们看一个插入函数。你的 trie 将需要一个根,我们称它为 root
.
public void insert(RadixNode root, Customer customer){
insert_with_name(root, customer, 0);
insert_with_phone_nb(root, customer, 0);
}
public void insert_with_name(RadixNode node, Customer customer, int idx){
if (idx == customer.getName().length()){
node.setCustomer(customer);
} else {
Character current_char = customer.getName().chatAt(idx);
if (! node.getChlidren().containsKey(current_char){
RadixNode new_child = new RadixNode();
node.getChildren().put(current_char, new_child);
}
insert_with_name(node.getChildren().get(current_char), customer, idx+1);
}
}
insert_with_phone_nb()
方法类似。只要人们有唯一的名字、唯一的 phone 号码,并且某人的名字不能是某人的 phone 号码,这就会起作用。
如您所见,该方法是递归的。我建议您递归地构建您的 trie 结构(通常,一切都基于树结构),因为它使代码更简单、更简洁。
搜索功能几乎是插入功能的复制粘贴:
public void search_by_name(RadixNode node, String name, int idx){
// returns NULL if there is no user going by that name
if (idx == name.length()){
return node.getCustomer();
} else {
Character current_char = name.chatAt(idx);
if (! node.getChlidren().containsKey(current_char){
return NULL;
} else {
return search_by_name(node.getChildren().get(current_char), name, idx+1);
}
}
}
第二种方式:尝试2次
原理是一样的,你所要做的就是重用上面的代码,但保留两个不同的 root
节点,每个节点将构建一个 trie(一个用于名称,一个用于 phone 数字)。
唯一的区别是 insert
函数(因为它会用 2 个不同的根调用 insert_with_name
和 insert_with_phone_nb
),以及必须在正确的 trie 中搜索的搜索函数。
public void insert(RadixNode root_name_trie, RadixNode root_phone_trie, Customer customer){
insert_with_name(root_name_trie, customer, 0);
insert_with_phone_nb(root_phone_trie, customer, 0);
}
编辑: 评论精确后可能会有同名的客户,这里有一个替代实现,允许RadixNode
包含对多个[=的引用22=].
例如,将 RadixNode
中的 Customer customer
属性替换为 Vector<Customer>
。当然,这些方法必须相应地修改,然后按名称搜索 return 给你一个客户向量(可能是空的),因为这个搜索然后可以导致几个结果。
在您的情况下,我会选择一个包含客户向量的 trie。因此,您可以同时按名称和 phone(将数字转换为 String
)进行搜索,并维护一个数据结构。
我正在做一个项目,需要在数百万客户的数据中进行搜索。我想实现基数(特里)搜索算法。我已经阅读并实现了一个简单的字符串集合的基数。但是这里我有一个客户集合,想按姓名或手机号码搜索。
客户Class:
public class Customer {
String name;
String mobileNumer;
public Customer (String name, String phoneNumer) {
this.name = name;
this.mobileNumer = phoneNumer;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getPhoneNumer() {
return mobileNumer;
}
public void setPhoneNumer(String phoneNumer) {
this.mobileNumer = phoneNumer;
}
}
RadixNode Class:
import java.util.HashMap;
import java.util.Map;
class RadixNode {
private final Map<Character, RadixNode> child = new HashMap<>();
private final Map<Customer, RadixNode> mobileNum = new HashMap<>();
private boolean endOfWord;
Map<Character, RadixNode> getChild() {
return child;
}
Map<Customer, RadixNode> getChildPhoneDir() {
return mobileNum;
}
boolean isEndOfWord() {
return endOfWord;
}
void setEndOfWord(boolean endOfWord) {
this.endOfWord = endOfWord;
}
}
基数Class:
class Radix {
private RadixNode root;
Radix() {
root = new RadixNode();
}
void insert(String word) {
RadixNode current = root;
for (int i = 0; i < word.length(); i++) {
current = current.getChild().computeIfAbsent(word.charAt(i), c -> new RadixNode());
}
current.setEndOfWord(true);
}
void insert(Customer word) {
RadixNode current = root;
System.out.println("==========================================");
System.out.println(word.mobileNumer.length());
for (int i = 0; i < word.mobileNumer.length(); i++) {
current = current.getChildPhoneDir().computeIfAbsent(word.mobileNumer.charAt(i), c -> new RadixNode());
System.out.println(current);
}
current.setEndOfWord(true);
}
boolean delete(String word) {
return delete(root, word, 0);
}
boolean containsNode(String word) {
RadixNode current = root;
for (int i = 0; i < word.length(); i++) {
char ch = word.charAt(i);
RadixNode node = current.getChild().get(ch);
if (node == null) {
return false;
}
current = node;
}
return current.isEndOfWord();
}
boolean isEmpty() {
return root == null;
}
private boolean delete(RadixNode current, String word, int index) {
if (index == word.length()) {
if (!current.isEndOfWord()) {
return false;
}
current.setEndOfWord(false);
return current.getChild().isEmpty();
}
char ch = word.charAt(index);
RadixNode node = current.getChild().get(ch);
if (node == null) {
return false;
}
boolean shouldDeleteCurrentNode = delete(node, word, index + 1) && !node.isEndOfWord();
if (shouldDeleteCurrentNode) {
current.getChild().remove(ch);
return current.getChild().isEmpty();
}
return false;
}
public void displayContactsUtil(RadixNode curNode, String prefix)
{
// Check if the string 'prefix' ends at this Node
// If yes then display the string found so far
if (curNode.isEndOfWord())
System.out.println(prefix);
// Find all the adjacent Nodes to the current
// Node and then call the function recursively
// This is similar to performing DFS on a graph
for (char i = 'a'; i <= 'z'; i++)
{
RadixNode nextNode = curNode.getChild().get(i);
if (nextNode != null)
{
displayContactsUtil(nextNode, prefix + i);
}
}
}
public boolean displayContacts(String str)
{
RadixNode prevNode = root;
// 'flag' denotes whether the string entered
// so far is present in the Contact List
String prefix = "";
int len = str.length();
// Display the contact List for string formed
// after entering every character
int i;
for (i = 0; i < len; i++)
{
// 'str' stores the string entered so far
prefix += str.charAt(i);
// Get the last character entered
char lastChar = prefix.charAt(i);
// Find the Node corresponding to the last
// character of 'str' which is pointed by
// prevNode of the Trie
RadixNode curNode = prevNode.getChild().get(lastChar);
// If nothing found, then break the loop as
// no more prefixes are going to be present.
if (curNode == null)
{
System.out.println("No Results Found for \"" + prefix + "\"");
i++;
break;
}
// If present in trie then display all
// the contacts with given prefix.
System.out.println("Suggestions based on \"" + prefix + "\" are");
displayContactsUtil(curNode, prefix);
// Change prevNode for next prefix
prevNode = curNode;
}
for ( ; i < len; i++)
{
prefix += str.charAt(i);
System.out.println("No Results Found for \"" + prefix + "\"");
}
return true;
}
public void displayContactsUtil(RadixNode curNode, String prefix, boolean isPhoneNumber)
{
// Check if the string 'prefix' ends at this Node
// If yes then display the string found so far
if (curNode.isEndOfWord())
System.out.println(prefix);
// Find all the adjacent Nodes to the current
// Node and then call the function recursively
// This is similar to performing DFS on a graph
for (char i = '0'; i <= '9'; i++)
{
RadixNode nextNode = curNode.getChildPhoneDir().get(i);
if (nextNode != null)
{
displayContactsUtil(nextNode, prefix + i);
}
}
}
public boolean displayContacts(String str, boolean isPhoneNumber)
{
RadixNode prevNode = root;
// 'flag' denotes whether the string entered
// so far is present in the Contact List
String prefix = "";
int len = str.length();
// Display the contact List for string formed
// after entering every character
int i;
for (i = 0; i < len; i++)
{
// 'str' stores the string entered so far
prefix += str.charAt(i);
// Get the last character entered
char lastChar = prefix.charAt(i);
// Find the Node corresponding to the last
// character of 'str' which is pointed by
// prevNode of the Trie
RadixNode curNode = prevNode.getChildPhoneDir().get(lastChar);
// If nothing found, then break the loop as
// no more prefixes are going to be present.
if (curNode == null)
{
System.out.println("No Results Found for \"" + prefix + "\"");
i++;
break;
}
// If present in trie then display all
// the contacts with given prefix.
System.out.println("Suggestions based on \"" + prefix + "\" are");
displayContactsUtil(curNode, prefix, isPhoneNumber);
// Change prevNode for next prefix
prevNode = curNode;
}
for ( ; i < len; i++)
{
prefix += str.charAt(i);
System.out.println("No Results Found for \"" + prefix + "\"");
}
return true;
}
}
我尝试在集合中搜索但卡住了。任何帮助/建议将不胜感激。
我建议您使用两种方法。
第一种方式:使用单个 trie。
可以在单个 trie 中存储您需要的所有内容。您的客户 class 很好,这是一个可能的 RadixNode
实现。
我认为不能有两个同名的客户,或者同一个 phone 号码。如果不是这种情况(例如,可能会有同名和不同 phone nb 的人)在评论中告诉我,我会编辑。
需要了解的重要一点是,如果您想要使用两种不同的方式来查找客户,并且您使用一个 trie,则每个客户将在您的 trie 中出现两次。一次在其名称对应的路径末尾,一次在其phone号对应的路径末尾后。
import java.util.HashMap;
import java.util.Map;
class RadixNode {
private Map<Character, RadixNode> children;
private Customer customer;
public RadixNode(){
this.children = new Map<Character, RadixNode>();
this.Customer = NULL;
}
Map<Character, RadixNode> getChildren() {
return children;
}
boolean hasCustomer() {
return this.customer != NULL;
}
Customer getCustomer() {
return customer;
}
void setCustomer(Customer customer) {
this.customer = customer;
}
}
如您所见,只有一张地图存储节点的子节点。那是因为我们可以看到一个 phone 数字作为一串数字,所以这个 trie 将存储所有的客户......两次。每个名字一次,每个 phone 个号码一次。
现在让我们看一个插入函数。你的 trie 将需要一个根,我们称它为 root
.
public void insert(RadixNode root, Customer customer){
insert_with_name(root, customer, 0);
insert_with_phone_nb(root, customer, 0);
}
public void insert_with_name(RadixNode node, Customer customer, int idx){
if (idx == customer.getName().length()){
node.setCustomer(customer);
} else {
Character current_char = customer.getName().chatAt(idx);
if (! node.getChlidren().containsKey(current_char){
RadixNode new_child = new RadixNode();
node.getChildren().put(current_char, new_child);
}
insert_with_name(node.getChildren().get(current_char), customer, idx+1);
}
}
insert_with_phone_nb()
方法类似。只要人们有唯一的名字、唯一的 phone 号码,并且某人的名字不能是某人的 phone 号码,这就会起作用。
如您所见,该方法是递归的。我建议您递归地构建您的 trie 结构(通常,一切都基于树结构),因为它使代码更简单、更简洁。
搜索功能几乎是插入功能的复制粘贴:
public void search_by_name(RadixNode node, String name, int idx){
// returns NULL if there is no user going by that name
if (idx == name.length()){
return node.getCustomer();
} else {
Character current_char = name.chatAt(idx);
if (! node.getChlidren().containsKey(current_char){
return NULL;
} else {
return search_by_name(node.getChildren().get(current_char), name, idx+1);
}
}
}
第二种方式:尝试2次
原理是一样的,你所要做的就是重用上面的代码,但保留两个不同的 root
节点,每个节点将构建一个 trie(一个用于名称,一个用于 phone 数字)。
唯一的区别是 insert
函数(因为它会用 2 个不同的根调用 insert_with_name
和 insert_with_phone_nb
),以及必须在正确的 trie 中搜索的搜索函数。
public void insert(RadixNode root_name_trie, RadixNode root_phone_trie, Customer customer){
insert_with_name(root_name_trie, customer, 0);
insert_with_phone_nb(root_phone_trie, customer, 0);
}
编辑: 评论精确后可能会有同名的客户,这里有一个替代实现,允许RadixNode
包含对多个[=的引用22=].
例如,将 RadixNode
中的 Customer customer
属性替换为 Vector<Customer>
。当然,这些方法必须相应地修改,然后按名称搜索 return 给你一个客户向量(可能是空的),因为这个搜索然后可以导致几个结果。
在您的情况下,我会选择一个包含客户向量的 trie。因此,您可以同时按名称和 phone(将数字转换为 String
)进行搜索,并维护一个数据结构。