Java 中用于客户搜索的 Radix(Trie) 树实现

Radix(Trie) Tree implementation for Cutomer search in Java

我正在做一个项目,需要在数百万客户的数据中进行搜索。我想实现基数(特里)搜索算法。我已经阅读并实现了一个简单的字符串集合的基数。但是这里我有一个客户集合,想按姓名或手机号码搜索。

客户Class:

public class Customer {

    String name;
    String mobileNumer;


    public Customer (String name, String phoneNumer) {
        this.name = name;
        this.mobileNumer = phoneNumer;
    }

    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getPhoneNumer() {
        return mobileNumer;
    }
    public void setPhoneNumer(String phoneNumer) {
        this.mobileNumer = phoneNumer;
    }



}

RadixNode Class:

import java.util.HashMap;
import java.util.Map;

class RadixNode {
    private final Map<Character, RadixNode> child = new HashMap<>();
    private final Map<Customer, RadixNode> mobileNum = new HashMap<>();
    private boolean endOfWord;

    Map<Character, RadixNode> getChild() {
        return child;
    }

    Map<Customer, RadixNode> getChildPhoneDir() {
        return mobileNum;
    }

    boolean isEndOfWord() {
        return endOfWord;
    }

    void setEndOfWord(boolean endOfWord) {
        this.endOfWord = endOfWord;
    }
}

基数Class:

class Radix {
    private RadixNode root;

    Radix() {
        root = new RadixNode();
    }

    void insert(String word) {
        RadixNode current = root;

        for (int i = 0; i < word.length(); i++) {
            current = current.getChild().computeIfAbsent(word.charAt(i), c -> new RadixNode());
        }
        current.setEndOfWord(true);
    }

    void insert(Customer word) {
        RadixNode current = root;
        System.out.println("==========================================");
        System.out.println(word.mobileNumer.length());
        for (int i = 0; i < word.mobileNumer.length(); i++) {
            current = current.getChildPhoneDir().computeIfAbsent(word.mobileNumer.charAt(i), c -> new RadixNode());
            System.out.println(current);
        }
        current.setEndOfWord(true);
    }

    boolean delete(String word) {
        return delete(root, word, 0);
    }

    boolean containsNode(String word) {
        RadixNode current = root;

        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            RadixNode node = current.getChild().get(ch);
            if (node == null) {
                return false;
            }
            current = node;
        }
        return current.isEndOfWord();
    }

    boolean isEmpty() {
        return root == null;
    }

    private boolean delete(RadixNode current, String word, int index) {
        if (index == word.length()) {
            if (!current.isEndOfWord()) {
                return false;
            }
            current.setEndOfWord(false);
            return current.getChild().isEmpty();
        }
        char ch = word.charAt(index);
        RadixNode node = current.getChild().get(ch);
        if (node == null) {
            return false;
        }
        boolean shouldDeleteCurrentNode = delete(node, word, index + 1) && !node.isEndOfWord();

        if (shouldDeleteCurrentNode) {
            current.getChild().remove(ch);
            return current.getChild().isEmpty();
        }
        return false;
    }

    public void displayContactsUtil(RadixNode curNode, String prefix) 
    { 

        // Check if the string 'prefix' ends at this Node 
        // If yes then display the string found so far 
        if (curNode.isEndOfWord()) 
            System.out.println(prefix); 

        // Find all the adjacent Nodes to the current 
        // Node and then call the function recursively 
        // This is similar to performing DFS on a graph 
        for (char i = 'a'; i <= 'z'; i++) 
        { 
            RadixNode nextNode = curNode.getChild().get(i); 
            if (nextNode != null) 
            { 
                    displayContactsUtil(nextNode, prefix + i); 
            } 
        } 
    }


    public boolean displayContacts(String str) 
    { 
        RadixNode prevNode = root; 

        // 'flag' denotes whether the string entered 
        // so far is present in the Contact List 

        String prefix = ""; 
        int len = str.length(); 

        // Display the contact List for string formed 
        // after entering every character 
        int i; 
        for (i = 0; i < len; i++) 
        { 
            // 'str' stores the string entered so far 
            prefix += str.charAt(i); 

            // Get the last character entered 
            char lastChar = prefix.charAt(i); 

            // Find the Node corresponding to the last 
            // character of 'str' which is pointed by 
            // prevNode of the Trie 
            RadixNode curNode = prevNode.getChild().get(lastChar); 

            // If nothing found, then break the loop as 
            // no more prefixes are going to be present. 
            if (curNode == null) 
            { 
                System.out.println("No Results Found for \"" + prefix + "\""); 
                i++; 
                break; 
            } 

            // If present in trie then display all 
            // the contacts with given prefix. 
            System.out.println("Suggestions based on \"" + prefix + "\" are"); 
            displayContactsUtil(curNode, prefix); 

            // Change prevNode for next prefix 
            prevNode = curNode; 
        } 

        for ( ; i < len; i++) 
        { 
            prefix += str.charAt(i); 
            System.out.println("No Results Found for \""  + prefix + "\""); 
        }

        return true;
    }


    public void displayContactsUtil(RadixNode curNode, String prefix, boolean isPhoneNumber) 
    { 

        // Check if the string 'prefix' ends at this Node 
        // If yes then display the string found so far 
        if (curNode.isEndOfWord()) 
            System.out.println(prefix); 

        // Find all the adjacent Nodes to the current 
        // Node and then call the function recursively 
        // This is similar to performing DFS on a graph 
        for (char i = '0'; i <= '9'; i++) 
        { 
            RadixNode nextNode = curNode.getChildPhoneDir().get(i); 
            if (nextNode != null) 
            { 
                    displayContactsUtil(nextNode, prefix + i); 
            } 
        } 
    }

    public boolean displayContacts(String str, boolean isPhoneNumber) 
    { 
        RadixNode prevNode = root; 

        // 'flag' denotes whether the string entered 
        // so far is present in the Contact List 

        String prefix = ""; 
        int len = str.length(); 

        // Display the contact List for string formed 
        // after entering every character 
        int i; 
        for (i = 0; i < len; i++) 
        { 
            // 'str' stores the string entered so far 
            prefix += str.charAt(i); 

            // Get the last character entered 
            char lastChar = prefix.charAt(i); 

            // Find the Node corresponding to the last 
            // character of 'str' which is pointed by 
            // prevNode of the Trie 
            RadixNode curNode = prevNode.getChildPhoneDir().get(lastChar); 

            // If nothing found, then break the loop as 
            // no more prefixes are going to be present. 
            if (curNode == null) 
            { 
                System.out.println("No Results Found for \"" + prefix + "\""); 
                i++; 
                break; 
            } 

            // If present in trie then display all 
            // the contacts with given prefix. 
            System.out.println("Suggestions based on \"" + prefix + "\" are"); 
            displayContactsUtil(curNode, prefix, isPhoneNumber); 

            // Change prevNode for next prefix 
            prevNode = curNode; 
        } 

        for ( ; i < len; i++) 
        { 
            prefix += str.charAt(i); 
            System.out.println("No Results Found for \""  + prefix + "\""); 
        }

        return true;
    } 


}

我尝试在集合中搜索但卡住了。任何帮助/建议将不胜感激。

我建议您使用两种方法。

第一种方式:使用单个 trie。
可以在单个 trie 中存储您需要的所有内容。您的客户 class 很好,这是一个可能的 RadixNode 实现。
我认为不能有两个同名的客户,或者同一个 phone 号码。如果不是这种情况(例如,可能会有同名和不同 phone nb 的人)在评论中告诉我,我会编辑。
需要了解的重要一点是,如果您想要使用两种不同的方式来查找客户,并且您使用一个 trie,则每个客户将在您的 trie 中出现两次。一次在其名称对应的路径末尾,一次在其phone号对应的路径末尾后。

import java.util.HashMap;
import java.util.Map;

class RadixNode {
    private Map<Character, RadixNode> children;
    private Customer customer;

    public RadixNode(){
        this.children = new Map<Character, RadixNode>();
        this.Customer = NULL;
    }
    Map<Character, RadixNode> getChildren() {
        return children;
    }
    boolean hasCustomer() {
        return this.customer != NULL;
    }
    Customer getCustomer() {
        return customer;
    }
    void setCustomer(Customer customer) {
        this.customer = customer;
    }
}

如您所见,只有一张地图存储节点的子节点。那是因为我们可以看到一个 phone 数字作为一串数字,所以这个 trie 将存储所有的客户......两次。每个名字一次,每个 phone 个号码一次。
现在让我们看一个插入函数。你的 trie 将需要一个根,我们称它为 root.

public void insert(RadixNode root, Customer customer){
    insert_with_name(root, customer, 0);
    insert_with_phone_nb(root, customer, 0);
}

public void insert_with_name(RadixNode node, Customer customer, int idx){
    if (idx == customer.getName().length()){
        node.setCustomer(customer);
    } else {
        Character current_char = customer.getName().chatAt(idx);
        if (! node.getChlidren().containsKey(current_char){
            RadixNode new_child = new RadixNode();
            node.getChildren().put(current_char, new_child);
        }
        insert_with_name(node.getChildren().get(current_char), customer, idx+1);
    }
}

insert_with_phone_nb()方法类似。只要人们有唯一的名字、唯一的 phone 号码,并且某人的名字不能是某人的 phone 号码,这就会起作用。
如您所见,该方法是递归的。我建议您递归地构建您的 trie 结构(通常,一切都基于树结构),因为它使代码更简单、更简洁。
搜索功能几乎是插入功能的复制粘贴:

public void search_by_name(RadixNode node, String name, int idx){
    // returns NULL if there is no user going by that name
    if (idx == name.length()){
        return node.getCustomer();
    } else {
        Character current_char =  name.chatAt(idx);
        if (! node.getChlidren().containsKey(current_char){
            return NULL;
        } else {
            return search_by_name(node.getChildren().get(current_char), name, idx+1);
        }
    }
}

第二种方式:尝试2次
原理是一样的,你所要做的就是重用上面的代码,但保留两个不同的 root 节点,每个节点将构建一个 trie(一个用于名称,一个用于 phone 数字)。 唯一的区别是 insert 函数(因为它会用 2 个不同的根调用 insert_with_nameinsert_with_phone_nb),以及必须在正确的 trie 中搜索的搜索函数。

public void insert(RadixNode root_name_trie, RadixNode root_phone_trie, Customer customer){
    insert_with_name(root_name_trie, customer, 0);
    insert_with_phone_nb(root_phone_trie, customer, 0);
}

编辑: 评论精确后可能会有同名的客户,这里有一个替代实现,允许RadixNode包含对多个[=的引用22=].
例如,将 RadixNode 中的 Customer customer 属性替换为 Vector<Customer>。当然,这些方法必须相应地修改,然后按名称搜索 return 给你一个客户向量(可能是空的),因为这个搜索然后可以导致几个结果。
在您的情况下,我会选择一个包含客户向量的 trie。因此,您可以同时按名称和 phone(将数字转换为 String)进行搜索,并维护一个数据结构。