在字符串列表中查找最长公共 Prefix/Suffix 的出现次数?

Find occurrence count of the longest common Prefix/Suffix in a List of Strings?

给定一个字符串列表:

ArrayList<String> strList = new ArrayList<String>();
strList.add("Mary had a little lamb named Willy");
strList.add("Mary had a little ham");
strList.add("Old McDonald had a farm named Willy");
strList.add("Willy had a little dog named ham");
strList.add("(abc)");
strList.add("(xyz)");
strList.add("Visit Target Store");
strList.add("Visit Walmart Store");

这应该以 HashMap<String, Integer> prefixMapsuffixMap:

的形式产生输出

前缀:

Mary had a -> 2
Mary had a little -> 2
( -> 2
Visit -> 2

后缀:

named Willy -> 2
ham -> 2
) -> 2
Store -> 2

到目前为止,我可以使用以下代码生成一个出现在列表中所有项目中的前缀:

public static final int INDEX_NOT_FOUND = -1;

public static String getAllCommonPrefixesInList(final String... strs) {
    if (strs == null || strs.length == 0) {
        return EMPTY_STRING;
    }
    
    
    final int smallestIndexOfDiff = getIndexOfDifference(strs);
    if (smallestIndexOfDiff == INDEX_NOT_FOUND) {
        
        // All Strings are identical
        if (strs[0] == null) {
            return EMPTY_STRING;
        }
        return strs[0];
    } else if (smallestIndexOfDiff == 0) {
        
        
        // No common initial characters found, return empty String
        return EMPTY_STRING;
    } else {
        
        // Common initial character sequence found, return sequence
        return strs[0].substring(0, smallestIndexOfDiff);
    }
}






public static int getIndexOfDifference(final CharSequence... charSequence) {
    if (charSequence == null || charSequence.length <= 1) {
        return INDEX_NOT_FOUND;
    }
    boolean isAnyStringNull = false;
    boolean areAllStringsNull = true;
    
    
    final int arrayLen = charSequence.length;
    int shortestStrLen = Integer.MAX_VALUE;
    int longestStrLen = 0;

    // Find the min and max string lengths - avoids having to check that we are not exceeding the length of the string each time through the bottom loop.
    for (int i = 0; i < arrayLen; i++) {
        if (charSequence[i] == null) {
            isAnyStringNull = true;
            shortestStrLen = 0;
        } else {
            areAllStringsNull = false;
            shortestStrLen = Math.min(charSequence[i].length(), shortestStrLen);
            longestStrLen = Math.max(charSequence[i].length(), longestStrLen);
        }
    }

    // Deals with lists containing all nulls or all empty strings
    
    if (areAllStringsNull || longestStrLen == 0 && !isAnyStringNull) {
        return INDEX_NOT_FOUND;
    }

    // Handle lists containing some nulls or some empty strings
    if (shortestStrLen == 0) {
        return 0;
    }

    // Find the position with the first difference across all strings
    int firstDiff = -1;
    for (int stringPos = 0; stringPos < shortestStrLen; stringPos++) {
        final char comparisonChar = charSequence[0].charAt(stringPos);
        for (int arrayPos = 1; arrayPos < arrayLen; arrayPos++) {
            if (charSequence[arrayPos].charAt(stringPos) != comparisonChar) {
                firstDiff = stringPos;
                break;
            }
        }
        if (firstDiff != -1) {
            break;
        }
    }

    if (firstDiff == -1 && shortestStrLen != longestStrLen) {
        
        // We compared all of the characters up to the length of the
        // shortest string and didn't find a match, but the string lengths
        // vary, so return the length of the shortest string.
        return shortestStrLen;
    }
    return firstDiff;
}

但是,我的目标是包含任何 prefix/suffix,结果地图中至少出现 2+ 次。

Java如何实现?

使用 trie.

应该很容易解决这个问题

trie 节点基本上应该跟踪两件事:

  1. 子节点
  2. 以当前节点结束的前缀数

在 trie 中插入所有字符串,这将在 O(string length * number of strings) 中完成。之后,只需遍历 trie,您就可以根据您的用例根据计数对前缀进行哈希处理。对于后缀,您可以使用相同的方法,只需开始以相反的顺序遍历字符串即可。

编辑:
转念一想,trie 可能是最有效的方法,但一个简单的 hashmap 实现也应该在这里起作用。下面是生成所有计数 > 1 的前缀的示例。

import java.util.*;
import java.util.stream.*;

class Main {
  public static void main(String[] args) {
    
    System.out.println("Hello world!");

    ArrayList<String> strList = new ArrayList<String>();
    
    strList.add("Mary had a little lamb named Willy");
    strList.add("Mary had a little ham");
    strList.add("Old McDonald had a farm named Willy");
    strList.add("Willy had a little dog named ham");
    strList.add("(abc)");
    strList.add("(xyz)");
    strList.add("Visit Target Store");
    strList.add("Visit Walmart Store");

    Map<String, Integer> prefixMap = new HashMap<String, Integer>();
    ArrayList<String> stringsWithHighestOccurrence = new ArrayList<String>();

    for (String word : strList) {
            for (int i = 1; i <= word.length(); i++){
        String prefix = word.substring(0, i);
        prefixMap.merge(prefix, 1, Integer::sum);
      }
        }

    Integer maxval = Collections.max(prefixMap.values());

    for (String key: prefixMap.keySet()){
      Integer value = prefixMap.get(key);
      if (value > 1) System.out.println(key + " : " + value);
      if (value == maxval) stringsWithHighestOccurrence.add(key);
    }

    int maxLength = stringsWithHighestOccurrence.stream().map(String::length).max(Integer::compareTo).get();
    
    System.out.println(maxLength);

    ArrayList<String> prefixesWithMaxLength =
stringsWithHighestOccurrence.stream().filter(c -> c.length() == maxLength).collect(Collectors.toCollection(ArrayList::new));
    System.out.println(prefixesWithMaxLength);
  }
}

尽管如此,为了完成,我还将添加一个基本的 TrieNode 实现,因为我的回答首先提出了这种方法。

TrieNode:

class TrieNode {
    private final Map<Character, TrieNode> children = new HashMap<>();
    private int count;

    Map<Character, TrieNode> getChildren() {
        return children;
    }

    boolean getCount() {
        return count;
    }

    void increaseCount() {
        this.count += 1;
    }
}

Trie:

class Trie {
    private TrieNode root;

    Trie() {
        root = new TrieNode();
    }

    void insert(String word) {
        TrieNode current = root;

        for (char l : word.toCharArray()) {
            current = current.getChildren().computeIfAbsent(l, c -> new TrieNode());
            current.increaseCount()
        }
    }
}

遍历 trie 类似于一个简单的 DFS 场景,其中还维护了到当前节点的“路径”(我们正在切换带有前缀的路径到这一点)。

我认为@Abhinav 提供的解决方案应该可以使用 HashMap。在这里,我将 post 使用 java 中的简单 Trie 实现的解决方案(进行一些自定义,例如将 freq 添加到 Trie 节点中)。

    ArrayList<String> strList = new ArrayList<String>();
    strList.add("Mary had a little lamb named Willy");
    strList.add("Mary had a little ham");
    strList.add("Old McDonald had a farm named Willy");
    strList.add("Willy had a little dog named ham");
    strList.add("( abc )");
    strList.add("( xyz )");
    strList.add("Visit Target Store");
    strList.add("Visit Walmart Store");

    TNode root = new TNode("");
    int maxFreq = 1;
    for(String sentence : strList) {
        TNode currentNode = root;
        String[] words = sentence.split(" "); // Assuming space character is the delimiter 
        for(String word: words) {
            if(currentNode.children.containsKey(word)) {
                currentNode.children.get(word).freq += 1;
                maxFreq = Math.max(maxFreq, currentNode.children.get(word).freq);
            } else {
                TNode c = new TNode(word);
                c.freq = 1;
                currentNode.children.put(word, c);
            }
            currentNode = currentNode.children.get(word);
        }
    }

    Map<String, Integer> result = new HashMap<String, Integer>();
    Queue<NodeWithPrefix> queue = new LinkedList<NodeWithPrefix>();
    for(TNode node : root.children.values()){
        NodeWithPrefix nwp = new NodeWithPrefix(node);
        nwp.prefix = "";
        queue.add(nwp);
    }
    while(!queue.isEmpty()) {
        NodeWithPrefix item = queue.poll();
        if(item.node.freq == maxFreq) {
            result.put(item.prefix + " " + item.node.value, item.node.freq);
        }
        for(TNode child : item.node.children.values()) {
            NodeWithPrefix nwp = new NodeWithPrefix(child);
            nwp.prefix = item.prefix + " " + item.node.value;
            queue.add(nwp);
        }
    }
    return result;

此算法需要另外 2 个 classes:

class NodeWithPrefix {
    String prefix;
    TNode node;
    public NodeWithPrefix(TNode node){
        this.node = node;
    }
}

class TNode {
    String value;
    int freq = 0;
    Map<String, TNode> children;
    public TNode(String value){
        this.value = value;
        children = new HashMap<String, TNode>();
    }
}

输出是for prefix:(for postfix应该类似,只需要向后构建Trie)

{ Mary had=2,  Mary had a=2,  Visit=2,  (=2,  Mary had a little=2,  Mary=2}

这里我使用 BFS 来检索 Trie 中频率等于 maxFreq 的所有子字符串。我们可以根据需要调整过滤条件。您也可以在这里进行 DFS。 其他考虑是我们可以将前缀添加到 TNode class 本身,我更喜欢将它单独放在另一个 class.

根据我对这个问题的理解,最适合解决它的数据结构是非循环不相交的 Graph.

在一般情况下,一个将由几个不相连的组成。每个都会有一个tree-like结构,在边缘情况下它会形成一个链表.

基本上,解决这个问题的最简单朴素方法是根据每一行创建一堆链表 ,并遍历它们。缺点是:重复节点(更大的内存消耗),更大的time-complexity(需要更多的操作)并且更多error-prone 因为需要更多的手动操作。

图表的描述

所以我会坚持使用 graph 作为这个问题的数据结构,并尽量让事情尽可能简单。

让我们考虑以下输入:

"Mary had a little lamb named Willy"
"Mary had a little ham"
"A B C"

图形的图形表示将如下所示;

前两行将构成一个,由链表(头部)和组成tree(尾部)。第二个簇将由一个链表表示,它的顶点不与由其他字符串形成的顶点相连。

这不是构造顶点的唯一方式,头部可以产生一个 N-tree 并且可以在中间某处观察到链表。

要点是,为了解决问题,我们需要通过所有分支跟踪顶点链,直到 顶点重叠 。在 graph 的这些部分中,两条或多条线之间共有的每个 prefix-strings 和 suffix-string 将由 单个顶点表示 (节点).

为了维护映射到特定 顶点 的字符串的数量,每个顶点都应该有一个变量(int groupCount 在下面的代码中),当创建 顶点 时,它被分配默认值 1 并且每次新字符串映射到该顶点时递增。

每个 顶点 都包含一张地图,其中包含对其 邻居 的引用。添加新的 neighbour-vertex 时,根据给定的字符串创建新的 Vertexcount现有的 顶点 递增。

为了符合这个任务,图表应该保持对所有head-vertexes和[=89=的引用]tail-vertexes。为了简单起见,不是维护两组对相邻节点的引用,而是在每个顶点中维护两个单独的计数变量(因为suffix-count和prefix-count会不同),这个解决方案实际上由两个图组成(suffix-graphprefix-graph)。出于这个原因,实现 class 命名为 MultiGraph.

为了用顶点[=填充suffix-graphprefix-graph 153=],方法 addCluster() 通过 Iterator 以正常或相反的顺序遍历给定行的字符串,具体取决于正在填充哪个 graph .

深度优先搜索

图形填充后的下一步是生成频率为 2 和更大的字符串映射。

为此,正在使用 classical depth first search 算法。

为了实现 DFS,需要一个将用作 堆栈 的可变容器(ArrayDeque 正用于该目的)。从 heads/tailsmap 中取出的第一个元素将被放置在 stack 的顶部,并且 StringBuilder 持有这个元素的名称将被放置在地图中。

然后,要恢复具有特定 count 的字符串,将从 的顶部弹出 vertexes stack 和它们的 neighbors 和 count > 1 依次将被放置在堆栈的顶部。 current prefix 的副本加上分隔符和邻居的名字将被映射到 neighbour-vertex.

如果计数发生变化,则表示当前前缀表示至少两行之间的最长公共字符串。在这种情况下,前缀和计数被添加到生成的映射中。

实施

下面的实现由两个class组成,分别是narrow-focusedself-contained . MultiGraph class 专门用作数据结构,维护两个图。 pluming代码,像拆分字符串的行一样,被提取到一个单独的class GraphManager.

图表

public class MultiGraph {
    private final Map<String, Vertex> heads = new HashMap<>();
    private final Map<String, Vertex> tails = new HashMap<>();

    public void addCluster(Deque<String> names) {
        addCluster(heads, names.iterator());
        addCluster(tails, names.descendingIterator());
    }

    private void addCluster(Map<String, Vertex> clusters, Iterator<String> names) {
        String rootName = names.next();
        if (clusters.containsKey(rootName)) {
            clusters.get(rootName).incrementGroupCount();
        } else {
            clusters.put(rootName, new Vertex(rootName));
        }

        Vertex current = clusters.get(rootName);
        while (names.hasNext()) {
            current = current.addNext(names.next());
        }
    }

    public Map<String, Integer> generatePrefixMap(String delimiter) {
        Map<String, Integer> countByPrefix = new HashMap<>();

        for (Vertex next: heads.values()) {
            if (next.getGroupCount() == 1) {
                continue;
            }
            performDFS(heads, countByPrefix, delimiter, next);
        }
        return countByPrefix;
    }

    public Map<String, Integer> generateSuffixMap(String delimiter) {
        Map<String, Integer> countBySuffix = new HashMap<>();

        for (Vertex next: tails.values()) {
            if (next.getGroupCount() == 1) {
                continue;
            }
            performDFS(tails, countBySuffix, delimiter, next);
        }
        return countBySuffix;
    }
    // implementation of the Depth First Search algorithm
    public void performDFS(Map<String, Vertex> clusters,
                           Map<String, Integer> countByPrefix,
                           String delimiter, Vertex next) {

        StringBuilder prefix = null;
        Vertex current = next;
        int count = next.getGroupCount();

        Deque<Vertex> stack = new ArrayDeque<>(); // create as stack
        Map<Vertex, StringBuilder> prefixByVert = new HashMap<>();
        stack.push(next); // place the first element on the stack
        prefixByVert.put(current, new StringBuilder(current.getName()));

        while (!stack.isEmpty()) {
            current = stack.pop();
            if (current.getGroupCount() < count) { // the number of strings mapped to the current Vertex has been changed
                countByPrefix.put(prefix.toString(), count); // saving the result
                count = current.getGroupCount();
            }
            prefix = prefixByVert.get(current);

            for (Vertex neighbour: current.getNextVertByVal().values()) {
                if (next.getGroupCount() == 1) {
                    continue;
                }
                stack.push(neighbour);
                prefixByVert.put(neighbour, new StringBuilder(prefix)
                                    .append(delimiter)
                                    .append(neighbour.getName()));
            }
        }

        if (prefix != null && count > 1) {
            countByPrefix.putIfAbsent(prefix.toString(), count);
        }
    }

    private static class Vertex {
        private final String name;
        private int groupCount = 1;
        private final Map<String, Vertex> nextVertByVal = new HashMap<>();

        public Vertex(String name) {
            this.name = name;
        }

        public Vertex addNext(String value) {
            if (nextVertByVal.containsKey(value)) {
                nextVertByVal.get(value).incrementGroupCount();
            } else {
                nextVertByVal.put(value, new Vertex(value));
            }
            return nextVertByVal.get(value);
        }

        public void incrementGroupCount() {
            this.groupCount++;
        }

        public String getName() {
            return name;
        }

        public int getGroupCount() {
            return groupCount;
        }

        public Map<String, Vertex> getNextVertByVal() {
            return nextVertByVal;
        }
    }
}

以下 class 处理输入数据的处理任务:它拆分行,负责丢弃可能发生的空字符串,并将输入打包到 Deque以方便的方式适应双向迭代。

它还会实例化 graph 并管理它的工作。 GraphManager 负责向图形提供 分隔符 以便在创建结果映射时恢复字符串的初始形状。有了它,您可以在白色 space 上拆分给定的行,通过空字符串逐个字符或标点符号处理行,而无需更改这两个 classes.

中的一行

图形管理器

public class GraphManager {
    private MultiGraph graph = new MultiGraph();
    private String delimiter;

    private GraphManager(String delimiter) {
        this.delimiter = delimiter;
    }

    public static GraphManager getInstance(Iterable<String> lines, String delimiter) {
        GraphManager gm = new GraphManager(delimiter);
        gm.init(lines);
        return gm;
    }

    private void init(Iterable<String> lines) {
        for (String line: lines) {
            Deque<String> names = new ArrayDeque<>();
            for (String name: line.split(delimiter)) {
                if (!name.isEmpty()) {
                    names.add(name);
                }
            }
            addCluster(names);
        }
    }

    private void addCluster(Deque<String> names) {
        graph.addCluster(names);
    }

    public Map<String, Integer> getPrefixMap() {
        return graph.generatePrefixMap(delimiter);
    }

    public Map<String, Integer> getSuffixMap() {
        return graph.generateSuffixMap(delimiter);
    }
}

main()

public static void main(String[] args) {
    List<String> lines = List.of(
            "Mary had a little lamb named Willy", "Mary had a little ham",
            "Old McDonald had a farm named Willy", "Willy had a little dog named ham",
            "( abc )", "( xyz )", "Visit Target Store", "Visit Walmart Store");

    GraphManager gm = GraphManager.getInstance(lines, " ");
    
    System.out.println("Prefixes:");
    for (Map.Entry<String, Integer> entry: gm.getPrefixMap().entrySet()) {
        System.out.println(entry.getValue() + " " + entry.getKey());
    }

    System.out.println("\nSuffixes:");
    for (Map.Entry<String, Integer> entry: gm.getSuffixMap().entrySet()) {
        System.out.println(entry.getValue() + " " + entry.getKey());
    }
}

输出

Prefixes:
2 Mary had a little
2 Visit
2 (

Suffixes:
2 ham
2 )
2 Store
2 Willy named

要实现一个trie,首先你需要一个节点。

class Node<T> {

    private final T value;
    private final Node<T> parent;
    private final Map<T, Node<T>> children;
    private boolean isEnd;

    Node(T value, Node<T> parent) {
        this.value = value;
        this.parent = parent;
        this.children = new HashMap<>();
        this.isEnd = false;
    }

    Node<T> addChild(T childValue, Node<T> parent) {
        //return child node if existing, otherwise create and return
       return this.children.computeIfAbsent(childValue, value -> new Node<>(value, parent));
    }

    T getValue() {
        return this.value;
    }

    Node<T> getParent() {
        return this.parent;
    }

    boolean isEnd() {
        return this.isEnd;
    }

    void setEnd(boolean isEnd) {
        this.isEnd = isEnd;
    }

    Collection<Node<T>> children() {
        return this.children.values();
    }

    @Override
    public String toString() {
        //for easier debugging
        return "Node{" +
                "value=" + this.value +
                ", children=" + this.children.keySet() +
                ", isEnd=" + this.isEnd +
                '}';
    }
}

在节点中我们有实际值,对父节点的引用,以便于构建前缀,子节点将值映射到相应的节点,如果节点是结束节点,则有一个标志。

实际Trie实施

public class Trie<T> {

    private final Node<T> root;

    public Trie() {
        this.root = new Node<>(null, null);
    }

    public void insert(T[] elements) {
        if (elements.length == 0) {
            //don't want to set root as end node
            return;
        }
        Node<T> currentNode = this.root;
        for (T element : elements) {
            currentNode = currentNode.addChild(element, currentNode);
        }
        currentNode.setEnd(true);
    }

    public Map<Collection<T>, Integer> countPrefixes(BiConsumer<Deque<T>, T> operation) {
        Map<Collection<T>, Integer> map = new LinkedHashMap<>();
        this.countPrefixes(this.root, map, operation);
        return map;
    }

    private void countPrefixes(Node<T> current, Map<Collection<T>, Integer> map, BiConsumer<Deque<T>, T> operation) {
        if (current != this.root) {
            //check java doc for AbstractSet hashCode and equals
            ArrayList<T> prefix = this.buildKey(current, operation);
            int childrenCount = current.children().size();
            if (childrenCount == 0 && current.isEnd()) {
                //this sets entire collection(entire sentence 'Mary had a little lamb named Willy' for example)
                //as prefix which is met once
                childrenCount = 1;
            }
            map.merge(prefix, childrenCount, Integer::sum);
            if (childrenCount > 1) {
                //each parent node is already marked as prefix met once,
                //but any node having more than one child means entire chain of nodes
                //is a prefix met the number of children,
                //so we go backwards to update parent counts with the difference
                this.updateParentPrefixes(current.getParent(), childrenCount - 1, map, operation);
            }
        }
        for (Node<T> child : current.children()) {
            //visit each child recursively to count them
            //depth first
            this.countPrefixes(child, map, operation);
        }
    }

    //operation is abstraction for the order
    //in which we want to add elements
    //when building key/prefix
    private ArrayList<T> buildKey(Node<T> node, BiConsumer<Deque<T>, T> operation) {
        Deque<T> prefix = new LinkedList<>();
        while (node.getValue() != null) {
            operation.accept(prefix, node.getValue());
            node = node.getParent();
        }
        return new ArrayList<>(prefix);
    }

    private void updateParentPrefixes(Node<T> parent, int updateCount, Map<Collection<T>, Integer> map, BiConsumer<Deque<T>, T> operation) {
        if (parent == this.root) {
            //we don't want to update root, ever!!!
            return;
        }
        ArrayList<T> prefix = this.buildKey(parent, operation);
        map.merge(prefix, updateCount, Integer::sum);
        //visit parents recursively until root
        this.updateParentPrefixes(parent.getParent(), updateCount, map, operation);
    }
}

在某种意义上简化了一些,只实现了insert,但我们现在不需要update和delete。需要注意的几点:

  1. Trie 必须有一个 root 节点,它是空的。我们用 null 值和 null 父级表示。我们必须永不更新此节点。
  2. countPrefixes() 的参数 BiConsumer<Deque<T>, T> operation。这是我们在创建前缀时想要添加元素的顺序的抽象。它是必需的,因为在反转 collection/sentence.
  3. 时,计数后缀可以表示并实现为计数前缀
  4. Return 输入 Map<Collection<T>, Integer>,共 countPrefixes()。此实现更通用,因此每个前缀都表示为 节点值集合 .
  5. 为什么前缀是ArrayList?首先我们需要保持插入顺序。其次,它对 hashCode()equals() 的实现使其成为映射键的良好候选者。引用 javadoc:

This ensures that list1.equals(list2) implies that list1.hashCode()==list2.hashCode() for any two lists, list1 and list2, as required by the general contract of Object.hashCode. - 对于 hashCode。

In other words, two lists are defined to be equal if they contain the same elements in the same order. - 等于。

  1. 代码中的注释应解释实施中的其余细节。

要测试的主线。

public class TrieMain {

    public static void main(String[] args) {
        String spacePattern = "\s+";
        String emptyStringPattern = "";

        ArrayList<String> strList = new ArrayList<>();
        strList.add("Mary had a little lamb named Willy");
        strList.add("Mary had a little ham");
        strList.add("Mary had a sandwich");
        strList.add("Old McDonald had a farm named Willy");
        strList.add("Willy had a little dog named ham");
        strList.add("Willy had a big dog named Willy");
        strList.add("Willy had a huge dog named Willy");
        strList.add("(abc)");
        strList.add("(xyz)");
        strList.add("Visit Target Store");
        strList.add("Visit Walmart Store");

        Trie<String> trie = new Trie<>();
        //using another trie for suffix to avoid counting suffix as prefix and vice versa
        Trie<String> reversedTrie = new Trie<>();
        for (String string : strList) {
            String pattern = string.startsWith("(") ? emptyStringPattern : spacePattern;
            String[] words = string.split(pattern);
            trie.insert(words);

            //reverse collection to count suffixes
            Collections.reverse(Arrays.asList(words));
            reversedTrie.insert(words);
        }
        Map<Collection<String>, Integer> prefixCount = trie.countPrefixes(Deque::addFirst);
        System.out.println("Prefixes:");
        printMap(prefixCount);
        System.out.println();
        Map<Collection<String>, Integer> suffixCount = reversedTrie.countPrefixes(Deque::addLast);
        System.out.println("Suffixes:");
        printMap(suffixCount);
    }

    private static void printMap(Map<Collection<String>, Integer> map) {
        map.entrySet()
                .stream()
                .filter(entry -> entry.getValue() > 1)
                .forEach(e -> {
                    String key = String.join(" ", e.getKey());
                    String line = String.format("%s -> %d", key, e.getValue());
                    System.out.println(line);
                });
    }
}

这里需要注意的事项:

  1. 在列表中添加了更多句子以进行额外测试。
  2. 我将 ( 的输入拆分为字符而不是单词作为其余部分以实现您的输出,但示例有些混乱。对于句子前缀是单词的集合,而对于这两个前缀是字符。
  3. 后缀需要第二个 trie - 否则我们可能会将前缀算作后缀,反之亦然。
  4. Arrays.asList() 返回的列表由输入数组支持。我正在利用它来反转输入数组本身的后缀。

编辑: 虽然使用实现的字符进行测试 AbstractSet,或者准确地说是任何集合,对于前缀来说是个坏主意,因为它不允许重复的元素。更新了 trie 实现以使用 ArrayList。我还添加了 2 个主要方法,以演示 trie 使用字符串和字符:

  1. Strings - 每个节点是一个词,通过space
  2. 拆分字符串得到
public class TrieWords {

    public static void main(String[] args) {
        ArrayList<String> strList = new ArrayList<>();
        strList.add("Mary had a little lamb named Willy");
        strList.add("Mary had a little ham");
        strList.add("Mary had a sandwich");
        strList.add("Old McDonald had a farm named Willy");
        strList.add("Willy had a little dog named ham");
        strList.add("Willy had a big dog named Willy");
        strList.add("Willy had a huge dog named Willy");
        strList.add("(abc)");
        strList.add("(xyz)");
        strList.add("Visit Target Store");
        strList.add("Visit Walmart Store");

        Trie<String> trie = new Trie<>();
        //using another trie for suffix to avoid counting suffix as prefix and vice versa
        Trie<String> reversedTrie = new Trie<>();
        for (String string : strList) {
            String[] words = string.split("\s+");
            trie.insert(words);

            //reverse collection to count suffixes
            Collections.reverse(Arrays.asList(words));
            reversedTrie.insert(words);
        }
        Map<Collection<String>, Integer> prefixCount = trie.countPrefixes(Deque::addFirst);
        System.out.println("Prefixes:");
        printMap(prefixCount);
        System.out.println();
        Map<Collection<String>, Integer> suffixCount = reversedTrie.countPrefixes(Deque::addLast);
        System.out.println("Suffixes:");
        printMap(suffixCount);
    }

    private static void printMap(Map<Collection<String>, Integer> map) {
        map.entrySet()
                .stream()
                .filter(entry -> entry.getValue() > 1)
                .forEach(e -> {
                    String key = String.join(" ", e.getKey());
                    String line = String.format("%s -> %d", key, e.getValue());
                    System.out.println(line);
                });
    }
}
  1. 字符 - 每个节点都是字符串
  2. 中的一个字符
public class TrieChars {

    public static void main(String[] args) {
        ArrayList<String> strList = new ArrayList<>();
        strList.add("Mary had a little lamb named Willy");
        strList.add("Mary had a little ham");
        strList.add("Mary had a sandwich");
        strList.add("Old McDonald had a farm named Willy");
        strList.add("Willy had a little dog named ham");
        strList.add("Willy had a big dog named Willy");
        strList.add("Willy had a huge dog named Willy");
        strList.add("(abcd)");
        strList.add("(xyz)");
        strList.add("Visit Target Store");
        strList.add("Visit Walmart Store");

        Trie<Character> trie = new Trie<>();
        //using another trie for suffix to avoid counting suffix as prefix and vice versa
        Trie<Character> reversedTrie = new Trie<>();
        for (String string : strList) {
            int length = string.length();
            Character[] words = new Character[length];
            Character[] reversed = new Character[length];
            for (int i = 0; i < length; i++) {
                int reversedIndex = length - 1 - i;
                words[i] = string.charAt(i);
                reversed[reversedIndex] = string.charAt(i);
            }
            trie.insert(words);
            reversedTrie.insert(reversed);
        }
        Map<Collection<Character>, Integer> prefixCount = trie.countPrefixes(Deque::addFirst);
        System.out.println("Prefixes:");
        printMap(prefixCount);
        System.out.println();
        Map<Collection<Character>, Integer> suffixCount = reversedTrie.countPrefixes(Deque::addLast);
        System.out.println("Suffixes:");
        printMap(suffixCount);
    }

    private static void printMap(Map<Collection<Character>, Integer> map) {
        map.entrySet()
                .stream()
                .filter(entry -> entry.getValue() > 1)
                .forEach(e -> {
                    String key = e.getKey().stream().map(Object::toString).collect(Collectors.joining(""));
                    String line = String.format("%s -> %d", key, e.getValue());
                    System.out.println(line);
                });
    }
}