在字符串列表中查找最长公共 Prefix/Suffix 的出现次数？

Question

给定一个字符串列表：

ArrayList<String> strList = new ArrayList<String>();
strList.add("Mary had a little lamb named Willy");
strList.add("Mary had a little ham");
strList.add("Old McDonald had a farm named Willy");
strList.add("Willy had a little dog named ham");
strList.add("(abc)");
strList.add("(xyz)");
strList.add("Visit Target Store");
strList.add("Visit Walmart Store");

这应该以 HashMap<String, Integer> prefixMap 和 suffixMap:

的形式产生输出

前缀:

Mary had a -> 2
Mary had a little -> 2
( -> 2
Visit -> 2

后缀:

named Willy -> 2
ham -> 2
) -> 2
Store -> 2

到目前为止，我可以使用以下代码生成一个出现在列表中所有项目中的前缀：

public static final int INDEX_NOT_FOUND = -1;

public static String getAllCommonPrefixesInList(final String... strs) {
    if (strs == null || strs.length == 0) {
        return EMPTY_STRING;
    }
    
    
    final int smallestIndexOfDiff = getIndexOfDifference(strs);
    if (smallestIndexOfDiff == INDEX_NOT_FOUND) {
        
        // All Strings are identical
        if (strs[0] == null) {
            return EMPTY_STRING;
        }
        return strs[0];
    } else if (smallestIndexOfDiff == 0) {
        
        
        // No common initial characters found, return empty String
        return EMPTY_STRING;
    } else {
        
        // Common initial character sequence found, return sequence
        return strs[0].substring(0, smallestIndexOfDiff);
    }
}






public static int getIndexOfDifference(final CharSequence... charSequence) {
    if (charSequence == null || charSequence.length <= 1) {
        return INDEX_NOT_FOUND;
    }
    boolean isAnyStringNull = false;
    boolean areAllStringsNull = true;
    
    
    final int arrayLen = charSequence.length;
    int shortestStrLen = Integer.MAX_VALUE;
    int longestStrLen = 0;

    // Find the min and max string lengths - avoids having to check that we are not exceeding the length of the string each time through the bottom loop.
    for (int i = 0; i < arrayLen; i++) {
        if (charSequence[i] == null) {
            isAnyStringNull = true;
            shortestStrLen = 0;
        } else {
            areAllStringsNull = false;
            shortestStrLen = Math.min(charSequence[i].length(), shortestStrLen);
            longestStrLen = Math.max(charSequence[i].length(), longestStrLen);
        }
    }

    // Deals with lists containing all nulls or all empty strings
    
    if (areAllStringsNull || longestStrLen == 0 && !isAnyStringNull) {
        return INDEX_NOT_FOUND;
    }

    // Handle lists containing some nulls or some empty strings
    if (shortestStrLen == 0) {
        return 0;
    }

    // Find the position with the first difference across all strings
    int firstDiff = -1;
    for (int stringPos = 0; stringPos < shortestStrLen; stringPos++) {
        final char comparisonChar = charSequence[0].charAt(stringPos);
        for (int arrayPos = 1; arrayPos < arrayLen; arrayPos++) {
            if (charSequence[arrayPos].charAt(stringPos) != comparisonChar) {
                firstDiff = stringPos;
                break;
            }
        }
        if (firstDiff != -1) {
            break;
        }
    }

    if (firstDiff == -1 && shortestStrLen != longestStrLen) {
        
        // We compared all of the characters up to the length of the
        // shortest string and didn't find a match, but the string lengths
        // vary, so return the length of the shortest string.
        return shortestStrLen;
    }
    return firstDiff;
}

但是，我的目标是包含任何 prefix/suffix，结果地图中至少出现 2+ 次。

Java如何实现？

Answer 1

使用 trie.

应该很容易解决这个问题

trie 节点基本上应该跟踪两件事：

子节点
以当前节点结束的前缀数

在 trie 中插入所有字符串，这将在 O(string length * number of strings) 中完成。之后，只需遍历 trie，您就可以根据您的用例根据计数对前缀进行哈希处理。对于后缀，您可以使用相同的方法，只需开始以相反的顺序遍历字符串即可。

编辑：
转念一想，trie 可能是最有效的方法，但一个简单的 hashmap 实现也应该在这里起作用。下面是生成所有计数 > 1 的前缀的示例。

import java.util.*;
import java.util.stream.*;

class Main {
  public static void main(String[] args) {
    
    System.out.println("Hello world!");

    ArrayList<String> strList = new ArrayList<String>();
    
    strList.add("Mary had a little lamb named Willy");
    strList.add("Mary had a little ham");
    strList.add("Old McDonald had a farm named Willy");
    strList.add("Willy had a little dog named ham");
    strList.add("(abc)");
    strList.add("(xyz)");
    strList.add("Visit Target Store");
    strList.add("Visit Walmart Store");

    Map<String, Integer> prefixMap = new HashMap<String, Integer>();
    ArrayList<String> stringsWithHighestOccurrence = new ArrayList<String>();

    for (String word : strList) {
            for (int i = 1; i <= word.length(); i++){
        String prefix = word.substring(0, i);
        prefixMap.merge(prefix, 1, Integer::sum);
      }
        }

    Integer maxval = Collections.max(prefixMap.values());

    for (String key: prefixMap.keySet()){
      Integer value = prefixMap.get(key);
      if (value > 1) System.out.println(key + " : " + value);
      if (value == maxval) stringsWithHighestOccurrence.add(key);
    }

    int maxLength = stringsWithHighestOccurrence.stream().map(String::length).max(Integer::compareTo).get();
    
    System.out.println(maxLength);

    ArrayList<String> prefixesWithMaxLength =
stringsWithHighestOccurrence.stream().filter(c -> c.length() == maxLength).collect(Collectors.toCollection(ArrayList::new));
    System.out.println(prefixesWithMaxLength);
  }
}

尽管如此，为了完成，我还将添加一个基本的 TrieNode 实现，因为我的回答首先提出了这种方法。

TrieNode:

class TrieNode {
    private final Map<Character, TrieNode> children = new HashMap<>();
    private int count;

    Map<Character, TrieNode> getChildren() {
        return children;
    }

    boolean getCount() {
        return count;
    }

    void increaseCount() {
        this.count += 1;
    }
}

Trie:

class Trie {
    private TrieNode root;

    Trie() {
        root = new TrieNode();
    }

    void insert(String word) {
        TrieNode current = root;

        for (char l : word.toCharArray()) {
            current = current.getChildren().computeIfAbsent(l, c -> new TrieNode());
            current.increaseCount()
        }
    }
}

遍历 trie 类似于一个简单的 DFS 场景，其中还维护了到当前节点的“路径”（我们正在切换带有前缀的路径到这一点）。

Answer 2

我认为@Abhinav 提供的解决方案应该可以使用 HashMap。在这里，我将 post 使用 java 中的简单 Trie 实现的解决方案（进行一些自定义，例如将 freq 添加到 Trie 节点中）。

    ArrayList<String> strList = new ArrayList<String>();
    strList.add("Mary had a little lamb named Willy");
    strList.add("Mary had a little ham");
    strList.add("Old McDonald had a farm named Willy");
    strList.add("Willy had a little dog named ham");
    strList.add("( abc )");
    strList.add("( xyz )");
    strList.add("Visit Target Store");
    strList.add("Visit Walmart Store");

    TNode root = new TNode("");
    int maxFreq = 1;
    for(String sentence : strList) {
        TNode currentNode = root;
        String[] words = sentence.split(" "); // Assuming space character is the delimiter 
        for(String word: words) {
            if(currentNode.children.containsKey(word)) {
                currentNode.children.get(word).freq += 1;
                maxFreq = Math.max(maxFreq, currentNode.children.get(word).freq);
            } else {
                TNode c = new TNode(word);
                c.freq = 1;
                currentNode.children.put(word, c);
            }
            currentNode = currentNode.children.get(word);
        }
    }

    Map<String, Integer> result = new HashMap<String, Integer>();
    Queue<NodeWithPrefix> queue = new LinkedList<NodeWithPrefix>();
    for(TNode node : root.children.values()){
        NodeWithPrefix nwp = new NodeWithPrefix(node);
        nwp.prefix = "";
        queue.add(nwp);
    }
    while(!queue.isEmpty()) {
        NodeWithPrefix item = queue.poll();
        if(item.node.freq == maxFreq) {
            result.put(item.prefix + " " + item.node.value, item.node.freq);
        }
        for(TNode child : item.node.children.values()) {
            NodeWithPrefix nwp = new NodeWithPrefix(child);
            nwp.prefix = item.prefix + " " + item.node.value;
            queue.add(nwp);
        }
    }
    return result;

此算法需要另外 2 个 classes：

class NodeWithPrefix {
    String prefix;
    TNode node;
    public NodeWithPrefix(TNode node){
        this.node = node;
    }
}

class TNode {
    String value;
    int freq = 0;
    Map<String, TNode> children;
    public TNode(String value){
        this.value = value;
        children = new HashMap<String, TNode>();
    }
}

输出是for prefix：（for postfix应该类似，只需要向后构建Trie）

{ Mary had=2,  Mary had a=2,  Visit=2,  (=2,  Mary had a little=2,  Mary=2}

这里我使用 BFS 来检索 Trie 中频率等于 maxFreq 的所有子字符串。我们可以根据需要调整过滤条件。您也可以在这里进行 DFS。其他考虑是我们可以将前缀添加到 TNode class 本身，我更喜欢将它单独放在另一个 class.

中

Answer 3

根据我对这个问题的理解，最适合解决它的数据结构是非循环不相交的 Graph.

在一般情况下，一个图将由几个不相连的簇组成。每个簇都会有一个tree-like结构，在边缘情况下它会形成一个链表.

基本上，解决这个问题的最简单朴素方法是根据每一行创建一堆链表，并遍历它们。缺点是：重复节点（更大的内存消耗），更大的time-complexity（需要更多的操作）并且更多error-prone 因为需要更多的手动操作。

图表的描述

所以我会坚持使用 graph 作为这个问题的数据结构，并尽量让事情尽可能简单。

让我们考虑以下输入：

"Mary had a little lamb named Willy"
"Mary had a little ham"
"A B C"

图形的图形表示将如下所示；

前两行将构成一个簇，由链表（头部）和组成tree（尾部）。第二个簇将由一个链表表示，它的顶点不与由其他字符串形成的顶点相连。

这不是构造顶点的唯一方式，头部可以产生一个 N-tree 并且可以在中间某处观察到链表。

要点是，为了解决问题，我们需要通过所有分支跟踪顶点链，直到 顶点重叠 。在 graph 的这些部分中，两条或多条线之间共有的每个 prefix-strings 和 suffix-string 将由 单个顶点表示 (节点).

为了维护映射到特定顶点的字符串的数量，每个顶点都应该有一个变量（int groupCount 在下面的代码中)，当创建顶点时，它被分配默认值 1 并且每次新字符串映射到该顶点时递增。

每个顶点都包含一张地图，其中包含对其邻居的引用。添加新的 neighbour-vertex 时，根据给定的字符串创建新的 Vertex 或 count现有的顶点递增。

为了符合这个任务，图表应该保持对所有head-vertexes和[=89=的引用]tail-vertexes。为了简单起见，不是维护两组对相邻节点的引用，而是在每个顶点中维护两个单独的计数变量（因为suffix-count和prefix-count会不同），这个解决方案图实际上由两个图组成（suffix-graph和prefix-graph)。出于这个原因，实现 class 命名为 MultiGraph.

为了用顶点[=填充suffix-graph和prefix-graph 153=]，方法 addCluster() 通过 Iterator 以正常或相反的顺序遍历给定行的字符串，具体取决于正在填充哪个 graph .

深度优先搜索

图形填充后的下一步是生成频率为 2 和更大的字符串映射。

为此，正在使用 classical depth first search 算法。

为了实现 DFS，需要一个将用作堆栈的可变容器（ArrayDeque 正用于该目的）。从 heads/tails 的 map 中取出的第一个元素将被放置在 stack 的顶部，并且 StringBuilder 持有这个元素的名称将被放置在地图中。

然后，要恢复具有特定 count 的字符串，将从 的顶部弹出 vertexes stack 和它们的 neighbors 和 count > 1 依次将被放置在堆栈的顶部。 current prefix 的副本加上分隔符和邻居的名字将被映射到 neighbour-vertex.

如果计数发生变化，则表示当前前缀表示至少两行之间的最长公共字符串。在这种情况下，前缀和计数被添加到生成的映射中。

实施

下面的实现由两个class组成，分别是narrow-focused和self-contained . MultiGraph class 专门用作数据结构，维护两个图。 pluming代码，像拆分字符串的行一样，被提取到一个单独的class GraphManager.

图表

public class MultiGraph { private final Map<String, Vertex> heads = new HashMap<>(); private final Map<String, Vertex> tails = new HashMap<>(); public void addCluster(Deque<String> names) { addCluster(heads, names.iterator()); addCluster(tails, names.descendingIterator()); } private void addCluster(Map<String, Vertex> clusters, Iterator<String> names) { String rootName = names.next(); if (clusters.containsKey(rootName)) { clusters.get(rootName).incrementGroupCount(); } else { clusters.put(rootName, new Vertex(rootName)); } Vertex current = clusters.get(rootName); while (names.hasNext()) { current = current.addNext(names.next()); } } public Map<String, Integer> generatePrefixMap(String delimiter) { Map<String, Integer> countByPrefix = new HashMap<>(); for (Vertex next: heads.values()) { if (next.getGroupCount() == 1) { continue; } performDFS(heads, countByPrefix, delimiter, next); } return countByPrefix; } public Map<String, Integer> generateSuffixMap(String delimiter) { Map<String, Integer> countBySuffix = new HashMap<>(); for (Vertex next: tails.values()) { if (next.getGroupCount() == 1) { continue; } performDFS(tails, countBySuffix, delimiter, next); } return countBySuffix; } // implementation of the Depth First Search algorithm public void performDFS(Map<String, Vertex> clusters, Map<String, Integer> countByPrefix, String delimiter, Vertex next) { StringBuilder prefix = null; Vertex current = next; int count = next.getGroupCount(); Deque<Vertex> stack = new ArrayDeque<>(); // create as stack Map<Vertex, StringBuilder> prefixByVert = new HashMap<>(); stack.push(next); // place the first element on the stack prefixByVert.put(current, new StringBuilder(current.getName())); while (!stack.isEmpty()) { current = stack.pop(); if (current.getGroupCount() < count) { // the number of strings mapped to the current Vertex has been changed countByPrefix.put(prefix.toString(), count); // saving the result count = current.getGroupCount(); } prefix = prefixByVert.get(current); for (Vertex neighbour: current.getNextVertByVal().values()) { if (next.getGroupCount() == 1) { continue; } stack.push(neighbour); prefixByVert.put(neighbour, new StringBuilder(prefix) .append(delimiter) .append(neighbour.getName())); } } if (prefix != null && count > 1) { countByPrefix.putIfAbsent(prefix.toString(), count); } } private static class Vertex { private final String name; private int groupCount = 1; private final Map<String, Vertex> nextVertByVal = new HashMap<>(); public Vertex(String name) { this.name = name; } public Vertex addNext(String value) { if (nextVertByVal.containsKey(value)) { nextVertByVal.get(value).incrementGroupCount(); } else { nextVertByVal.put(value, new Vertex(value)); } return nextVertByVal.get(value); } public void incrementGroupCount() { this.groupCount++; } public String getName() { return name; } public int getGroupCount() { return groupCount; } public Map<String, Vertex> getNextVertByVal() { return nextVertByVal; } } }
以下 class 处理输入数据的处理任务：它拆分行，负责丢弃可能发生的空字符串，并将输入打包到 Deque以方便的方式适应双向迭代。

它还会实例化 graph 并管理它的工作。 GraphManager 负责向图形提供 分隔符 以便在创建结果映射时恢复字符串的初始形状。有了它，您可以在白色 space 上拆分给定的行，通过空字符串逐个字符或标点符号处理行，而无需更改这两个 classes.
中的一行
图形管理器

public class GraphManager { private MultiGraph graph = new MultiGraph(); private String delimiter; private GraphManager(String delimiter) { this.delimiter = delimiter; } public static GraphManager getInstance(Iterable<String> lines, String delimiter) { GraphManager gm = new GraphManager(delimiter); gm.init(lines); return gm; } private void init(Iterable<String> lines) { for (String line: lines) { Deque<String> names = new ArrayDeque<>(); for (String name: line.split(delimiter)) { if (!name.isEmpty()) { names.add(name); } } addCluster(names); } } private void addCluster(Deque<String> names) { graph.addCluster(names); } public Map<String, Integer> getPrefixMap() { return graph.generatePrefixMap(delimiter); } public Map<String, Integer> getSuffixMap() { return graph.generateSuffixMap(delimiter); } }

main()

public static void main(String[] args) { List<String> lines = List.of( "Mary had a little lamb named Willy", "Mary had a little ham", "Old McDonald had a farm named Willy", "Willy had a little dog named ham", "( abc )", "( xyz )", "Visit Target Store", "Visit Walmart Store"); GraphManager gm = GraphManager.getInstance(lines, " "); System.out.println("Prefixes:"); for (Map.Entry<String, Integer> entry: gm.getPrefixMap().entrySet()) { System.out.println(entry.getValue() + " " + entry.getKey()); } System.out.println("\nSuffixes:"); for (Map.Entry<String, Integer> entry: gm.getSuffixMap().entrySet()) { System.out.println(entry.getValue() + " " + entry.getKey()); } }

输出

Prefixes: 2 Mary had a little 2 Visit 2 ( Suffixes: 2 ham 2 ) 2 Store 2 Willy named

Answer 4

要实现一个trie，首先你需要一个节点。

class Node<T> {

    private final T value;
    private final Node<T> parent;
    private final Map<T, Node<T>> children;
    private boolean isEnd;

    Node(T value, Node<T> parent) {
        this.value = value;
        this.parent = parent;
        this.children = new HashMap<>();
        this.isEnd = false;
    }

    Node<T> addChild(T childValue, Node<T> parent) {
        //return child node if existing, otherwise create and return
       return this.children.computeIfAbsent(childValue, value -> new Node<>(value, parent));
    }

    T getValue() {
        return this.value;
    }

    Node<T> getParent() {
        return this.parent;
    }

    boolean isEnd() {
        return this.isEnd;
    }

    void setEnd(boolean isEnd) {
        this.isEnd = isEnd;
    }

    Collection<Node<T>> children() {
        return this.children.values();
    }

    @Override
    public String toString() {
        //for easier debugging
        return "Node{" +
                "value=" + this.value +
                ", children=" + this.children.keySet() +
                ", isEnd=" + this.isEnd +
                '}';
    }
}

在节点中我们有实际值，对父节点的引用，以便于构建前缀，子节点将值映射到相应的节点，如果节点是结束节点，则有一个标志。

实际Trie实施

public class Trie<T> {

    private final Node<T> root;

    public Trie() {
        this.root = new Node<>(null, null);
    }

    public void insert(T[] elements) {
        if (elements.length == 0) {
            //don't want to set root as end node
            return;
        }
        Node<T> currentNode = this.root;
        for (T element : elements) {
            currentNode = currentNode.addChild(element, currentNode);
        }
        currentNode.setEnd(true);
    }

    public Map<Collection<T>, Integer> countPrefixes(BiConsumer<Deque<T>, T> operation) {
        Map<Collection<T>, Integer> map = new LinkedHashMap<>();
        this.countPrefixes(this.root, map, operation);
        return map;
    }

    private void countPrefixes(Node<T> current, Map<Collection<T>, Integer> map, BiConsumer<Deque<T>, T> operation) {
        if (current != this.root) {
            //check java doc for AbstractSet hashCode and equals
            ArrayList<T> prefix = this.buildKey(current, operation);
            int childrenCount = current.children().size();
            if (childrenCount == 0 && current.isEnd()) {
                //this sets entire collection(entire sentence 'Mary had a little lamb named Willy' for example)
                //as prefix which is met once
                childrenCount = 1;
            }
            map.merge(prefix, childrenCount, Integer::sum);
            if (childrenCount > 1) {
                //each parent node is already marked as prefix met once,
                //but any node having more than one child means entire chain of nodes
                //is a prefix met the number of children,
                //so we go backwards to update parent counts with the difference
                this.updateParentPrefixes(current.getParent(), childrenCount - 1, map, operation);
            }
        }
        for (Node<T> child : current.children()) {
            //visit each child recursively to count them
            //depth first
            this.countPrefixes(child, map, operation);
        }
    }

    //operation is abstraction for the order
    //in which we want to add elements
    //when building key/prefix
    private ArrayList<T> buildKey(Node<T> node, BiConsumer<Deque<T>, T> operation) {
        Deque<T> prefix = new LinkedList<>();
        while (node.getValue() != null) {
            operation.accept(prefix, node.getValue());
            node = node.getParent();
        }
        return new ArrayList<>(prefix);
    }

    private void updateParentPrefixes(Node<T> parent, int updateCount, Map<Collection<T>, Integer> map, BiConsumer<Deque<T>, T> operation) {
        if (parent == this.root) {
            //we don't want to update root, ever!!!
            return;
        }
        ArrayList<T> prefix = this.buildKey(parent, operation);
        map.merge(prefix, updateCount, Integer::sum);
        //visit parents recursively until root
        this.updateParentPrefixes(parent.getParent(), updateCount, map, operation);
    }
}

在某种意义上简化了一些，只实现了insert，但我们现在不需要update和delete。需要注意的几点：

Trie 必须有一个 root 节点，它是空的。我们用 null 值和 null 父级表示。我们必须永不更新此节点。
countPrefixes() 的参数 BiConsumer<Deque<T>, T> operation。这是我们在创建前缀时想要添加元素的顺序的抽象。它是必需的，因为在反转 collection/sentence.
Return 输入 Map<Collection<T>, Integer>，共 countPrefixes()。此实现更通用，因此每个前缀都表示为 节点值集合 .
为什么前缀是ArrayList？首先我们需要保持插入顺序。其次，它对 hashCode() 和 equals() 的实现使其成为映射键的良好候选者。引用 javadoc:

This ensures that list1.equals(list2) implies that list1.hashCode()==list2.hashCode() for any two lists, list1 and list2, as required by the general contract of Object.hashCode. - 对于 hashCode。

In other words, two lists are defined to be equal if they contain the same elements in the same order. - 等于。

代码中的注释应解释实施中的其余细节。

要测试的主线。

public class TrieMain {

    public static void main(String[] args) {
        String spacePattern = "\s+";
        String emptyStringPattern = "";

        ArrayList<String> strList = new ArrayList<>();
        strList.add("Mary had a little lamb named Willy");
        strList.add("Mary had a little ham");
        strList.add("Mary had a sandwich");
        strList.add("Old McDonald had a farm named Willy");
        strList.add("Willy had a little dog named ham");
        strList.add("Willy had a big dog named Willy");
        strList.add("Willy had a huge dog named Willy");
        strList.add("(abc)");
        strList.add("(xyz)");
        strList.add("Visit Target Store");
        strList.add("Visit Walmart Store");

        Trie<String> trie = new Trie<>();
        //using another trie for suffix to avoid counting suffix as prefix and vice versa
        Trie<String> reversedTrie = new Trie<>();
        for (String string : strList) {
            String pattern = string.startsWith("(") ? emptyStringPattern : spacePattern;
            String[] words = string.split(pattern);
            trie.insert(words);

            //reverse collection to count suffixes
            Collections.reverse(Arrays.asList(words));
            reversedTrie.insert(words);
        }
        Map<Collection<String>, Integer> prefixCount = trie.countPrefixes(Deque::addFirst);
        System.out.println("Prefixes:");
        printMap(prefixCount);
        System.out.println();
        Map<Collection<String>, Integer> suffixCount = reversedTrie.countPrefixes(Deque::addLast);
        System.out.println("Suffixes:");
        printMap(suffixCount);
    }

    private static void printMap(Map<Collection<String>, Integer> map) {
        map.entrySet()
                .stream()
                .filter(entry -> entry.getValue() > 1)
                .forEach(e -> {
                    String key = String.join(" ", e.getKey());
                    String line = String.format("%s -> %d", key, e.getValue());
                    System.out.println(line);
                });
    }
}

这里需要注意的事项：

在列表中添加了更多句子以进行额外测试。
我将 ( 的输入拆分为字符而不是单词作为其余部分以实现您的输出，但示例有些混乱。对于句子前缀是单词的集合，而对于这两个前缀是字符。
后缀需要第二个 trie - 否则我们可能会将前缀算作后缀，反之亦然。
Arrays.asList() 返回的列表由输入数组支持。我正在利用它来反转输入数组本身的后缀。

编辑： 虽然使用实现的字符进行测试 AbstractSet，或者准确地说是任何集合，对于前缀来说是个坏主意，因为它不允许重复的元素。更新了 trie 实现以使用 ArrayList。我还添加了 2 个主要方法，以演示 trie 使用字符串和字符：

Strings - 每个节点是一个词，通过space

public class TrieWords {

    public static void main(String[] args) {
        ArrayList<String> strList = new ArrayList<>();
        strList.add("Mary had a little lamb named Willy");
        strList.add("Mary had a little ham");
        strList.add("Mary had a sandwich");
        strList.add("Old McDonald had a farm named Willy");
        strList.add("Willy had a little dog named ham");
        strList.add("Willy had a big dog named Willy");
        strList.add("Willy had a huge dog named Willy");
        strList.add("(abc)");
        strList.add("(xyz)");
        strList.add("Visit Target Store");
        strList.add("Visit Walmart Store");

        Trie<String> trie = new Trie<>();
        //using another trie for suffix to avoid counting suffix as prefix and vice versa
        Trie<String> reversedTrie = new Trie<>();
        for (String string : strList) {
            String[] words = string.split("\s+");
            trie.insert(words);

            //reverse collection to count suffixes
            Collections.reverse(Arrays.asList(words));
            reversedTrie.insert(words);
        }
        Map<Collection<String>, Integer> prefixCount = trie.countPrefixes(Deque::addFirst);
        System.out.println("Prefixes:");
        printMap(prefixCount);
        System.out.println();
        Map<Collection<String>, Integer> suffixCount = reversedTrie.countPrefixes(Deque::addLast);
        System.out.println("Suffixes:");
        printMap(suffixCount);
    }

    private static void printMap(Map<Collection<String>, Integer> map) {
        map.entrySet()
                .stream()
                .filter(entry -> entry.getValue() > 1)
                .forEach(e -> {
                    String key = String.join(" ", e.getKey());
                    String line = String.format("%s -> %d", key, e.getValue());
                    System.out.println(line);
                });
    }
}

字符 - 每个节点都是字符串

public class TrieChars {

    public static void main(String[] args) {
        ArrayList<String> strList = new ArrayList<>();
        strList.add("Mary had a little lamb named Willy");
        strList.add("Mary had a little ham");
        strList.add("Mary had a sandwich");
        strList.add("Old McDonald had a farm named Willy");
        strList.add("Willy had a little dog named ham");
        strList.add("Willy had a big dog named Willy");
        strList.add("Willy had a huge dog named Willy");
        strList.add("(abcd)");
        strList.add("(xyz)");
        strList.add("Visit Target Store");
        strList.add("Visit Walmart Store");

        Trie<Character> trie = new Trie<>();
        //using another trie for suffix to avoid counting suffix as prefix and vice versa
        Trie<Character> reversedTrie = new Trie<>();
        for (String string : strList) {
            int length = string.length();
            Character[] words = new Character[length];
            Character[] reversed = new Character[length];
            for (int i = 0; i < length; i++) {
                int reversedIndex = length - 1 - i;
                words[i] = string.charAt(i);
                reversed[reversedIndex] = string.charAt(i);
            }
            trie.insert(words);
            reversedTrie.insert(reversed);
        }
        Map<Collection<Character>, Integer> prefixCount = trie.countPrefixes(Deque::addFirst);
        System.out.println("Prefixes:");
        printMap(prefixCount);
        System.out.println();
        Map<Collection<Character>, Integer> suffixCount = reversedTrie.countPrefixes(Deque::addLast);
        System.out.println("Suffixes:");
        printMap(suffixCount);
    }

    private static void printMap(Map<Collection<Character>, Integer> map) {
        map.entrySet()
                .stream()
                .filter(entry -> entry.getValue() > 1)
                .forEach(e -> {
                    String key = e.getKey().stream().map(Object::toString).collect(Collectors.joining(""));
                    String line = String.format("%s -> %d", key, e.getValue());
                    System.out.println(line);
                });
    }
}

在字符串列表中查找最长公共 Prefix/Suffix 的出现次数？

Find occurrence count of the longest common Prefix/Suffix in a List of Strings?

java

string

algorithm

hashmap

图表的描述

深度优先搜索

实施