如何合并 Java 中包含相同值的多个数组

Question

我正在努力寻找合并数组（或创建新数组）的最佳方法，方法是查看它们的共享值。

List<String[]> dictionary = new ArrayList<String[]>();

这是我的“字典”，其中包含 2 个单词的数组，例如它包含数组：

["A","B"]
["B","C"]
["D","E"]
["F","C"]
["G","H"]
["T","D"]

我需要按它们共享的值合并它们，因此例如完成的“字典”（或全新的列表）将如下所示：

["A","B","C","F"];
["D","E","T"];
["G","H"];

此外，不必删除旧数组，它们可以保留在“字典”中，但我需要合并的数组，但我很难弄清楚。

数组无论如何都不需要排序。

这是我目前所拥有的，但没有用

    public static void SynonymsMerge(List<String[]> dictionary){
    ArrayList<ArrayList<String>> newDictionary = new ArrayList<ArrayList<String>>();
    for(int i=0;i < dictionary.size(); i++){
        ArrayList<String> synonyms = new ArrayList<String>();
        for(int j=0; j < dictionary.get(i).length; j++){
            synonyms.add(dictionary.get(i)[j]);
        }
        newDictionary.add(synonyms);
    }
    for(int i=0;i< newDictionary.size();i++){
        for(int j=0; j < newDictionary.size();j++){
            for (int k=0; k < newDictionary.get(j).size() ;k++) {
                    if (newDictionary.get(i).equals(newDictionary.get(j)))
                        continue;
                    if (newDictionary.get(i).contains(newDictionary.get(j).get(k)))
                        newDictionary.get(i).addAll(newDictionary.get(j));

Answer 1

对于这个问题你需要连接 A 和 B（ 反之亦然），然后 C 和 B 和 C 与 F 等等。很多路口，很多圈。

这是利用 Graph 作为数据结构的完美案例。

准确地说，这个数据集可以表示为一个循环无向不相交图，其中包含多个connected components。并且根据图论，这个任务归结为在[=37]中发现所有组件 =]图表.

为此，我们需要采取以下步骤：

通过解析输入数据创建并初始化图。
迭代图的顶点集合，并遍历每个遇到的连接组件.每个以前未见过的 vertex 表示它所属的 component 也尚未被发现。作为遍历算法，我选择了 Depth first search（但出于此任务的目的，广度优先搜索 算法也可以正常工作）。

实施：

public class Graph {
    private Map<String, Vertex> vertexByName = new HashMap<>();
    
    private Graph() {} // no way and no need to invoke this constractor outside the class
    
    public static Graph getInstance(List<List<String>> dictionary) { // method responsible for instantiation and initialization of the graph
        Graph graph = new Graph();
        graph.init(dictionary);
        return graph;
    }
    
    private void init(List<List<String>> dictionary) {
        for (List<String> list: dictionary) {
            for (String name: list) {
                addVertex(name, list);
            }
        }
    }
    
    private void addVertex(String name, List<String> neighbours) {
        
        Vertex cur = vertexByName.computeIfAbsent(name, Vertex::new);
    
        for (String neighbourName: neighbours) {
            if (neighbourName.equals(name)) continue;
        
            Vertex neighbour = vertexByName.computeIfAbsent(neighbourName, Vertex::new);
            cur.addNeighbour(neighbour);
            neighbour.addNeighbour(cur); // this graph is undirectional, i.e. both vertices in each connected pair should hold a reference to one another
        }
    }
    
    public List<List<String>> getComponents() {
        List<List<String>> components = new ArrayList<>();
        
        Set<Vertex> seen = new HashSet<>();
        for (Vertex vertex: vertexByName.values()) {
            if (seen.contains(vertex)) continue;
            
            components.add(getComponentNames(vertex, seen));
        }
        return components;
    }

    // Depth first search implementation
    private List<String> getComponentNames(Vertex vertex, Set<Vertex> seen) {
        Deque<Vertex> stack = new ArrayDeque<>();
        List<String> names = new ArrayList<>();
        stack.push(vertex);
        seen.add(vertex);
        
        while(!stack.isEmpty()) {
            Vertex current = stack.pop();
            names.add(current.getName());
            
            for (Vertex neighbour: current.getNeighbours()) {
                if (seen.contains(neighbour)) continue;
                
                seen.add(neighbour);
                stack.push(neighbour);
            }
        }
        return names;
    }
    
    private class Vertex {
        private String name;
        private Set<Vertex> neighbours = new HashSet<>();
        
        public Vertex(String name) {
            this.name = name;
        }
    
        public Vertex(String name) {
            this.name = name;
        }
        
        public boolean addNeighbour(Vertex neighbour) {
            return neighbours.add(neighbour);
        }
    
        public String getName() {
            return name;
        }
    
        public Set<Vertex> getNeighbours() {
            return neighbours;
        }
    }
}

main() - 演示

public static void main(String[] args) {
    List<List<String>> dictionary =
        List.of(List.of("A","B"), List.of("B","C"),
                List.of("D","E"), List.of("F","C"),
                List.of("G","H"), List.of("T","D"));
    
    Graph graph = Graph.getInstance(dictionary);
    List<List<String>> componentNames = graph.getComponents();

    System.out.println(componentNames);
}

输出

[[A, B, C, F], [T, D, E], [G, H]]

Answer 2

首先，这是代码。我将输入类型从 List<String[]> 更改为 List<List<String>>，因为混合列表和数组没有任何意义。这也适用于输出类型。

代码

public static List<List<String>> merge(List<List<String>> dictionary) {
        List<List<String>> newDictionary = new ArrayList<>();

        for (List<String> stringPair : dictionary) {

            List<Integer> matchIndices = new ArrayList<>();
            for (int i = 0; i < newDictionary.size(); i++) {
                List<String> newStrings = newDictionary.get(i);

                for (String str : stringPair) {
                    if (newStrings.contains(str)) {
                        matchIndices.add(i);
                    }
                }
            }
            if (matchIndices.size() == 0) {
                newDictionary.addAll(new ArrayList<List<String>>(Collections.singleton(new ArrayList<>(stringPair))));
                continue;
            }

            matchIndices.sort(Integer::compareTo);

            if (matchIndices.size() == 1) {
                newDictionary.get(matchIndices.get(0)).addAll(new ArrayList<>(stringPair));
            } else {
                int last = matchIndices.remove(0);
                while (matchIndices.size() > 0) {
                    int i = matchIndices.get(0);
                    newDictionary.get(last).addAll(newDictionary.get(i));
                    newDictionary.remove(i);
                    matchIndices.remove(0);
                    matchIndices = new ArrayList<>(matchIndices.stream().map(a -> a - 1).toList());
                }
            }
        }
        newDictionary = newDictionary.stream()
                .map(strings -> strings.stream().distinct().toList())
                .toList();

        return newDictionary;
    }

它是如何工作的？

dictionary 类型 List<List<String>> 的输入（内部列表的最大大小为 2，即使理论上该函数可以处理更多字符串）
newDictionary类型函数的输出List<List<String>>

对directory

中的每个输入pair/List个字符串执行以下代码

获取 newDictionary 中所有现有的不同“组”（它们的索引），其中已经存在来自 par 的字符串。此索引列表称为 matchIndices
示例：stringPair=["A","E"] newDictionary:[["I", "A", "O"], ["P", "D"]] 将产生在 matchIndices=[0] 中，因为在 newDictionary
如果 matchIndices.size() 为 0，则在 newDictionary 中使用字符串对创建一个新组。回到 1.
如果 matchIndices.size() 为 1，则将字符串对附加到具有 matchIndices 中指定索引的特定 newDictionary 组。回到 1.
如果 matchIndices.size() 大于 1，这意味着 newDictionary 中具有 matchIndices 中指定索引的多个组必须在 for 中合并在一起-环形。回到 1.

最后我们必须确保 newDictionary 中的列表中没有重复项。

主要方法

    public static void main(String[] args) {
        List<List<String>> dictionary = new ArrayList<>(List.of(
                List.of("A", "B"),
                List.of("B", "C"),
                List.of("D", "E"),
                List.of("F", "C"),
                List.of("G", "H"),
                List.of("T", "D")));
        
        System.out.println(merge(dictionary));
    }

为什么我们需要第 4 步？

在您的具体示例中，我们不必合并多个组。
但是像这样的输入数据

List<List<String>> dictionary = new ArrayList<>(List.of(
                List.of("A", "B"),
                List.of("B", "C"),
                List.of("D", "E"),
                List.of("F", "E"),
                List.of("E", "A")));

我们最终到达 newDictionary=[[A, B, B, C], [D, E, F, E]] 的地步，我们必须尝试插入 [E, A]。这里来自 newDictionary 的两个组都必须合并在一起。
然后这会导致 [[A, B, C, D, E, F]] 的输出，其中两个组被合并并删除重复项。

P.s.

我对这个解决方案不是很满意，因为它并不清楚到底发生了什么，但我仍然发布这个，因为你说过你会对任何解决方案感到满意。 :)

Answer 3

Un-refined首先想到的——索引数组，然后合并。重复。

迭代 ArrayList 中的数组；
索引数组中的项目；
合并重叠的项目；
对合并结果重复相同的过程，直到没有重叠。

使用你的例子：

[A, B] (Call this #1 array)
[B, C] (Call this #2 array)
[D, E] (Call this #3 array)
[F, C] (Call this #4 array)
[G, H] (Call this #5 array)
[T, D] (Call this #6 array)

现在，像这样准备索引：

A -> 1   (because A occurs only in array 1)
B -> 1,2 (because B occurs in array 1 and 2)
C -> 2,4 ...
D -> 3,6 ...
E -> 3   ...
F -> 4   ...
G -> 5   ...
T -> 6   ...

查看上面的索引，我们知道我们应该合并 1 和 2、2 和 4，以及 3 和 6。这将给我们：

[A, B, C] (This is our new #1)
[B, C, F] (This is our new #2)
[D, E, T] (This is our new #3)
[G, H]    (This is our new #4)

对新的数组 ArrayList 重复这些步骤。 Re-indexing 给出...

A -> 1
B -> 1,2
C -> 1,2
D -> 3
E -> 3
F -> 2
G -> 4
H -> 4
T -> 3

再次合并重叠部分。这次只有 1 和 2 重叠。合并它会给你：

[A, B, C, F] (This is our new #1)
[D, E, T]    (This is our new #2)
[G, H]       (This is our new #3)

再一次，re-index，

A -> 1
B -> 1
C -> 1
D -> 2
E -> 3
F -> 1
G -> 3
H -> 3
T -> 2

由于这次没有重叠数组，所以没有更多要合并的东西，这就是最终答案。

Answer 4

使用 Union find 添加解决方案，因此这里的目标是遍历所有字符串，同时找到共同的“领导者”。

之后我们将再次遍历字典，但这次每个字符串都有一个领导者，我们将它们绑定到一个共同的领导者，然后创建合并的字典

public class UnionFind
{
    private Map<String, String> graph;

    public UnionFind()
    {
        graph = new HashMap<>();
    }

    public String find(String str)
    {
        if (str == null) throw new IllegalArgumentException("Invalid String");

        if (graph.getOrDefault(str, str).equals(str))
            graph.put(str, str);
        else 
            graph.put(str, find(graph.get(str)));

        return graph.get(str);
    }

    public void union(String str1, String str2)
    {
        String root1 = find(str1);
        String root2 = find(str2);


        if (!root1.equals(root2))
        {
            if (root1.equals(str1)) graph.put(graph.get(root1), graph.get(root2));
            else graph.put(graph.get(root2), graph.get(root1));
        }
    }

    public static void main(String[] args)
    {
        List<List<String>> dictionary = prepareDictionary();

        UnionFind unionFind = new UnionFind();

        for (List<String> list : dictionary)
        {
            for (int i = 1; i < list.size(); i++)
            {
                unionFind.union(list.get(i - 1), list.get(i));
            }
        }

        Map<String, Set<String>> map = new HashMap<>();

        for (List<String> list : dictionary)
        {
            for (String str : list)
            {
                String parent = unionFind.find(str);
                if (!map.containsKey(parent))
                    map.put(parent, new LinkedHashSet<>());

                map.get(parent).add(str);
            }
        }

        List<List<String>> result = new ArrayList<>();
        for(Map.Entry<String, Set<String>> entry : map.entrySet())
        {
            result.add(new ArrayList<>(entry.getValue()));
        }

        System.out.println(result);
    }

    private static List<List<String>> prepareDictionary()
    {
        List<List<String>> dictionary = new ArrayList<>();

        dictionary.add(Arrays.asList("A", "B"));
        dictionary.add(Arrays.asList("B", "C"));
        dictionary.add(Arrays.asList("D", "E"));
        dictionary.add(Arrays.asList("F", "C"));
        dictionary.add(Arrays.asList("G", "H"));
        dictionary.add(Arrays.asList("T", "D"));

        return dictionary;
    }

结果：

[[A, B, C, F], [D, E, T], [G, H]]

Answer 5

这是另一个解决方案，其中包含一组适合您的结构的字符串

public void merge(List<String[]> dictionary) {
    List<Set<String>> dictionaryList = dictionary.stream()
            .map(x -> new HashSet<> (Arrays.asList(x))).collect(Collectors.toList());

    for (int i = 0; i < dictionaryList.size() ; i++){
        Set list = dictionaryList.get(i);

        for (int j = i + 1; j < dictionaryList.size() ; j++){
            Set otherList = dictionaryList.get(j);
            Set result = (Set) list.stream().filter(otherList::contains).collect(Collectors.toSet());

            if (!result.isEmpty()) {
                list.addAll(otherList);
                dictionaryList.remove(j);
            }
        }
    }
    System.out.println(dictionaryList);
}

结果

[[A, B, C, F], [D, T, E], [G, H]]

Answer 6

所有答案都很好谢谢！但是我不能使用它们，因为也许我需要解释代码并且我没有完全理解它们（第一次编码 java yaaay！）。我最后做的是：我将所有输入保存到哈希集列表中，因为哈希集不能重复值，然后我运行通过 2 个 for（嵌套）循环使用 if statment

所有输入

if(return !Collections.disjoint(hashset1,hashset2);)

之后我使用 set.addAll(hashset1); set.addAll(hashset2);

合并了它们

然而它仍然不完整，有一些应该合并但没有合并的集合。所以我运行它使用相同的 if 语句再次通过 2for（嵌套）循环并且它起作用了（我希望）。它适用于 2000 字的输入，我希望它适用于更多字的输入 :D 谢谢大家的帮助。

如何合并 Java 中包含相同值的多个数组

How to merge multiple Arrays containing the same value in Java

java

arrays

algorithm

list

代码

它是如何工作的？

主要方法

为什么我们需要第 4 步？

P.s.