Most efficient way to get matched and unmatched objects in 2 ArrayLists

Question

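My task is to read two files, match their contents, and report the unmatched entries from both. That means I have to show how many entries match across the two files, how many entries in file 1 are not in file 2, and how many entries in file 2 are not in file 1.

My approach is to read the files, create Java objects from them, put the contents of the two files into two separate ArrayLists, and compare them. My current code is listed below. To clarify, I want to compare the contents of the objects (e.g. check the EmployeeID and match across both files).

In the code below, I match the contents of file1 against file2 and remove the matched entries from file2. This works fine for matching the entries and getting the unmatched count of file1 vs. file2.

I plan to match the remaining items in file2 by going another round with the same compareByEmpIdandDOB method, passing fileTwoEmpList as the first argument and fileOneEmpList as the second, to get the unmatched count of file2 vs. file1. But this feels like overkill and not very efficient. Can someone point out a different approach, if there is one?

Both ArrayLists are sorted. Thanks in advance!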

import java.util.Collections;
import java.util.List;

public class EmpMatching {

    public void compareLists(List<EmployeeDetails> fileOneEmpList, List<EmployeeDetails> fileTwoEmpList){

        Collections.sort(fileOneEmpList);
        Collections.sort(fileTwoEmpList);

        List<EmployeeDetails> unmatchedFromListTwo = compareByEmpIdandDOB(fileOneEmpList,fileTwoEmpList);

    }

    public List<EmployeeDetails>  compareByEmpIdandDOB(List<EmployeeDetails> fileOneEmpList,List<EmployeeDetails> fileTwoEmpList){

        int matchEmpCountFromTwoFiles = 0;
        System.out.println("File Two List Size Before Recon " + fileTwoEmpList.size());

        for(EmployeeDetails fileOneEmp : fileOneEmpList){

            for(int index = 0;index < fileTwoEmpList.size();index++ ){

                EmployeeDetails fileTwoEmp= fileTwoEmpList.get(index);

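                if(fileOneEmp.getEmpID().equals(fileTwoEmp.getEmpID()) && fileOneEmp.getEmpDOB().equals(fileTwoEmp.getEmpDOB())){
                    matchEmpCountFromTwoFiles++;
                    fileTwoEmpList.remove(fileTwoEmp);

                    System.out.println("Match Found " + fileOneEmp.getEmpID());
                    // Stop scanning: removing an element shifts the rest left,
                    // so continuing with index++ would skip the next entry.
                    break;
                }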
            }

            System.out.println("File Two List Size " + fileTwoEmpList.size());
        }

        System.out.println("Match Count >>>>>  " + matchEmpCountFromTwoFiles);
        System.out.println("File Two List Size >>>>> " + fileTwoEmpList.size());

        return fileTwoEmpList;

    }
}


//Model class

public class EmployeeDetails implements Comparable<EmployeeDetails>{


    private String EmpID;

    private String EmpName;

    private String EmpDOB;

    @Override
    public int compareTo(EmployeeDetails o) {
        return 0;
    }
}

Answer 1

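You don't need to sort these lists for this task.

In terms of set theory, you need to find the set difference (more precisely, the symmetric difference), i.e. all the unique objects that appear in only the first list or only the second list.

This task can be solved in a few lines of code with linear time complexity. But it is important that EmployeeDetails implements the equals/hashCode contract.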

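// Requires java.util.ArrayList, java.util.HashSet, java.util.List, java.util.Set
public List<EmployeeDetails> compareLists(List<EmployeeDetails> fileOneEmpList,
                                          List<EmployeeDetails> fileTwoEmpList) {
    
    Set<EmployeeDetails> emp1 = new HashSet<>(fileOneEmpList);
    Set<EmployeeDetails> emp2 = new HashSet<>(fileTwoEmpList);
    
    // Entries present in both files (intersection)
    Set<EmployeeDetails> common = new HashSet<>(emp1);
    common.retainAll(emp2);
    
    emp1.addAll(emp2);      // union of both files
    emp1.removeAll(common); // drop the matches, leaving entries found in only one file

    return new ArrayList<>(emp1);
}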

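The approach above is the simplest, and it is efficient: building the hash sets and applying the set operations takes expected linear time in the combined size of the two lists.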
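
Since the question asks for three separate figures (matches, entries only in file 1, entries only in file 2), the same set operations can be arranged to return each group on its own. What follows is a minimal sketch, not the code from the original answer: `EmpKey`, `ReconResult` and `reconcile` are hypothetical names introduced for illustration, and the record stands in for an EmployeeDetails that has a proper equals/hashCode.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ReconSketch {

    // Hypothetical key type: a record derives equals()/hashCode() from its components.
    record EmpKey(String empId, String empDob) {}

    // Hypothetical holder for the three groups the question asks about.
    record ReconResult(Set<EmpKey> matched, Set<EmpKey> onlyInFirst, Set<EmpKey> onlyInSecond) {}

    static ReconResult reconcile(List<EmpKey> fileOne, List<EmpKey> fileTwo) {
        Set<EmpKey> first = new HashSet<>(fileOne);
        Set<EmpKey> second = new HashSet<>(fileTwo);

        Set<EmpKey> matched = new HashSet<>(first);
        matched.retainAll(second);       // intersection: present in both files

        Set<EmpKey> onlyInFirst = new HashSet<>(first);
        onlyInFirst.removeAll(second);   // in file 1 but not in file 2

        Set<EmpKey> onlyInSecond = new HashSet<>(second);
        onlyInSecond.removeAll(first);   // in file 2 but not in file 1

        return new ReconResult(matched, onlyInFirst, onlyInSecond);
    }

    public static void main(String[] args) {
        List<EmpKey> one = List.of(new EmpKey("E1", "1990-01-01"), new EmpKey("E2", "1991-02-02"));
        List<EmpKey> two = List.of(new EmpKey("E2", "1991-02-02"), new EmpKey("E3", "1992-03-03"));

        ReconResult result = reconcile(one, two);
        System.out.println("Matched: " + result.matched().size());
        System.out.println("Only in file 1: " + result.onlyInFirst().size());
        System.out.println("Only in file 2: " + result.onlyInSecond().size());
    }
}
```

Copying the sets before each operation leaves the inputs untouched, so all three groups come out of a single pass over the data instead of two rounds of nested loops.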

If you're comfortable with the Stream API, you can try another approach and implement this method as follows:

public List<EmployeeDetails> compareLists(List<EmployeeDetails> fileOneEmpList,
                                          List<EmployeeDetails> fileTwoEmpList) {
    
    return Stream.of(new HashSet<>(fileOneEmpList), new HashSet<>(fileTwoEmpList)) // wrapping with sets to ensure uniqueness (if objects in the list are guaranteed to be unique - use lists instead) 
        .flatMap(Collection::stream)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
        .entrySet().stream()
        .filter(entry -> entry.getValue() == 1) // i.e. object appear only once either in the first or in the second list
        .map(Map.Entry::getKey)
        .collect(Collectors.toList()); // .toList(); for Java 16+
}

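The stream-based solution also has linear time complexity. But as I said, the first solution, based on the Collections API, is simpler and slightly more performant.

If for some reason equals() and hashCode() are not properly implemented in EmployeeDetails, and you have no control over this class and can't change it, you can declare a wrapper class and perform the same operations on it.

Below is an example of how to create such a wrapper using a Java 16 record. The equals() and hashCode() methods will be generated by the compiler based on empId and empDob.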

public record EmployeeWrapper(String empId, String empDob) {
    // Convenience constructor: copies the key fields via the public getters
    public EmployeeWrapper(EmployeeDetails details) {
        this(details.getEmpID(), details.getEmpDOB());
    }
}
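
To show how such a wrapper might be put to use, here is a hedged sketch that maps each list to a set of wrappers and then takes the difference. The `EmployeeDetails` stand-in (including its constructor) and the helper names `toKeys` and `onlyInFirst` are assumptions made for illustration; only the getter names come from the question.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class WrapperSketch {

    // Stand-in for a class we cannot modify: it does not override equals()/hashCode().
    // The constructor is an assumption; the getter names follow the question.
    static final class EmployeeDetails {
        private final String empID;
        private final String empName;
        private final String empDOB;

        EmployeeDetails(String empID, String empName, String empDOB) {
            this.empID = empID;
            this.empName = empName;
            this.empDOB = empDOB;
        }

        String getEmpID() { return empID; }
        String getEmpName() { return empName; }
        String getEmpDOB() { return empDOB; }
    }

    // Wrapper holding only the fields that define a match.
    record EmployeeWrapper(String empId, String empDob) {
        EmployeeWrapper(EmployeeDetails details) {
            this(details.getEmpID(), details.getEmpDOB());
        }
    }

    // Converts a list of employees into a mutable set of comparison keys.
    static Set<EmployeeWrapper> toKeys(List<EmployeeDetails> employees) {
        return employees.stream()
                .map(EmployeeWrapper::new)
                .collect(Collectors.toCollection(HashSet::new));
    }

    // Keys that occur in the first list but not in the second.
    static Set<EmployeeWrapper> onlyInFirst(List<EmployeeDetails> first, List<EmployeeDetails> second) {
        Set<EmployeeWrapper> result = toKeys(first);
        result.removeAll(toKeys(second));
        return result;
    }
}
```

Calling `onlyInFirst(fileOneEmpList, fileTwoEmpList)` and then swapping the arguments gives both unmatched groups. Note that the result contains wrappers rather than the original objects; if the full EmployeeDetails is needed afterwards, a Map keyed by the wrapper is one option.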

An implementation of equals/hashCode for the EmployeeDetails class based on empID and empDOB might look like this (you can also use your IDE's facilities to generate these methods):

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        
        EmployeeDetails that = (EmployeeDetails) o;            
        return empID.equals(that.empID) && empDOB.equals(that.empDOB);
    }

    @Override
    public int hashCode() {
        return Objects.hash(empID, empDOB);
    }
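
To illustrate why the contract matters for the set-based solutions, the sketch below compares a class that inherits the identity-based equals()/hashCode() from Object with one that overrides them. `PlainEmployee` and `KeyedEmployee` are hypothetical classes used only for this demonstration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class EqualsHashCodeSketch {

    // Hypothetical class that keeps Object's identity-based equals()/hashCode().
    static final class PlainEmployee {
        final String empID;
        final String empDOB;

        PlainEmployee(String empID, String empDOB) {
            this.empID = empID;
            this.empDOB = empDOB;
        }
    }

    // Same fields, but equality is defined by empID and empDOB.
    static final class KeyedEmployee {
        final String empID;
        final String empDOB;

        KeyedEmployee(String empID, String empDOB) {
            this.empID = empID;
            this.empDOB = empDOB;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            KeyedEmployee that = (KeyedEmployee) o;
            return empID.equals(that.empID) && empDOB.equals(that.empDOB);
        }

        @Override
        public int hashCode() {
            return Objects.hash(empID, empDOB);
        }
    }

    public static void main(String[] args) {
        Set<PlainEmployee> plain = new HashSet<>();
        plain.add(new PlainEmployee("E1", "1990-01-01"));
        // false: a second instance with the same data is a different object
        System.out.println(plain.contains(new PlainEmployee("E1", "1990-01-01")));

        Set<KeyedEmployee> keyed = new HashSet<>();
        keyed.add(new KeyedEmployee("E1", "1990-01-01"));
        // true: equality is based on empID and empDOB
        System.out.println(keyed.contains(new KeyedEmployee("E1", "1990-01-01")));
    }
}
```

Without the overrides, removeAll and retainAll would never find a match between objects read from two different files, because every parsed entry is a distinct instance.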
