如何比较 2 个列表和 return 最大子集的列表?

How to compare 2 lists and return a list of the greatest subset?

我想比较两个 ArrayList 和 return Java 中最大的相似子集。所以我想比较列表的各个部分,而不仅仅是单个值。

示例:

list 1       list 2
F            A
A            B
B            C
C            F
D            D
Z            Z
A
F
C

最大子集:

Arraylist: [A,B,C]

第二大子集应该是:

ArrayList: [D,Z]

我怎样才能有效地做到这一点?(不使用超过 2 个 for 循环)

retainAll() 不起作用,retainAll() return 是相等的值,而不是最大的子集。

编辑 我想要作为输出,最大子集之前的列表,最大子集,最大子集之后的列表。通过这个例子,输出应该是:

[[F],[null]],[A,B,C],[[D,Z,A,F,C],[F,D,Z]]

公共列表的最大大小将是较小列表的大小。 您随后可以检查大小小于或等于此最大值的子列表是否相等。 检查以下代码以供参考:

public static <T> List<List<T>> getLargestCommonListAndRest(List<T> list1, List<T> list2) {
    int beginSize = list1.size() < list2.size() ? list1.size() : list2.size();
    while (beginSize > 0) {
        for (int i = 0; i <= list1.size() - beginSize; i++) {
            List<T> subList1 = list1.subList(i, i + beginSize - 1);
            for (int i1 = 0; i1 <= list2.size() - beginSize; i1++) {
                List<T> subList2 = list2.subList(i1, i1 + beginSize - 1);
                if (subList1.equals(subList2))
                    return Arrays.asList(list1.subList(0, Integer.max(0, i)), subList1,
                            list1.subList(i + beginSize - 1, list1.size()));
            }
        }
        beginSize--;
    }
    return new ArrayList();
}

假设您的列表是字符串类型 使用

list#retainAll()

获取这些列表之间的巧合

示例:

List<String> listA...
List<String> listB...
List<String> listC = new ArrayList<String>();    // new list to keep the originals unmodified.
listC.addAll(listA);   // add all the list a to c
listC.retainAll(listB); // keep the coincidences

假设两个 List 都有 String 个元素,你可以使用这个:

    List<List<String>> beforeList = new ArrayList<>();
    List<List<String>> afterList = new ArrayList<>();
    List<String> commonSubsetList = new ArrayList<>();
    for (int i = 0; i < list1.size(); i++) {
        int k = i;
        List<String> tmpList = new ArrayList<>();
        List<String> tmpBeforeList1 = list1.subList(0, i); // container for before elements from list1
        List<String> tmpAfterList1 = new ArrayList<>(); // container for after elements from list1
        List<String> tmpBeforeList2 = new ArrayList<>(); // container for before elements from list2
        List<String> tmpAfterList2 = new ArrayList<>(); // container for after elements from list2

        for (int j = 0; j < list2.size();) {
            if(k < list1.size() && list1.get(k).equals(list2.get(j))) {
                // when common element is found, increment both counters and add element to tmp list
                tmpList.add(list2.get(j));
                k++;
                j++;
            } else {

                if(tmpList.size() > 0) {
                    tmpAfterList1 = list1.subList(k, list1.size());
                    tmpAfterList2 = list2.subList(j, list2.size());
                    break;
                } else {
                    tmpBeforeList2.add(list2.get(j));
                }

                j++;
            }
        }

        if(commonSubsetList.size() <= tmpList.size()) {
            // reset beforeList and afterList before adding new list
            beforeList.clear();
            afterList.clear();

            // add new lists
            beforeList.add(tmpBeforeList1);
            beforeList.add(tmpBeforeList2);
            afterList.add(tmpAfterList1);
            afterList.add(tmpAfterList2);
            commonSubsetList = new ArrayList<>(tmpList);
        }
    }

    System.out.println(beforeList + ", " + commonSubsetList + ", " + afterList);

这也包括前后列表。希望这就是你想要的。

很简单。你只需要两个循环就可以找出两个列表之间的最大公共子集。

步骤

  • 遍历第一个列表
  • 在第一个循环内循环第二个列表
  • 将第二个列表的每个值与第一个列表的增量索引 k 进行比较
  • 当有匹配时增加索引k
  • 否则将索引 k 重置为第一个列表
  • 的起始索引 i

下面示例程序的复杂度为 O(n^2)。您可以进一步降低复杂性。

示例代码:

List<Character> list1 = Arrays.asList(new Character[] { 'F', 'A', 'B', 'C', 'D', 'Z', 'A', 'F', 'C' });
List<Character> list2 = Arrays.asList(new Character[] { 'A', 'B', 'C', 'F', 'D', 'Z' });
List<List<Character>> sublists = new ArrayList<>();

for (int i = 0; i < list1.size(); i++)
{
    int k = i;
    for (int j = 0; j < list2.size() && k < list1.size(); j++)
    {
        if (list1.get(k) == list2.get(j))
        {
            k++;
        }
        else if (k > i)
        {
            sublists.add(list1.subList(i, k));
            k = i;
        }
    }

    if (k > i)
    {
        sublists.add(list1.subList(i, k));
    }
}

System.out.println(sublists);

看到这个:

public static void main(String[] args) {
    ArrayList<String> list1 = new ArrayList<String>(Arrays.asList(new String[]{"F", "A", "B", "C", "D", "Z", "A", "F", "C"}));
    ArrayList<String> list2 = new ArrayList<String>(Arrays.asList(new String[]{"A", "B", "C", "F", "D", "Z"}));

    ArrayList<String> result = null;
    if (Arrays.equals(list1.toArray(), list2.toArray())) {
        result = list1;
    } else {
        for (int i = 0; i < list1.size(); i++) {
            String word = list1.get(i);
            //int index = list2.indexOf(word); // if list2 has repeat words, this can not give a exact result.
            for (int index : indicesOf(list2, word)) { // support repeat words in list2, but need a small loop.
                if (index >= 0) {
                    int ori = i;
                    ArrayList<String> temp = new ArrayList<String>();
                    temp.add(word);
                    //while (true) {
                    //    int pos1 = (i + 1) % list1.size();
                    //    int pos2 = (index + 1) % list2.size();
                    //    if (list1.get(pos1).equals(list2.get(pos2))) {
                    while (index < list2.size() - 1) {
                        if (i + 1 < list1.size() && list1.get(i + 1).equals(list2.get(index + 1))) {
                            temp.add(list1.get(i + 1));
                            i++;
                            index++;
                        } else {
                            break;
                        }
                    }
                    System.out.println(String.format("Found a subset: %s", temp));
                    if (null == result || temp.size() > result.size()) {
                        result = temp;
                    }
                }
            }
        }
    }
    if (null != result) {
        System.out.println("The greatest subset is: " + result);
    } else {
        System.out.println("No subset found.");
    }
}

static Integer[] indicesOf(ArrayList<String> list, String obj) {
    List<Integer> indices = new ArrayList<Integer>();
    for (int i = 0; i < list.size(); i++) {
        if (obj.equals(list.get(i))) {
            indices.add(i);
        }
    }
    return indices.toArray(new Integer[]{});
}

输出为:

Found a subset: [F]
Found a subset: [A, B, C]
Found a subset: [D, Z]
Found a subset: [A]
Found a subset: [F]
Found a subset: [C]
The greatest subset is: [A, B, C]

----------------编辑---------------------

你说不想要 [D,Z,A],因为我将列表视为尾头循环。没有这个会更容易,我已经更改了代码。

并且,考虑到您的列表允许重复单词,我修正了我的代码。

这是一个很好的解决方案,复杂度为 O(n)(如果我错了请纠正我)利用 HashMap(为了可读性和简单性,我使用 String,相同的逻辑可以应用于List):

public static String greatestSubset(String list1, String list2) {
    int shift = -1, maxCount = -1, index1 = -1, index2 = -1;
    HashMap<Integer, Integer> shiftMap = new HashMap<Integer, Integer>();
    HashMap<Integer, Boolean> aliveShiftMap = new HashMap<Integer, Boolean>();

    for(int i = 0 ; i < list1.length() ; i++) {
        char c = list1.charAt(i);
        int index;

        //calculate shifts, if exists increments, otherwise add with count=1 
        for( shift = i-(index=list2.indexOf(c)) ; index != -1 ; shift = i-(index=list2.indexOf(c, index+1)) ) {
            if(shiftMap.containsKey(shift)) {
                shiftMap.replace(shift, shiftMap.get(shift)+1);
                aliveShiftMap.replace(shift, true);
            } else {
                shiftMap.put(shift, 1);
                aliveShiftMap.put(shift, true);
            }
        }

        for (Entry<Integer, Boolean> entry : aliveShiftMap.entrySet()) {
            if(!entry.getValue()) { //if shift not incremented, terminate
                if(shiftMap.get(entry.getKey()) > maxCount) {
                    maxCount = shiftMap.get(entry.getKey());
                    index1 = i-maxCount;
                    index2 = i;
                }

                shiftMap.remove(entry.getKey());
                aliveShiftMap.put(entry.getKey(), true);
            } else { // else keep for next iteration
                aliveShiftMap.put(entry.getKey(), false);
            }
        }

        //remove all non-incrementedn shifts
        aliveShiftMap.values().removeAll(Collections.singleton(true));
    }

    return list1.substring(index1, index2);
}

请注意,HashMap 复杂化只需要考虑同一列表中的重复对象,否则您只需要几个原始 int 变量。

算法总结如下:

  1. 递增 list1 的字符,并计算匹配 list2 上的相同 char 所需的 shift
  2. 如果 shift 已经存在于 shiftMap 中,递增,否则添加 count 1
  3. 如果给定的shift在当前迭代中没有递增,则终止它,并将其保留为maxCount(记录index1index2)如果超过当前最大值

您必须考虑列表中所有可能的项目对。当一对匹配时,然后尝试从这些索引开始构建一个子集。如果该子集大于它,则该子集将替换当前候选者。

一个优化是当子集大于较小列表长度的一半时退出。

您可以修改以下示例以收集所有子集及其索引信息。

示例:

http://ideone.com/DehDwk

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {
    /**
     * Holds information about a sub set
     *
     * @param <T> type of subset items
     */
    private static class SubSet<T> {
        List<T> items; // items of subset
        int startIndex1; // start index in list 1
        int endIndex1; // end index in list 1
        int startIndex2; // start index in list 2
        int endIndex2; // end index in list 2
    }

    /**
     * Run main example.
     *
     * @param args arguments - none honored.
     * @throws java.lang.Exception - in case of any error.
     */
    public static void main(String[] args) throws java.lang.Exception {
        // define 2 lists
        List<Integer> list1 = Arrays.asList(1, 2, 3, 4, 5, 6, 3, 2, 5, 6, 7, 3, 8);
        List<Integer> list2 = Arrays.asList(2, 8, 7, 2, 3, 4, 5, 3, 2, 5, 1, 5);

        // print the lists
        System.out.println("First list: " + Arrays.toString(list1.toArray()));
        System.out.println("Second list: " + Arrays.toString(list2.toArray()));

        // get largest sub set
        SubSet<Integer> largest = getLargestSubSet(list1, list2);


        if (largest == null) {
            // nothing found
            System.out.println("No subset found.");
        } else {
            // print info about subset

            System.out.println("Largest subset: " + Arrays.toString(largest.items.toArray()));

            if (largest.startIndex1 > 0) {
                List<Integer> beforeList1 = list1.subList(0, largest.startIndex1);
                System.out.println("Items before largest subset in first list: "
                        + Arrays.toString(beforeList1.toArray()));
            }

            if (largest.endIndex1 < list1.size() - 1) {
                List<Integer> afterList1 = list1.subList(largest.endIndex1 + 1, list1.size());
                System.out.println("Items after largest subset in first list: "
                        + Arrays.toString(afterList1.toArray()));
            }

            if (largest.startIndex2 > 0) {
                List<Integer> beforeList2 = list2.subList(0, largest.startIndex2);
                System.out.println("Items before largest subset in second list: "
                        + Arrays.toString(beforeList2.toArray()));
            }

            if (largest.endIndex2 < list2.size() - 1) {
                List<Integer> afterList2 = list2.subList(largest.endIndex2 + 1, list2.size());
                System.out.println("Items after largest subset in second list: "
                        + Arrays.toString(afterList2.toArray()));
            }

        }


    }

    /**
     * Equality check for items.
     *
     * @param obj1 first item.
     * @param obj2 second item.
     * @param <T>  item type.
     * @return true if equal,false if not.
     */
    private static <T> boolean areEqual(T obj1, T obj2) {
        return obj1 == obj2; // naive comparison
    }

    /**
     * Get largest subset (first occurrence) for given lists.
     *
     * @param list1 first list.
     * @param list2 second list.
     * @param <T>   list item type.
     * @return Largest sub sequence list, or empty list.
     */
    private static <T> SubSet<T> getLargestSubSet(List<T> list1, List<T> list2) {
        SubSet<T> output = null;

        for (int i = 0; i < list1.size(); i++) {
            for (int j = 0; j < list2.size(); j++) {

                // optimisation : exit early
                if (output != null && output.items.size() > Math.min(list1.size(), list2.size())) {
                    return output;
                }

                if (areEqual(list1.get(i), list2.get(j))) {
                    // inspect sub sequence from this (i,j) onwards
                    output = inspectSubSet(list1, list2, i, j, output);
                }
            }
        }

        return output;
    }

    /**
     * For given starting indices, inspect if there is a larger subset, than given one.
     *
     * @param list1     first list.
     * @param list2     second list.
     * @param index1    first index.
     * @param index2    second index.
     * @param oldSubSet existing largest subset, for comparison.
     * @param <T>       list item type.
     * @return larger subset, if found, else existing one is returned as is.
     */
    private static <T> SubSet<T> inspectSubSet(List<T> list1, List<T> list2,
                                               int index1, int index2, SubSet<T> oldSubSet) {
        // new subset candidate
        SubSet<T> newSubSet = new SubSet<T>();
        newSubSet.items = new ArrayList<T>();
        newSubSet.startIndex1 = index1;
        newSubSet.endIndex1 = index1;
        newSubSet.startIndex2 = index2;
        newSubSet.endIndex2 = index2;

        // keep building subset as subsequent items keep matching
        do {
            newSubSet.items.add(list1.get(index1));
            newSubSet.endIndex1 = index1;
            newSubSet.endIndex2 = index2;
            index1++;
            index2++;
        } while (index1 < list1.size() && index2 < list2.size()
                && areEqual(list1.get(index1), list2.get(index2)));

        // return first, larger or same.
        if (oldSubSet == null) {
            return newSubSet;
        } else if (newSubSet.items.size() > oldSubSet.items.size()) {
            return newSubSet;
        } else {
            return oldSubSet;
        }
    }

}

输出:

First list: [1, 2, 3, 4, 5, 6, 3, 2, 5, 6, 7, 3, 8]
Second list: [2, 8, 7, 2, 3, 4, 5, 3, 2, 5, 1, 5]
Largest subset: [2, 3, 4, 5]
Items before largest subset in first list: [1]
Items after largest subset in first list: [6, 3, 2, 5, 6, 7, 3, 8]
Items before largest subset in second list: [2, 8, 7]
Items after largest subset in second list: [3, 2, 5, 1, 5]