如何对具有 n 个元素的数组进行排序，其中 k 个元素在 O(n + k log k) 中不合适？

Question

我今天在采访中被问到这个问题，我开始相信它是无法解决的。

给定一个大小为 n、select k 元素的排序数组，并将它们重新洗牌回数组，从而产生一个新的“nk-sorted”数组.

找到在新数组中移动的 k 个（或更少）元素。

这是创建此类数组的 (Python) 代码，但我不关心它的语言。

import numpy as np


def __generate_unsorted_array(size, is_integer=False, max_int_value=100000):
    return np.random.randint(max_int_value, size=size) if is_integer else np.random.rand(size)


def generate_nk_unsorted_array(n, k, is_integer=False, max_int_value=100000):
    assert k <= n
    unsorted_n_array = __generate_unsorted_array(n - k, is_integer, max_int_value=max_int_value)
    sorted_n_array = sorted(unsorted_n_array)
    random_k_array = __generate_unsorted_array(k, is_integer, max_int_value=max_int_value)
    insertion_inds = np.random.choice(n - k + 1, k, replace=True)  # can put two unsorted next to each other.
    nk_unsorted_array = np.insert(sorted_n_array, insertion_inds, random_k_array)
    return list(nk_unsorted_array)

这在复杂性约束下可行吗？

这只是问题的一部分。对 O(n+klogk)

中的“nk 排序数组”进行排序所需的整个问题

Answer 1

注意：这是一个概念性的解决方案。它在 Python 中编码，但由于 Python 实现 List 的方式，实际上并没有运行所需的复杂性。请参阅以查看复杂性要求中 Python 中的实际解决方案。

接受@soyuzzzz 对此的回答。

原始答案（有效，但复杂性仅在假设 Python 的列表实现链表时才正确，但事实并非如此）：

这对 O(n + klogk) 中的 nk 未排序数组进行排序，假设数组应该是升序的。

通过遍历数组找到未排序的元素。
如果找到这样的元素（它比下一个元素大），则它或下一个元素出现故障（或两者都有）。
将它们都放在一边，并将它们从数组中移除
继续遍历新得到的数组（删除后），形成找到元素之前的索引。
这将在 O(n) 时间内放置 2k 个元素。
对 2k 个元素进行排序 O(klogk)
合并两个总共有 n 个元素的排序列表，O(n)
总计 O(n + klogk)

代码：

def merge_sorted_lists(la, lb):
    if la is None or la == []:
        return lb
    if lb is None or lb == []:
        return la
    a_ind = b_ind = 0
    a_len = len(la)
    b_len = len(lb)

    merged = []
    while a_ind < a_len and b_ind < b_len:
        a_value = la[a_ind]
        b_value = lb[b_ind]

        if a_value < b_value:
            merged.append(la[a_ind])
            a_ind += 1
        else:
            merged.append(lb[b_ind])
            b_ind += 1

    # get the leftovers into merged
    while a_ind < a_len:
        merged.append(la[a_ind])
        a_ind += 1
    while b_ind < b_len:
        merged.append(lb[b_ind])
        b_ind += 1

    return merged

和

def sort_nk_unsorted_list(nk_unsorted_list):
    working_copy = nk_unsorted_list.copy()  # just for ease of testing

    requires_resorting = []

    current_list_length = len(working_copy)
    i = 0
    while i < current_list_length - 1 and 1 < current_list_length:
        if i == -1:
            i = 0

        first = working_copy[i]
        second = working_copy[i + 1]

        if second < first:
            requires_resorting.append(first)
            requires_resorting.append(second)

            del working_copy[i + 1]
            del working_copy[i]
            i -= 2
            current_list_length -= 2
        i += 1

    sorted_2k_elements = sorted(requires_resorting)
    sorted_nk_list = merge_sorted_lists(sorted_2k_elements, working_copy)
    return sorted_nk_list

Answer 2

即使是正确的，它实际上并没有给我们 O(n + k * log k)。问题出在 sort_nk_unsorted_list 函数中。不幸的是，从 Python 列表中删除任意项目不是常数时间。 It's actually O(n)。这使得整个算法的复杂度为 O(n + nk + k * log k)

我们可以做的是使用不同的数据结构来解决这个问题。如果您使用双向链表，从该列表中删除一个项目实际上是 O(1)。不幸的是，Python 默认情况下没有。

这是我实现 O(n + k * log k).

的解决方案

解决问题的入口函数：

def sort(my_list):
    in_order, out_of_order = separate_in_order_from_out_of_order(my_list)

    out_of_order.sort()

    return merge(in_order, out_of_order)

分离有序元素和乱序元素的函数：

def separate_in_order_from_out_of_order(my_list):
    list_dll = DoublyLinkedList.from_list(my_list)
    out_of_order = []
    
    current = list_dll.head

    while current.next is not None:
        if current.value > current.next.value:
            out_of_order.append(current.value)
            out_of_order.append(current.next.value)

            previous = current.prev

            current.next.remove()
            current.remove()

            current = previous
        else:
            current = current.next

    in_order = list_dll.to_list()

    return in_order, out_of_order

合并两个分离列表的函数：

def merge(first, second):
    """
    Merges two [sorted] lists into a sorted list.

    Runtime complexity: O(n)
    Space complexity: O(n)
    """
    i, j = 0, 0

    result = []
    
    while i < len(first) and j < len(second):
        if first[i] < second[j]:
            result.append(first[i])
            i += 1            
        else:
            result.append(second[j])
            j += 1            

    result.extend(first[i:len(first)])
    result.extend(second[j:len(second)])

    return result

最后，这是 DoublyLinkedList 实现（我使用哨兵节点使事情变得更容易）：

class DoublyLinkedNode:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.prev = None

    def remove(self):
        if self.prev:
            self.prev.next = self.next

        if self.next:
            self.next.prev = self.prev
        

class DoublyLinkedList:
    def __init__(self, head):
        self.head = head

    @staticmethod
    def from_list(lst):
        sentinel = DoublyLinkedNode(-math.inf)
        previous = sentinel

        for item in lst:
            node = DoublyLinkedNode(item)
            node.prev = previous
            previous.next = node

            previous = node
        
        return DoublyLinkedList(sentinel)

    def to_list(self):
        result = []
        current = self.head.next

        while current is not None:
            result.append(current.value)
            current = current.next

        return result

这些是我用来验证代码的单元测试：

import unittest


class TestSort(unittest.TestCase):
    def test_sort(self):
        test_cases = [
            # ( input, expected result)
            ([1, 2, 3, 4, 10, 5, 6], [1, 2, 3, 4, 5, 6, 10]),
            
            ([1, 2, 5, 4, 10, 6, 0], [0, 1, 2, 4, 5, 6, 10]),

            ([1], [1]),

            ([1, 3, 2], [1, 2, 3]),
            ([], [])
        ]

        for (test_input, expected) in test_cases:
            result = sort(test_input)
        
            self.assertEqual(expected, result)

如何对具有 n 个元素的数组进行排序，其中 k 个元素在 O(n + k log k) 中不合适？

How to sort an array with n elements in which k elements are out of place in O(n + k log k)?

python

arrays

algorithm

sorting

shuffle