为什么我的蛮力(O((n1 + n2)log(n1 + n2)))解决方案比优化解决方案(O(n1 + n2))更快?

why is my brute force(O((n1+n2)log(n1+n2))) solution faster than the optimised solution(O(n1+n2))?

我正在努力将 2 个长度为 n1 和 n2 的排序数组合并到 return 一个长度为 n1+n2 的排序数组。

我的强力解决方案应该具有 O((n1+n2)log(n1+n2)) 的时间复杂度,而我的优化解决方案应该具有 O(n1+n2) 的时间复杂度。有人能解释一下我的蛮力是如何比优化解决方案快近 10 倍的吗?请在下面找到代码:

from time import time
import random

array1 = sorted([random.randint(1, 500) for i in range(100000)])
array2 = sorted([random.randint(1, 1000000) for i in range(100000)])

def brute_force(arr1, arr2):
    '''
    Brute force solution O((n1+n2)log(n1+n2)),  Space complexity O(1)

    '''
    #sorted function has the time complexity on nlog(n)
    return sorted(arr1+arr2)


def optimized_soln(arr1, arr2):
    '''
    More optimized soltuion Time Complexity O(n1+n2),  Space complexity O(n1+n2) 

    '''
    i = 0
    j = 0
    merged_array = []
    s1 = time()
    #if one of the arrays is empty , no need to merge
    if len(arr1) == 0:
        return arr2

    if len(arr2) ==0:
        return arr1

    while i<len(arr1):
        if arr1[i] <= arr2[j]:
            if i < (len(arr1) - 1):
                merged_array.append(arr1[i])
                i = i + 1
            else:
                merged_array.append(arr1[i])
                merged_array = merged_array + arr2[j:]
                i = len(arr1)
        else:
            if j < (len(arr2) - 1):
                merged_array.append(arr2[j])
                j = j + 1
            else:
                merged_array.append(arr2[j])
                merged_array = merged_array + arr1[i:]
                i = len(arr1)

    return merged_array


s1 = time()
print('Brute Force: O((n1+n2)log(n1+n2)),  Space complexity O(1)')
brute_force(array1,array2)
print('Time Taken: ',(time()-s1))
s2 = time()
print('Optimized Soln: Time Complexity O(n1+n2),  Space complexity O(n1+n2)')
optimized_soln(array1,array2)
print('Time Taken: ',(time()-s2))

Python 中的每个操作都有一些与之相关的开销。您的蛮力解决方案具有 O(1) 此类操作,而排序本身是在较低级别的代码中实现的。

优化的解决方案具有 O(n) Python 操作,因此在这种情况下开销要大得多。因此,在后一种情况下具有更好的算法复杂性的好处被开销所抑制。

您的强力解决方案在 Python 内以线性时间运行。 sort 的时间复杂度通常为 O(n log n),但这并不排除它在某些情况下更快。 Python 使用 timsort,它旨在利用输入数据中已排序的运行。

来自https://en.wikipedia.org/wiki/Timsort

Timsort was designed to take advantage of runs of consecutive ordered elements that already exist in most real-world data, natural runs. It iterates over the data collecting elements into runs and simultaneously putting those runs in a stack. Whenever the runs on the top of the stack match a merge criterion, they are merged.

因此,当您追加两个排序列表,然后使用 timsort 对它们进行排序时,它会找到两个范围,合并它们,然后停止。考虑到复杂度都是线性的,加速是简单的事实 sort 是用优化的 C 编写的,它很容易比直接用 python.

编写的代码快 10 倍或更多