Remove unsorted/outlier elements in nearly-sorted array

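Given an array such as [15, 14, 12, 3, 10, 4, 2, 1], how can I determine which elements are out of order and remove them (the number 3 in this case)? I don't want to sort the list; I want to detect the outliers and remove them.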

Another example:

[13, 12, 4, 9, 8, 6, 7, 3, 2]

I would like to be able to remove the 4 and the 7, so that I end up with:

[13, 12, 9, 8, 6, 3, 2]

A problem also arises in a case like this:

[15, 13, 12, 7, 10, 5, 4, 3]

You could remove either 7 or 10 to make this array sorted.

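In general, the problem I'm trying to solve is this: given a list of numerical readings (some of which may be off by a lot), I want the array to contain only the values that follow the general trend line, with any outliers removed. I'm just wondering whether there is a simple way to do this.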

A simple algorithm, described by higuaro, can help you generate a correct sequence:

For each element at index i, if a[i] < a[i + 1], we can simply remove the element a[i].

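for (int i = 0; i + 1 < size; i++)
    // a[i] breaks the descending order: remove it, then step back to re-check
    while (i >= 0 && i + 1 < size && a[i] < a[i + 1]) {
        remove a[i];   // later elements shift left and size shrinks by one
        i--;
    }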

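However, this approach does not guarantee that the number of removed elements is minimal. For example, for the sequence [10, 9, 8, 100, 1, 0], removing 100 would be optimal, rather than removing 8, then 9, then 10.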
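To make the counter-example concrete, here is a minimal Python sketch of the greedy pass described above. The function name `remove_ascents` is made up for illustration, and the sketch assumes the data is expected to be in descending order.

```python
def remove_ascents(values):
    """Greedy pass: drop a[i] whenever a[i] < a[i + 1], then step back and re-check."""
    a = list(values)  # work on a copy so the caller's list is untouched
    i = 0
    while i + 1 < len(a):
        if a[i] < a[i + 1]:
            del a[i]              # a[i] is smaller than its successor: remove it
            i = max(i - 1, 0)     # the previous element has a new neighbour to compare with
        else:
            i += 1
    return a


print(remove_ascents([15, 14, 12, 3, 10, 4, 2, 1]))  # [15, 14, 12, 10, 4, 2, 1]
print(remove_ascents([10, 9, 8, 100, 1, 0]))          # [100, 1, 0]
```

On the second input the greedy pass throws away three good readings (8, 9, 10) and keeps the outlier 100, which is exactly the weakness pointed out above.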

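To find the minimum number of elements to remove, note that we need to find the longest decreasing subsequence; every element outside it is removed. This is analogous to the classic longest increasing subsequence problem, whose solutions are well documented.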
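One possible way to compute this is the standard O(n log n) "patience" technique for the longest increasing subsequence, applied to the negated values. The sketch below is an illustration, not the answer's own code: the function name is hypothetical, and it assumes a strictly decreasing result is wanted (switching `bisect_left` to `bisect_right` would allow equal readings to stay).

```python
from bisect import bisect_left


def longest_decreasing_subsequence(values):
    """Return one longest strictly decreasing subsequence of values."""
    neg = [-v for v in values]   # decreasing in values == increasing in neg
    tails = []                   # tails[k]: smallest last element (in neg) of a run of length k + 1
    tails_idx = []               # index in values of the element stored in tails[k]
    prev = [-1] * len(values)    # predecessor links used to rebuild the answer
    for i, x in enumerate(neg):
        k = bisect_left(tails, x)
        if k > 0:
            prev[i] = tails_idx[k - 1]
        if k == len(tails):
            tails.append(x)
            tails_idx.append(i)
        else:
            tails[k] = x
            tails_idx[k] = i
    # Walk the predecessor links back from the end of the longest run.
    result = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        result.append(values[i])
        i = prev[i]
    result.reverse()
    return result


print(longest_decreasing_subsequence([10, 9, 8, 100, 1, 0]))         # [10, 9, 8, 1, 0]
print(longest_decreasing_subsequence([15, 13, 12, 7, 10, 5, 4, 3]))  # [15, 13, 12, 10, 5, 4, 3]
```

When several answers tie, as with 7 versus 10 in the second input, the sketch returns just one of them; the number of removed elements is `len(values) - len(result)` either way.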

I would reduce your problem to the longest increasing (decreasing) subsequence problem.

https://en.wikipedia.org/wiki/Longest_increasing_subsequence

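Since your sequence is already nearly sorted, you are bound to get a satisfactory result (i.e. one that neatly follows the trend line).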

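There are a number of solutions; one of them is described in the free book "Fundamentals of Computer Programming with C#" by Svetlin Nakov and Veselin Kolev. The problem is posed as exercise 6 on page 257, and the solution is on page 260.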

Excerpt from the book:

Write a program, which finds the maximal sequence of increasing elements in an array arr[n]. It is not necessary the elements to be consecutively placed. E.g.: {9, 6, 2, 7, 4, 7, 6, 5, 8, 4} -> {2, 4, 6, 8}.

Solution:

We can solve the problem with two nested loops and one more array len[0…n-1]. In the array len[i] we can keep the length of the longest consecutively increasing sequence, which starts somewhere in the array (it does not matter where exactly) and ends with the element arr[i]. Therefore len[0]=1, len[x] is the maximal sum max(1 + len[prev]), where prev < x and arr[prev] < arr[x]. Following the definition, we can calculate len[0…n-1] with two nested loops: the outer loop will iterate through the array from left to right with the loop variable x. The inner loop will iterate through the array from the start to position x-1 and searches for the element prev with maximal value of len[prev], where arr[prev] < arr[x]. After the search, we initialize len[x] with 1 + the biggest found value of len[prev] or with 1, if such a value is not found.

The described algorithm finds the lengths of all maximal ascending sequences, which end at each of the elements. The biggest one of these values is the length of the longest increasing sequence. If we need to find the elements themselves, which compose that longest sequence, we can start from the element, where the sequence ends (at index x), we can print it and we can search for a previous element (prev). By definition prev < x and len[x] = 1 + len[prev] so we can find prev with a for-loop from 1 to x-1. After that we can repeat the same for x=prev. By finding and printing the previous element (prev) many times until it exists, we can find the elements, which compose the longest sequence in reversed order (from the last to the first).
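The two-loop approach from the excerpt can be sketched roughly as follows. This is a Python illustration rather than the book's C# solution: the names `length`, `prev`, and `keep_descending_trend` are assumptions, and the predecessor is stored while filling the table (with the inner scan starting at index 0) instead of being searched for afterwards.

```python
def longest_increasing_subsequence(arr):
    """O(n^2) dynamic programming: length[x] is the longest increasing run ending at arr[x]."""
    n = len(arr)
    if n == 0:
        return []
    length = [1] * n   # every element alone is a run of length 1
    prev = [-1] * n    # prev[x]: index of the element before arr[x] in its best run
    for x in range(n):
        for p in range(x):
            if arr[p] < arr[x] and length[p] + 1 > length[x]:
                length[x] = length[p] + 1
                prev[x] = p
    # Start from the end of a longest run and follow prev links backwards.
    x = max(range(n), key=lambda i: length[i])
    result = []
    while x != -1:
        result.append(arr[x])
        x = prev[x]
    result.reverse()
    return result


def keep_descending_trend(readings):
    """Hypothetical helper: negate the readings so the descending trend becomes increasing."""
    kept = longest_increasing_subsequence([-v for v in readings])
    return [-v for v in kept]


print(longest_increasing_subsequence([9, 6, 2, 7, 4, 7, 6, 5, 8, 4]))  # [2, 4, 7, 8]
print(keep_descending_trend([15, 14, 12, 3, 10, 4, 2, 1]))             # [15, 14, 12, 10, 4, 2, 1]
```

For the book's sample input the sketch returns [2, 4, 7, 8], which has the same length as the {2, 4, 6, 8} given in the exercise; several longest sequences exist and which one is reported depends on tie-breaking.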