Sorted insert into a fixed-size array with duplicates

Question

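I'm trying to find the most efficient way in C to store the N largest values from an incoming data stream. For example, assume each incoming datum is 32 bytes and arrives as a continuous stream from a sensor; I need to keep the N largest values seen in the stream (duplicates allowed). The simple approach is to iterate to find the insertion position, then shift every element below it down by one (discarding the current minimum). Is there a better way?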

Source

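#include <stdio.h>    /* FILE */
#include <stdint.h>
#include <stdbool.h>

#define MAX_KEEP    32

/* Platform typedefs assumed by this code (their definitions are not shown) */
typedef uint32_t u32;
typedef uint8_t  u8;
typedef bool     Bool;

typedef struct accel_sys
{
    FILE *infile;

    /* Data for largest and last */
    u32 largest[MAX_KEEP]; /* largest in highest index, smallest in lowest index */
    u32 last[MAX_KEEP];    /* circular buffer */
    u8 last_start;         /* points to the oldest value */

    /* Data for reading and processing the file */
    u8 last_byte;
    Bool even;
    int num_read;

} accel_t;

typedef accel_t * accel_h;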
static void store_max(accel_h accel, u32 cur_value)
{
    int i = MAX_KEEP - 1;
    int j;

    /* Scan down from the largest value for the highest slot
     * holding a value smaller than cur_value */
    while (i >= 0)
    {
        if (cur_value > accel->largest[i])
        {
            /* found it */
            break;
        }
        i--;
    }

    /* i < 0 if the value doesn't belong in the array; do nothing in that case */
    if (i >= 0)
    {
        /* Move everything lower than cur_value down one slot, losing the
         * smallest value (largest[0]), then store the new value in the found spot */
        for (j = 0; j < i; j++)
        {
            accel->largest[j] = accel->largest[j + 1];
        }
        accel->largest[i] = cur_value;
    }
}

Answer 1

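The first optimization is to replace the explicit loop that shifts the array with memmove. Of course it's linear time either way, but on most platforms memmove is linear with a much smaller constant factor.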
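A minimal sketch of that first optimization, assuming a plain `uint32_t` array instead of the question's `accel_t` struct (the function name `store_max_memmove` is hypothetical). The linear search is unchanged; only the element-by-element shift becomes one memmove. memmove rather than memcpy is required because the source and destination ranges overlap, and its length argument is in bytes, not elements.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size for illustration; the question uses MAX_KEEP = 32. */
#define MAX_KEEP 32

/* Keep the MAX_KEEP largest values seen so far in `largest`, sorted
 * ascending (smallest at index 0, largest at index MAX_KEEP - 1). */
static void store_max_memmove(uint32_t largest[MAX_KEEP], uint32_t cur_value)
{
    int i = MAX_KEEP - 1;

    /* Same linear scan as the question: highest slot holding a smaller value. */
    while (i >= 0 && largest[i] >= cur_value)
        i--;

    if (i < 0)
        return; /* not larger than the current minimum: discard */

    /* Slide largest[1..i] down to largest[0..i-1], dropping largest[0].
     * Overlapping ranges, so memmove; length is in bytes, hence sizeof. */
    memmove(&largest[0], &largest[1], (size_t)i * sizeof largest[0]);
    largest[i] = cur_value;
}
```

Duplicates are kept: a value equal to stored ones is inserted just below them, and a value equal to the current minimum is simply dropped, which leaves the same multiset of N largest values.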

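Next, how big is N? Since you're clearly already keeping the values in sorted order, why not do a bisection (binary) search instead of a linear search? That brings the search step down from O(N) to O(log N).*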

So (untested, so beware of off-by-one errors):

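#include <string.h>  /* memmove */

static void store_max(accel_h accel, u32 cur_value) {
    /* Lower-bound bisection: find how many stored values are < cur_value */
    size_t first = 0, last = MAX_KEEP;
    while (first < last) {
        size_t middle = first + (last - first) / 2;
        if (accel->largest[middle] < cur_value)
            first = middle + 1;
        else
            last = middle;
    }
    /* first == 0 means cur_value is not above the current minimum: drop it */
    if (first > 0) {
        size_t pos = first - 1;  /* slot for cur_value */
        /* Slide largest[1..pos] down one slot, discarding largest[0];
         * memmove's length is in bytes */
        memmove(accel->largest, accel->largest + 1, pos * sizeof accel->largest[0]);
        accel->largest[pos] = cur_value;
    }
}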

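If you want to improve the worst-case time, you need a heap (a min-heap, so the smallest kept value sits at the root), because you can do a combined push-and-pop in logarithmic time.** You can store the heap in a plain old array of N values, just like your sorted array; reading the values back out is a linear scan, although getting them in sorted order costs an extra O(N log N) sort. But this adds some complexity, and I don't want to try writing the code on my phone. :)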
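A sketch of the heap approach, assuming 32-bit values and a hypothetical container `topn_t` (all names here are illustrative, not from the question). Once the heap is full, a new value only matters if it beats the root; in that case it replaces the root and sifts down, which is the O(log N) push-pop. Sorted output is produced on demand with qsort.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HEAP_KEEP 32 /* hypothetical capacity for illustration */

/* Binary min-heap in a plain array: heap[0] is the smallest kept value. */
typedef struct {
    uint32_t heap[HEAP_KEEP];
    size_t count; /* valid entries, <= HEAP_KEEP */
} topn_t;

static void swap_u32(uint32_t *a, uint32_t *b) { uint32_t t = *a; *a = *b; *b = t; }

/* Restore the heap property downward from index i: O(log N) swaps. */
static void sift_down(uint32_t *h, size_t n, size_t i)
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < n && h[l] < h[m]) m = l;
        if (r < n && h[r] < h[m]) m = r;
        if (m == i) return;
        swap_u32(&h[i], &h[m]);
        i = m;
    }
}

/* Restore the heap property upward from index i (used while filling). */
static void sift_up(uint32_t *h, size_t i)
{
    while (i > 0) {
        size_t p = (i - 1) / 2;
        if (h[p] <= h[i]) return;
        swap_u32(&h[p], &h[i]);
        i = p;
    }
}

/* Offer one value from the stream: O(log N) worst case. */
static void topn_push(topn_t *t, uint32_t v)
{
    if (t->count < HEAP_KEEP) {        /* still filling: ordinary push */
        t->heap[t->count] = v;
        sift_up(t->heap, t->count);
        t->count++;
    } else if (v > t->heap[0]) {       /* beats the current minimum */
        t->heap[0] = v;                /* push-pop: replace the root */
        sift_down(t->heap, HEAP_KEEP, 0);
    }                                  /* otherwise discard v */
}

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Copy the kept values to `out` in ascending order; returns how many. */
static size_t topn_sorted(const topn_t *t, uint32_t *out)
{
    for (size_t i = 0; i < t->count; i++)
        out[i] = t->heap[i];
    qsort(out, t->count, sizeof out[0], cmp_u32);
    return t->count;
}
```

The per-sample cost is now bounded by about log2(N) swaps instead of up to N-1 element moves, at the price of the heap bookkeeping and a sort whenever ordered output is needed.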

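* Your worst case is still O(N): imagine a pathological case where the values keep increasing. But even in that case, a very fast O(N) plus a slow O(log N) may be worth it compared with a very fast O(N) plus a slow O(N).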

** Although in practice, for the values of N you probably care about, O(log N) swaps may well be slower than a single memmove…
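To make the pathological case in the first footnote concrete, here is a sketch of the bisection insert instrumented to return how many elements memmove had to slide (the name `store_max_count` and the plain-array interface are hypothetical). With a strictly increasing stream, every new value beats everything stored, so each insert slides N-1 elements even though the search took only about log2(N) comparisons.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEEP 32 /* hypothetical, matching the question */

/* Lower-bound bisection insert into an ascending array; returns the
 * number of elements moved by memmove (0 when the value is discarded). */
static size_t store_max_count(uint32_t largest[MAX_KEEP], uint32_t cur_value)
{
    size_t first = 0, last = MAX_KEEP;
    while (first < last) {                 /* O(log N) comparisons */
        size_t mid = first + (last - first) / 2;
        if (largest[mid] < cur_value)
            first = mid + 1;
        else
            last = mid;
    }
    if (first == 0)
        return 0;                          /* <= current minimum: discard */
    size_t pos = first - 1;                /* slot for cur_value */
    memmove(&largest[0], &largest[1], pos * sizeof largest[0]); /* O(N) */
    largest[pos] = cur_value;
    return pos;
}
```

Feeding it 1, 2, 3, … into a zero-initialized array returns MAX_KEEP - 1 on every call, while a value below the stored minimum returns 0 without touching the array.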


Tags: c, sorting, embedded, algorithm, performance