从 Objective-C 中的数组中删除重复项的最快方法是什么

Question

正在准备面试。我正在尝试通过解决以下问题来练习：给定一个 NSNumbers 输入数组，其中一些数字是重复的，你如何创建另一个数组，它只包含原始数组中的唯一值。

我看到两种方法：

暴力破解：循环遍历数组中的每个元素，同时在一个元素处将其与唯一列表中的数字集进行比较，如果匹配，则不存储它，否则将其添加到唯一列表。 O(n^2) 最坏情况时间？
Hash-table的方法：有一个长度为N的hash-table。has-table的每个元素都是NSSet。使用散列函数将每个数字映射到 0,...N-1。如果存在于"mapped-index"对应的NSSet中，则不添加到"unique array"中。如果不是，则将其添加到 set 和 unique 数组中。

这是 O(N) 复杂度吗？

我查看了两种实现方法 2 的方法 A. NSMutable 大小为 N 的数组在开始时全部初始化为 [NSNull null] 对象。 B. NSMutableDictionary where key = hashed mapping integer

每种方法的代码如下。

我注意到一世。运行对于如下所示的长度为 403 的输入数组（0.055 毫秒对 .12 毫秒），2A（数组方法）的时间是 2B（Mutable字典方法）的一半。
二. 运行 1 的时间是 0.25ms 的 5 倍。如果没有任何重复，这种差异会更严重。

我的问题是：

有没有比2更好的算法？
算法2有更好的实现吗？
为什么字典方法比较慢？我如何使用 Instruments 分析为自己回答这个问题。即我如何知道使用 Instruments 每一步所花费的确切时间？

代码

哈希码函数

#define NUM_BUCKETS 127
#define RANDOMIZER 11
#define NUM_ITER 40000

int hashcode(int value)
{
    int retVal = (value*RANDOMIZER)%NUM_BUCKETS ;
    if(retVal<0)
    {
        retVal+=NUM_BUCKETS ;
    }
    return retVal ;
}

1.蛮力方法

    NSMutableArray *smooshedArr=[[NSMutableArray alloc] init] ;
    double startTime ;

    startTime=CFAbsoluteTimeGetCurrent() ;
    for(int iter=0;iter<=NUM_ITER;iter++)
    {
        [smooshedArr removeAllObjects] ;
        [smooshedArr addObject:ints[0]] ;

        int i,j ;
        for(i=1;i<[ints count];i++)
        {
            for(j=0;j<[smooshedArr count];j++)
            {
                if([ints[i] intValue] == [smooshedArr[j] intValue])
                {
                    break ;
                }
            }
            if(j==[smooshedArr count])
            {
                [smooshedArr addObject:ints[i]] ;
            }
        }
    }
    NSLog(@"Bruteforce took %.3fms to remove duplicates from array of length %lu",(CFAbsoluteTimeGetCurrent()-startTime)*1000/NUM_ITER,(unsigned long)[ints count]) ;
    NSLog(@"Smooshed arary is %@",smooshedArr) ;

2A。基于数组的哈希 table

    NSMutableArray *hashTable = [[NSMutableArray alloc] init] ;

    startTime=CFAbsoluteTimeGetCurrent() ;
    for(int iter=0;iter<=NUM_ITER;iter++)
    {
        [smooshedArr removeAllObjects];
        for (NSInteger i = 0; i < NUM_BUCKETS; ++i)
        {
            [hashTable addObject:[NSNull null]];
        }

        [smooshedArr addObject:ints[0]] ;

        int indexToInsert = hashcode([ints[0] intValue]) ;
        hashTable[indexToInsert]=[[NSMutableSet alloc] init] ;
        [hashTable[indexToInsert] addObject:ints[0]] ;

        int i ;
        for(i=1;i<[ints count];i++)
        {
            //Find hascode of element i
            //If the list at index = hashcode in hashCodeArary is empty, then create a NSMutableSet, set toInsert = True
            //If not empty, check if the element exists in the set. If yes, setToInsert=False. If no, setToInsert=True
            int indexToInsert = hashcode([ints[i] intValue]) ;
            BOOL toInsert=false ;

            if(hashTable[indexToInsert] == [NSNull null])
            {
                hashTable[indexToInsert]=[[NSMutableSet alloc] init] ;
                toInsert=true ;
            }
            else
            {
                if(![hashTable[indexToInsert] containsObject:ints[i]])
                    toInsert=true ;
            }
            if(toInsert)
            {
                [hashTable[indexToInsert] addObject:ints[i]] ;
                [smooshedArr addObject:ints[i]] ;
            }
        }
    }
    NSLog(@"MutableArray (no cheat) took %.3fms to remove duplicates from array of length %lu",(CFAbsoluteTimeGetCurrent()-startTime)*1000/NUM_ITER,(unsigned long)[ints count]) ;

2B。基于字典的哈希 table

    NSMutableDictionary *hashDict = [[NSMutableDictionary alloc] init] ;
    //NSLog(@"Start of hashcode approach %.6f", CFAbsoluteTimeGetCurrent()) ;
    startTime=CFAbsoluteTimeGetCurrent() ;
    for(int iter=0;iter<=NUM_ITER;iter++)
    {
        //if(iter <4) NSLog(@"iter start: %.6f", CFAbsoluteTimeGetCurrent()) ;

        //if(iter <4) NSLog(@"init start: %.6f", CFAbsoluteTimeGetCurrent()) ;
          [smooshedArr removeAllObjects];
          [hashDict removeAllObjects] ;
        //if (iter<4) NSLog(@"init end: %.6f", CFAbsoluteTimeGetCurrent()) ;


        [smooshedArr addObject:ints[0]] ;

        int indexToInsert = hashcode([ints[0] intValue]) ;
        hashDict[@(indexToInsert)]=[[NSMutableSet alloc] init] ;
        [hashDict[@(indexToInsert)] addObject:ints[0]] ;

        int i ;
        for(i=1;i<[ints count];i++)
        {
            //Find hascode of element i
            //If the list at index = hashcode in hashCodeArary is empty, then create a NSMutableSet, set toInsert = True
            //If not empty, check if the element exists in the set. If yes, setToInsert=False. If no, setToInsert=True
            int indexToInsert = hashcode([ints[i] intValue]) ;
            BOOL toInsert=false ;

            if(hashDict[@(indexToInsert)] == nil)
            {
                hashDict[@(indexToInsert)]=[[NSMutableSet alloc] init] ;
                toInsert=true ;
            }
            else
            {
                if(![hashDict[@(indexToInsert)] containsObject:ints[i]])
                    toInsert=true ;
            }
            if(toInsert)
            {
                [hashDict[@(indexToInsert)] addObject:ints[i]] ;
                [smooshedArr addObject:ints[i]] ;
            }
        }
    }
    NSLog(@"Dictionary approach: %.3fms to remove duplicates from array of length %lu",(CFAbsoluteTimeGetCurrent()-startTime)*1000/NUM_ITER,(unsigned long)[ints count]) ;

输入测试开启，430 个元素有一些重复，平均超过 40000 次迭代

   NSArray *ints = @[@(2),@(3),@(4),@(1),@(6),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(45454),@(1234),@(3421),@(123),@(1234),@(11111),@(2727272),@(112),@(3),@(4),@(1),@(612211),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(45454),@(1234),@(3421),@(123),@(1234),@(11111),@(7272),@(1232),@(3),@(4),@(1),@(60),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(45454),@(1234),@(3421),@(123),@(1234),@(11111),@(2727272),@(2),@(3),@(4),@(1),@(6),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(45454),@(1234),@(3421),@(123),@(1234),@(11111),@(2727272),@(2),@(3),@(4),@(1),@(6),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(45454),@(1234),@(3421),@(123),@(1234),@(11111),@(72),@(2),@(3),@(4),@(1),@(6),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(45454),@(1234),@(3421),@(123),@(1234),@(11111),@(13272),@(2),@(3),@(4),@(18),@(6),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(45454),@(1234),@(3421),@(123),@(1234),@(11111),@(972),@(2),@(3),@(4),@(1),@(6),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(45454),@(1234),@(3421),@(123),@(1234),@(11111),@(3272),@(2),@(3),@(4),@(1),@(69),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(45454),@(1234),@(3421),@(123),@(1234),@(11111),@(1272),@(2),@(3),@(4),@(1),@(6),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(45454),@(1234),@(3421),@(123),@(1234),@(11111),@(2272),@(2),@(3),@(4),@(1),@(6),@(91),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(45454),@(1234),@(3421),@(123),@(1234),@(11111),@(7272),@(2),@(3),@(4),@(12),@(6),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(45454),@(1234),@(3421),@(123),@(1234),@(111),@(27272),@(2),@(321),@(4),@(1),@(6),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(4545411),@(12341),@(34210),@(123),@(1234),@(1111),@(727272),@(11187),@(9086),@(876543),@(74532),@(464642),@(65),@(45454),@(1234),@(3421),@(123),@(1234),@(11111),@(13272),@(2),@(3),@(4),@(18),@(6),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(658),@(45454),@(12934),@(38421),@(1243),@(12345),@(1112),@(72),@(52),@(3),@(498),@(1),@(6),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(45454),@(1234),@(650),@(45454),@(1234),@(3421),@(123),@(1234),@(111),@(27272),@(2),@(321),@(4),@(1),@(6),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(65),@(4545411),@(12341),@(34210),@(123),@(1234),@(1111),@(727272),@(11187),@(9086),@(876543),@(74532),@(464642),@(65),@(45454),@(1234),@(3421),@(123),@(1234),@(11111),@(13272),@(2),@(3),@(4),@(18),@(6),@(9),@(2),@(2),@(3),@(21),@(22),@(45454),@(464642),@(658),@(45454),@(12934),@(38421),@(1243),@(19992345),@(119875412),@(72),@(52),@(3),@(498),@(1),@(6),@(9),@(2),@(2),@(3),@(21),@(22),@(450454),@(46908764642),@(6753435),@(45498754),@(100234),@(65)] ;

Answer 1

如果您正在准备面试，我建议您使用已经实现的框架 classes。不要重新实现轮子。尝试从上到下解决问题。不考虑细节（哈希函数），考虑算法结构：

在伪代码中：

for number in input {
   if number appears for the first time {
      add number to output
   }
}

我们唯一的问题是如何实施number appears for the first time。这是唯一对性能有影响的一点。

在Objective-C中我们可以使用NSSet，这是一个class专门为这个问题创建的。

NSArray *input = @[... array of numbers];

NSMutableSet *foundNumbers = [NSMutableSet set];
NSMutableArray *output = [NSMutableArray array];

for (NSNumber *number in input) {
    if (![foundNumbers containsObject:number])) {
       [foundNumbers addObject:number];
       [output addObject:number];
    }
}

NSLog(@"Output: %@", output);

您只需要传递一次输入数组。提高性能的唯一方法是使用与 NSSet 不同的结构，但是 NSSet 已经高度优化，您不太可能找到更好的选择。

如果您想跳出框框思考并且输入的数字被限制在足够小的范围内（例如 0...65000），您可以创建一个包含 65000 个项目的 BOOL 数组，全部初始化为 NO 并将其用作快速设置实现。但是，这会占用大量内存，除非 input 数组很长，否则不会有任何效果。

绝对不要实现自己的哈希 tables，NSDictionary 已经是哈希 table。您在第二个实现中所做的只是对 NSDictionary 的非常模糊的重新实现。只有当您可以将它们保存为一个简单的数组时，桶才会起作用。一旦你向它添加哈希函数，你就会失去性能增益。

另请注意，代码的整体质量对于面试非常重要。不要使用 #define 声明常量。保持良好的编码风格（我强烈建议在运算符周围使用空格）。使用迭代器而不是 for(;;) 尝试为变量命名比 hashDict 更好（根据变量包含的数据命名变量）。

现在有个小秘密，还有一个class NSOrderedSet把NSArray和NSSet合二为一，可以更轻松的解决你的问题：

NSOrderedSet *orderedSet = [NSOrderedSet orderedSetWithArray:ints];
NSLog(@"Output: %@", orderedSet);

Answer 2

实际上甚至没有必要使用 NSOrderedSet——只要使用 NSSet 就可以逃脱：

NSSet *set = [NSSet setWithArray:ints];

如果您需要一个数组作为输出，键值编码可以提供帮助：

NSArray *array =  [ints valueForKeyPath:@"@distinctUnionOfObjects.self"];

Answer 3

如果您不想使用额外的 space(hash)，如果数组中的数字序列无关紧要，但您仍然不想像蛮力一样慢，那么您可以排序数组，然后一次删除重复项。时间复杂度 nlog(n) + n

从 Objective-C 中的数组中删除重复项的最快方法是什么

What's the fastest way to remove duplicates from an array in Objective-C

algorithm

objective-c

memset

nsmutablearray

array-algorithms