排列排序列表中的元素,使所有冗余元素都在列表的 left/right 一侧
arrange the elements in the sorted list so that all the redundant elements are at left/right side of the list
有没有比这个更好的方法来解决这个问题?
def 渲染(arr):
single_elements = []
double_elements = []
for i in xrange(len(arr)-1):
if arr[i] == arr[i+1] or arr[i] == arr[i-1]:
double_elements.append(arr[i])
else:
single_elements.append(arr[i])
if arr[-1] == double_elements[-1]:
double_elements.append(arr[-1])
else:
single_elements.append(arr[-1])
return single_elements+double_elements
arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]
'''output arr = [1,2,4,5,6,9,3,3,3,7,7,7,7,8,8,8]'''
打印渲染(arr)
这个怎么样?让我知道是否需要进一步解释。
arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]
repeatedElements = [i for i in arr if arr.count(i) > 1]
newList = [i for i in arr if i not in repeatedElements] + repeatedElements
print newList
>>> [1, 2, 4, 5, 6, 9, 3, 3, 3, 7, 7, 7, 7, 8, 8, 8]
你不能在一行中很好地完成它,但我认为最好保留它 O(N)
>>> from itertools import groupby
>>> arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]
>>> a, b = [], []
>>> for k, g in groupby(arr):
group = list(g)
(a if len(group)<2 else b).extend(group)
>>> print a + b
[1, 2, 4, 5, 6, 9, 3, 3, 3, 7, 7, 7, 7, 8, 8, 8]
你的方法是最有效的,你可以通过使用枚举而不是重复索引进行一些更改来提高效率,使用 python 2 和 25% 可以减少大约 15% 的运行时间使用 python 3:
single_elements = []
double_elements = []
for i, ele in enumerate(arr[:-1], 1):
if ele == arr[i] or ele == arr[i-2]:
double_elements.append(ele)
else:
single_elements.append(ele)
ele = arr[-1]
if ele == double_elements[-1]:
double_elements.append(ele)
else:
single_elements.append(ele)
single_elements.extend(double_elements)
或者如果你想要更少的行:
sin_ele = []
dbl_ele = []
for i, ele in enumerate(arr[:-1], 1):
dbl_ele.append(ele) if ele == arr[i] or ele == arr[i-2] else sin_ele.append(ele)
ele = arr[-1]
dbl_ele.append(ele) if dbl_ele and ele == dbl_ele[-1] else sin_ele.append(ele)
sin_ele.extend(dbl_ele)
一些时间和覆盖一个元素的数组和一个空数组:
def sort_dups(arr):
if len(arr) < 2:
return arr
sin_ele = []
dbl_ele = []
for i, ele in enumerate(arr[:-1], 1):
dbl_ele.append(ele) if ele == arr[i] or ele == arr[i - 2] else sin_ele.append(ele)
ele = arr[-1]
dbl_ele.append(ele) if dbl_ele and ele == dbl_ele[-1] else sin_ele.append(ele)
sin_ele.extend(dbl_ele)
return sin_ele
In [38]: timeit sort_dups(arr)
100000 loops, best of 3: 4.69 µs per loop
In [39]: timeit f(arr)
100000 loops, best of 3: 8.05 µs per loop
In [40]: %%timeit
repeatedElements = []
[num for (i, num) in enumerate(arr[:-1]) if not
(arr[i] == arr[i+1] or arr[i] == arr[i-1]) or
repeatedElements.append(num)] + repeatedElements ....:
100000 loops, best of 3: 5.38 µs per loop
空列表和单个元素列表:
In [74]: sort_dups([1, 2, 2, 3, 5, 5, 5])
Out[74]: [1, 3, 2, 2, 5, 5, 5]
In [75]: sort_dups([1, 1, 1, 1, 2])
Out[75]: [2, 1, 1, 1, 1]
In [76]: sort_dups([])
Out[76]: []
In [77]: sort_dups([0])
Out[77]: [0]
在稍大的输入上:
In [59]: arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9,12,12,12,14,15,15,15,19,20]
In [60]: timeit f(arr)
100000 loops, best of 3: 14.2 µs per loop
In [61]: timeit sort_dups(arr)
100000 loops, best of 3: 7.81 µs per loop
In [71]: arr+= [None]
In [72]: %%timeit
repeatedElements = []
[num for (i, num) in enumerate(arr[:-1]) if not
(arr[i] == arr[i+1] or arr[i] == arr[i-1]) or
repeatedElements.append(num)] + repeatedElements
....:
100000 loops, best of 3: 10.1 µs per loop
In [93]: %%timeit
a, b = [], []
>>> for i, x in enumerate(arr):
(b if (x in arr[i-1:i+2:2] if i > 0 else x in arr[1:2]) else a).append(x)
....:
10000 loops, best of 3: 14 µs per loop
In [110]: arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9,12,12,12,14,15,15,15,19,20]
In [111]: timeit reorderSequence(arr)
100000 loops, best of 3: 7.85 µs per loop
In [112]: timeit sort_dups(arr)
100000 loops, best of 3: 4.78 µs per loop
In [110]: arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9,12,12,12,14,15,15,15,19,20]
In [119]: timeit cython_sort_dups(arr)
1000000 loops, best of 3: 1.38 µs per loop
好的,@jamylak。我养你
arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]
arr += [None]
repeatedElements = []
print [num for (i, num) in enumerate(arr[:-1]) if not
(arr[i] == arr[i+1] or arr[i] == arr[i-1]) or
repeatedElements.append(num)] + repeatedElements
我用 time.clock() 一直测试到 10*、100*、1000* arr。至少在我的机器上,它在非常高的数字下比你的快一点(否则在较小的数量下明显更快)。
事实上,它的输出速度也比 OP 的快。比赛,比赛,比赛。
>>> arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]
>>> a, b = [], []
>>> for i, x in enumerate(arr):
(b if (x in arr[i-1:i+2:2] if i > 0 else x in arr[1:2]) else a).append(x)
>>> print a + b
[1, 2, 4, 5, 6, 9, 3, 3, 3, 7, 7, 7, 7, 8, 8, 8]
@Eithos 我又养你了
一些测试:
def f(arr):
a, b = [], []
for i, x in enumerate(arr):
(b if (x in arr[i-1:i+2:2] if i > 0 else x in arr[1:2]) else a).append(x)
return a + b
>>> f([1, 2, 2, 3, 5, 5, 5])
[1, 3, 2, 2, 5, 5, 5]
>>> f([1, 1, 1, 1, 2])
[2, 1, 1, 1, 1]
>>> f([])
[]
>>> f([0])
[0]
>>> f([9, 10, 10, 11, 12, 12, 13, 14, 15, 15])
[9, 11, 13, 14, 10, 10, 12, 12, 15, 15]
(好吧,我知道这很不寻常:三个不同的答案。但对我来说......这似乎是有道理的。)
编辑(1):添加版本 A2(增加优化)>>> 请参阅下面的测试
编辑(2):添加版本A3(极度优化)>>>见下方测试
就在你们所有人都认为这一切都结束了……我要说的是:好吧,这不是.
我提出 你们所有人(@jamylak、@Padraic Cunningham、@Prashant Kumar)。我挑战任何人想出一个更快的算法。如果发生这种情况,我会欣然认输,继续我的生活。到那时...
算法
我意识到我太执着于制作完美的、最少的代码行(部分原因是 jamylak,他的最后一个算法真的......让我惊叹。我从来没有想过以这种方式使用三元运算符。所以,太棒了。)。
最初,我想出了第二个答案的修改版本,因为我想用最少的努力快速超越 jamylak,这似乎是实现目标的方法。但是,它真的变得有点古怪和不清楚,所以它并不理想。您不希望同事必须了解开始看起来像下面的算法出了什么问题..
版本 1
#...This
def version1Func(arr):
arr += [None]
repeatedElements = []
return [num for (i, num) in enumerate(arr[:-1]) if not
(arr[i] == arr[i+1] or arr[i] == arr[i-1]) or
repeatedElements.append(num)] + repeatedElements
版本 2
# ...became this
def version2Func(arr):
arr += [None]
repeatedSet = set([])
repeatedElements = []
return [num for (i, num) in enumerate(arr[:-1]) if (
repeatedElements.append(num) if num in repeatedSet else (
repeatedSet.add(num) or repeatedElements.append(num)) if (
num in (arr[i+1], arr[i-1])) else True
)] + repeatedElements
# See? This becomes difficult to understand for all but those intimately
# familiar with the abuse and hacks that are employed here. Still, it's fairly
# effective and, hopefully, if ever used, shouldn't cause any bugs afaik.
两者都非常快。
在测试期间(使用版本 2.6.4, 而不是 2.7 正如我之前所说。我的错误),第一个更快比 jamylak 和 Padraic 的数据集都小。随着数据的增加,这种差异越来越小。差异非常小,以至于 Python 3(或其他混杂因素)可能会给另一种算法带来优势(正如 Padraic 的测试所示,他的速度要快一些)。
对于非常小的数据集,例如 <=50 个元素,第二个版本稍微慢一些。随着它的增加,差异变得相当明显,因为它真的开始比较闪耀 (*1)。在某种程度上,它已经是这里最快算法的一个很好的候选者,因为通常我们在决定时间复杂度是一个值得解决的问题时更倾向于关注大型数据集。
但是……继续前进;以下 algorithm/s 是我做过的最快的,在最好的情况下产生的速度几乎是 Version 1
.
的两倍
(*1) : 后来我注意到,当 Padraic 算法放在函数中时,情况就不再是这样了。它变得更快了。现在,帕德莱克和 Version 2
似乎又一次旗鼓相当了。
版本 A1
def reorderSequence(seq):
seqLength = len(seq)
seqSetLength = len(set(seq))
if seqSetLength != seqLength and seqLength >= 3:
indexLength = seqLength - 1
index = 0
newList = []
repeatedList = []
repeatedNum = 0
currentItem = 0
while True:
if index >= indexLength:
lastItem = seq[indexLength]
if lastItem != repeatedList[-1]:
newList.append(lastItem)
return newList + repeatedList
baseIndex = index
baseNum = seq[index]
while True:
# Checks if the next number in the list is the same and
# keeps resetting the while loop (with `continue`) until
# this condition is no longer True.
nextItem = seq[index+1]
if baseNum == nextItem:
repeatedNum = nextItem
index+=1
if index < indexLength:
continue
else:
index+=1
break
# If the previous condition failed, this `if block` will
# confirm that the current number is a repeat of the last
# one and set the baseNum to the next number; it will repeat
# the while loop (with `continue`) because of the possibility
# that with the next number begins a new series of redundant
# elements, thereby keeping the collection growing before
# finally adding it to the 'repeatedList'. But if the next
# number isn't the beginning of a new series...
currentItem = seq[index]
if currentItem == repeatedNum:
baseNum = nextItem
index+=1
if index < indexLength:
continue
else:
break
else:
# .. it will append it to this newList, break
# to the outer-While...
newList.append(currentItem)
break
# ...and, at this point, it will slice the sequence according
# to the outer-While's baseIndex and inner-While's updated index
# and extend the repeatedList.
if baseIndex != index:
repeatedList.extend(seq[baseIndex:index])
index+=1
else:
return seq
版本 A2 - 编辑(1) >>> 优化
def reorderSequence(seq):
seqLength = len(seq)
if seqLength >= 3:
indexLength = seqLength - 1
index = 0
baseIndex = index
newList = []
repeatedList = []
baseNum = seq[index]
nextNum = seq[index+1]
repeated = True if baseNum == nextNum else False
while True:
if index >= indexLength:
return newList + repeatedList
while repeated:
if baseNum == nextNum:
index+=1
if index < indexLength:
nextNum = seq[index+1]
continue
index+=1
if index < indexLength:
baseNum = nextNum
nextNum = seq[index+1]
if baseNum == nextNum:
continue
else:
repeated = False
else:
if baseNum != nextNum:
repeated = False
repeatedList.extend(seq[baseIndex:index])
baseIndex = index
break
while not repeated:
if baseNum != nextNum:
baseNum = nextNum
index+=1
if index < indexLength:
nextNum = seq[index+1]
continue
else:
index+=1
else:
repeated = True
newList.extend(seq[baseIndex:index])
baseIndex = index
break
else:
return seq
版本 A3 - 编辑(2) >>> 极度优化!
def reorderSequence(seq):
sliceIndex = baseIndex = index = 0
newList = []
baseNum = seq[index]
nextNum = seq[index+1]
repeated = True if baseNum == nextNum else False
try:
while True:
while repeated:
if baseNum == nextNum:
index+=1
nextNum = seq[index+1]
continue
index+=1
baseNum = nextNum
nextNum = seq[index+1]
if baseNum == nextNum:
continue
else:
repeated = False
newList.extend(seq[baseIndex:index])
baseIndex = index
break
while not repeated:
if baseNum != nextNum:
baseNum = nextNum
index+=1
nextNum = seq[index+1]
continue
else:
repeated = True
newList[sliceIndex:sliceIndex] = seq[baseIndex:index]
sliceIndex += index - baseIndex
baseIndex = index
break
except IndexError:
if repeated:
if seq[-1] == seq[-2]:
newList.extend(seq[baseIndex:index+1])
if seq[-1] != seq[-2]:
newList[sliceIndex] = seq[-1]
newList.extend(seq[baseIndex:index])
if not repeated:
newList[sliceIndex:sliceIndex] = seq[baseIndex:index+1]
return newList
很明显,在这一点上我不再关心要使代码优雅、简短等。优雅很有趣,但有时为了获得最大的果汁,必须牺牲优雅。而且,考虑到 OP 指出他关心的是效率,更少的行数或没有行数不应该算在内。
其他答案(供参考)
# jamylak's:
def f(arr):
a, b = [], []
for i, x in enumerate(arr):
(b if (x in arr[i-1:i+2:2] if i > 0 else x in arr[1:2]) else a).append(x)
return a + b
# Padraic's:
def sort_dups(arr):
if len(arr) < 2:
return arr
sin_ele = []
dbl_ele = []
for i, ele in enumerate(arr[:-1], 1):
dbl_ele.append(ele) if ele == arr[i] or ele == arr[i - 2] else sin_ele.append(ele)
ele = arr[-1]
dbl_ele.append(ele) if dbl_ele and ele == dbl_ele[-1] else sin_ele.append(ele)
sin_ele.extend(dbl_ele)
return sin_ele
结果(测试 1)
# Using this as argument..
arr = sorted([1,2,4,5,6,7,7,7,7,8,8,8,9,12,12,12,14,15,15,15,19,20] * x)
# In this case, x will be 10000 or 100
# ..and time.clock() as timer:
# jamylak's:
>>> 0.134921406994 +- 0.001 # 10000*
>>> 0.00127404297442 # 100*
# Padraic's:
>>> 0.0626158414828 +- 0.001 # 10000*
>>> 0.000532060143703 # 100*
# Mine - Version 1
>>> 0.0728380523271 +- 0.002 # 10000*
>>> 0.000671155257454 # 100*
# Mine - Version 2
>>> 0.0612159302306 +- 0.001 # 10000*
>>> 0.000565767241821 # 100*
# Mine - Version A1
>>> 0.0519618384449 +- 0.001 # 10000*
>>> 0.000506459816019 # 100*
结果(测试 2)- 编辑(1)
# Using the following argument
arr = sorted([1,2,41,52,6,57,7,7,71,8,82,83,9,1244,132,1221,14,15,15,1523,19,20] * 10000
+ [1,2,2,4,5,42,23,7,1,55,21,23,34,24,26,27,6,31,32,33,61,62,70])
# Padraic's:
0.0614181728193
# Mine - Version A2
0.0403025958732
结果(测试 3)- 编辑(2)
# Using same argument as Test 2
# Mine - Version A3
0.0338009659857
在做了这些测试之后,我意识到了一件非常重要的事情。我相信我已经了解了一段时间,但忘记了。
所有这些在函数内部使用时都得到了显着的速度提升。看来比赛场地已经相当公平了。事实上,我会说它已经被夷为平地了。在发布这篇文章之前,我仍在函数中 运行ning 我的 Version A1
算法,并将其与 Padraic 的全局声明的旧算法进行比较,所以我认为我有更大的优势,直到我开始重新测试每个算法.根据我的数据,我仍然领先,虽然没有我想象的那么多。
关于函数内部的速度提升与它与我的 list comps 之间的差距如何缩小:我想这可能与 list comps 如何非常有效地自行优化有关,而算法在本地声明(在函数内部)执行类似的优化。
正如 Lambeau 在 'Good Will Hunting' 中所说:
"So, let this be said: the gauntlet has been thrown down, but [I] have answered, and answered, with vigor."
那么,@Prashant Kumar,你怎么看?这是更计算效率的答案吗? :)
注意: 当然,我欢迎任何人(Padraic?)执行他们自己的测试,看看 运行s 在他们自己的机器上如何,Python版本等
如果在结果中注明python版本就好了。它可以帮助阐明速度差异。
编辑 1:
我的结论是 Version A2
可能是最快的 运行。总有进行细微调整的空间,但我怀疑它是否会得到任何显着的提升。我确实有另一个版本,使用此处提供的庞大数据集作为参数,可以再增加 0.002 秒。不幸的是,它在开始时稍微(并且,我的意思是,非常轻微)慢。因此,当与大型参数一起使用时,用较小的数据集来换取速度上的轻微损失似乎不值得。
致 OP: 我建议您尝试其中的一些算法,以便您做出判断。我也很好奇是否有人可以重现我得到的一些结果与@Padraic 的结果。关于这一点,我也很想知道这是否可能与 time.clock 和 timeit?
之间的差异有关
编辑 2:
好的。我想我已经用尽了这里的所有优化途径。而且,它只使用一个列表而不是两个,这也更接近 OP 的目标(he/she 有兴趣看看是否这可以在不创建额外的空列表的情况下完成)
有没有比这个更好的方法来解决这个问题?
def 渲染(arr):
single_elements = []
double_elements = []
for i in xrange(len(arr)-1):
if arr[i] == arr[i+1] or arr[i] == arr[i-1]:
double_elements.append(arr[i])
else:
single_elements.append(arr[i])
if arr[-1] == double_elements[-1]:
double_elements.append(arr[-1])
else:
single_elements.append(arr[-1])
return single_elements+double_elements
arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]
'''output arr = [1,2,4,5,6,9,3,3,3,7,7,7,7,8,8,8]'''
打印渲染(arr)
这个怎么样?让我知道是否需要进一步解释。
arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]
repeatedElements = [i for i in arr if arr.count(i) > 1]
newList = [i for i in arr if i not in repeatedElements] + repeatedElements
print newList
>>> [1, 2, 4, 5, 6, 9, 3, 3, 3, 7, 7, 7, 7, 8, 8, 8]
你不能在一行中很好地完成它,但我认为最好保留它 O(N)
>>> from itertools import groupby
>>> arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]
>>> a, b = [], []
>>> for k, g in groupby(arr):
group = list(g)
(a if len(group)<2 else b).extend(group)
>>> print a + b
[1, 2, 4, 5, 6, 9, 3, 3, 3, 7, 7, 7, 7, 8, 8, 8]
你的方法是最有效的,你可以通过使用枚举而不是重复索引进行一些更改来提高效率,使用 python 2 和 25% 可以减少大约 15% 的运行时间使用 python 3:
single_elements = []
double_elements = []
for i, ele in enumerate(arr[:-1], 1):
if ele == arr[i] or ele == arr[i-2]:
double_elements.append(ele)
else:
single_elements.append(ele)
ele = arr[-1]
if ele == double_elements[-1]:
double_elements.append(ele)
else:
single_elements.append(ele)
single_elements.extend(double_elements)
或者如果你想要更少的行:
sin_ele = []
dbl_ele = []
for i, ele in enumerate(arr[:-1], 1):
dbl_ele.append(ele) if ele == arr[i] or ele == arr[i-2] else sin_ele.append(ele)
ele = arr[-1]
dbl_ele.append(ele) if dbl_ele and ele == dbl_ele[-1] else sin_ele.append(ele)
sin_ele.extend(dbl_ele)
一些时间和覆盖一个元素的数组和一个空数组:
def sort_dups(arr):
if len(arr) < 2:
return arr
sin_ele = []
dbl_ele = []
for i, ele in enumerate(arr[:-1], 1):
dbl_ele.append(ele) if ele == arr[i] or ele == arr[i - 2] else sin_ele.append(ele)
ele = arr[-1]
dbl_ele.append(ele) if dbl_ele and ele == dbl_ele[-1] else sin_ele.append(ele)
sin_ele.extend(dbl_ele)
return sin_ele
In [38]: timeit sort_dups(arr)
100000 loops, best of 3: 4.69 µs per loop
In [39]: timeit f(arr)
100000 loops, best of 3: 8.05 µs per loop
In [40]: %%timeit
repeatedElements = []
[num for (i, num) in enumerate(arr[:-1]) if not
(arr[i] == arr[i+1] or arr[i] == arr[i-1]) or
repeatedElements.append(num)] + repeatedElements ....:
100000 loops, best of 3: 5.38 µs per loop
空列表和单个元素列表:
In [74]: sort_dups([1, 2, 2, 3, 5, 5, 5])
Out[74]: [1, 3, 2, 2, 5, 5, 5]
In [75]: sort_dups([1, 1, 1, 1, 2])
Out[75]: [2, 1, 1, 1, 1]
In [76]: sort_dups([])
Out[76]: []
In [77]: sort_dups([0])
Out[77]: [0]
在稍大的输入上:
In [59]: arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9,12,12,12,14,15,15,15,19,20]
In [60]: timeit f(arr)
100000 loops, best of 3: 14.2 µs per loop
In [61]: timeit sort_dups(arr)
100000 loops, best of 3: 7.81 µs per loop
In [71]: arr+= [None]
In [72]: %%timeit
repeatedElements = []
[num for (i, num) in enumerate(arr[:-1]) if not
(arr[i] == arr[i+1] or arr[i] == arr[i-1]) or
repeatedElements.append(num)] + repeatedElements
....:
100000 loops, best of 3: 10.1 µs per loop
In [93]: %%timeit
a, b = [], []
>>> for i, x in enumerate(arr):
(b if (x in arr[i-1:i+2:2] if i > 0 else x in arr[1:2]) else a).append(x)
....:
10000 loops, best of 3: 14 µs per loop
In [110]: arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9,12,12,12,14,15,15,15,19,20]
In [111]: timeit reorderSequence(arr)
100000 loops, best of 3: 7.85 µs per loop
In [112]: timeit sort_dups(arr)
100000 loops, best of 3: 4.78 µs per loop
In [110]: arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9,12,12,12,14,15,15,15,19,20]
In [119]: timeit cython_sort_dups(arr)
1000000 loops, best of 3: 1.38 µs per loop
好的,@jamylak。我养你
arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]
arr += [None]
repeatedElements = []
print [num for (i, num) in enumerate(arr[:-1]) if not
(arr[i] == arr[i+1] or arr[i] == arr[i-1]) or
repeatedElements.append(num)] + repeatedElements
我用 time.clock() 一直测试到 10*、100*、1000* arr。至少在我的机器上,它在非常高的数字下比你的快一点(否则在较小的数量下明显更快)。
事实上,它的输出速度也比 OP 的快。比赛,比赛,比赛。
>>> arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]
>>> a, b = [], []
>>> for i, x in enumerate(arr):
(b if (x in arr[i-1:i+2:2] if i > 0 else x in arr[1:2]) else a).append(x)
>>> print a + b
[1, 2, 4, 5, 6, 9, 3, 3, 3, 7, 7, 7, 7, 8, 8, 8]
@Eithos 我又养你了
一些测试:
def f(arr):
a, b = [], []
for i, x in enumerate(arr):
(b if (x in arr[i-1:i+2:2] if i > 0 else x in arr[1:2]) else a).append(x)
return a + b
>>> f([1, 2, 2, 3, 5, 5, 5])
[1, 3, 2, 2, 5, 5, 5]
>>> f([1, 1, 1, 1, 2])
[2, 1, 1, 1, 1]
>>> f([])
[]
>>> f([0])
[0]
>>> f([9, 10, 10, 11, 12, 12, 13, 14, 15, 15])
[9, 11, 13, 14, 10, 10, 12, 12, 15, 15]
(好吧,我知道这很不寻常:三个不同的答案。但对我来说......这似乎是有道理的。)
编辑(1):添加版本 A2(增加优化)>>> 请参阅下面的测试
编辑(2):添加版本A3(极度优化)>>>见下方测试
就在你们所有人都认为这一切都结束了……我要说的是:好吧,这不是.
我提出 你们所有人(@jamylak、@Padraic Cunningham、@Prashant Kumar)。我挑战任何人想出一个更快的算法。如果发生这种情况,我会欣然认输,继续我的生活。到那时...
算法
我意识到我太执着于制作完美的、最少的代码行(部分原因是 jamylak,他的最后一个算法真的......让我惊叹。我从来没有想过以这种方式使用三元运算符。所以,太棒了。)。
最初,我想出了第二个答案的修改版本,因为我想用最少的努力快速超越 jamylak,这似乎是实现目标的方法。但是,它真的变得有点古怪和不清楚,所以它并不理想。您不希望同事必须了解开始看起来像下面的算法出了什么问题..
版本 1
#...This
def version1Func(arr):
arr += [None]
repeatedElements = []
return [num for (i, num) in enumerate(arr[:-1]) if not
(arr[i] == arr[i+1] or arr[i] == arr[i-1]) or
repeatedElements.append(num)] + repeatedElements
版本 2
# ...became this
def version2Func(arr):
arr += [None]
repeatedSet = set([])
repeatedElements = []
return [num for (i, num) in enumerate(arr[:-1]) if (
repeatedElements.append(num) if num in repeatedSet else (
repeatedSet.add(num) or repeatedElements.append(num)) if (
num in (arr[i+1], arr[i-1])) else True
)] + repeatedElements
# See? This becomes difficult to understand for all but those intimately
# familiar with the abuse and hacks that are employed here. Still, it's fairly
# effective and, hopefully, if ever used, shouldn't cause any bugs afaik.
两者都非常快。
在测试期间(使用版本 2.6.4, 而不是 2.7 正如我之前所说。我的错误),第一个更快比 jamylak 和 Padraic 的数据集都小。随着数据的增加,这种差异越来越小。差异非常小,以至于 Python 3(或其他混杂因素)可能会给另一种算法带来优势(正如 Padraic 的测试所示,他的速度要快一些)。
对于非常小的数据集,例如 <=50 个元素,第二个版本稍微慢一些。随着它的增加,差异变得相当明显,因为它真的开始比较闪耀 (*1)。在某种程度上,它已经是这里最快算法的一个很好的候选者,因为通常我们在决定时间复杂度是一个值得解决的问题时更倾向于关注大型数据集。
但是……继续前进;以下 algorithm/s 是我做过的最快的,在最好的情况下产生的速度几乎是 Version 1
.
(*1) : 后来我注意到,当 Padraic 算法放在函数中时,情况就不再是这样了。它变得更快了。现在,帕德莱克和 Version 2
似乎又一次旗鼓相当了。
版本 A1
def reorderSequence(seq):
seqLength = len(seq)
seqSetLength = len(set(seq))
if seqSetLength != seqLength and seqLength >= 3:
indexLength = seqLength - 1
index = 0
newList = []
repeatedList = []
repeatedNum = 0
currentItem = 0
while True:
if index >= indexLength:
lastItem = seq[indexLength]
if lastItem != repeatedList[-1]:
newList.append(lastItem)
return newList + repeatedList
baseIndex = index
baseNum = seq[index]
while True:
# Checks if the next number in the list is the same and
# keeps resetting the while loop (with `continue`) until
# this condition is no longer True.
nextItem = seq[index+1]
if baseNum == nextItem:
repeatedNum = nextItem
index+=1
if index < indexLength:
continue
else:
index+=1
break
# If the previous condition failed, this `if block` will
# confirm that the current number is a repeat of the last
# one and set the baseNum to the next number; it will repeat
# the while loop (with `continue`) because of the possibility
# that with the next number begins a new series of redundant
# elements, thereby keeping the collection growing before
# finally adding it to the 'repeatedList'. But if the next
# number isn't the beginning of a new series...
currentItem = seq[index]
if currentItem == repeatedNum:
baseNum = nextItem
index+=1
if index < indexLength:
continue
else:
break
else:
# .. it will append it to this newList, break
# to the outer-While...
newList.append(currentItem)
break
# ...and, at this point, it will slice the sequence according
# to the outer-While's baseIndex and inner-While's updated index
# and extend the repeatedList.
if baseIndex != index:
repeatedList.extend(seq[baseIndex:index])
index+=1
else:
return seq
版本 A2 - 编辑(1) >>> 优化
def reorderSequence(seq):
seqLength = len(seq)
if seqLength >= 3:
indexLength = seqLength - 1
index = 0
baseIndex = index
newList = []
repeatedList = []
baseNum = seq[index]
nextNum = seq[index+1]
repeated = True if baseNum == nextNum else False
while True:
if index >= indexLength:
return newList + repeatedList
while repeated:
if baseNum == nextNum:
index+=1
if index < indexLength:
nextNum = seq[index+1]
continue
index+=1
if index < indexLength:
baseNum = nextNum
nextNum = seq[index+1]
if baseNum == nextNum:
continue
else:
repeated = False
else:
if baseNum != nextNum:
repeated = False
repeatedList.extend(seq[baseIndex:index])
baseIndex = index
break
while not repeated:
if baseNum != nextNum:
baseNum = nextNum
index+=1
if index < indexLength:
nextNum = seq[index+1]
continue
else:
index+=1
else:
repeated = True
newList.extend(seq[baseIndex:index])
baseIndex = index
break
else:
return seq
版本 A3 - 编辑(2) >>> 极度优化!
def reorderSequence(seq):
sliceIndex = baseIndex = index = 0
newList = []
baseNum = seq[index]
nextNum = seq[index+1]
repeated = True if baseNum == nextNum else False
try:
while True:
while repeated:
if baseNum == nextNum:
index+=1
nextNum = seq[index+1]
continue
index+=1
baseNum = nextNum
nextNum = seq[index+1]
if baseNum == nextNum:
continue
else:
repeated = False
newList.extend(seq[baseIndex:index])
baseIndex = index
break
while not repeated:
if baseNum != nextNum:
baseNum = nextNum
index+=1
nextNum = seq[index+1]
continue
else:
repeated = True
newList[sliceIndex:sliceIndex] = seq[baseIndex:index]
sliceIndex += index - baseIndex
baseIndex = index
break
except IndexError:
if repeated:
if seq[-1] == seq[-2]:
newList.extend(seq[baseIndex:index+1])
if seq[-1] != seq[-2]:
newList[sliceIndex] = seq[-1]
newList.extend(seq[baseIndex:index])
if not repeated:
newList[sliceIndex:sliceIndex] = seq[baseIndex:index+1]
return newList
很明显,在这一点上我不再关心要使代码优雅、简短等。优雅很有趣,但有时为了获得最大的果汁,必须牺牲优雅。而且,考虑到 OP 指出他关心的是效率,更少的行数或没有行数不应该算在内。
其他答案(供参考)
# jamylak's:
def f(arr):
a, b = [], []
for i, x in enumerate(arr):
(b if (x in arr[i-1:i+2:2] if i > 0 else x in arr[1:2]) else a).append(x)
return a + b
# Padraic's:
def sort_dups(arr):
if len(arr) < 2:
return arr
sin_ele = []
dbl_ele = []
for i, ele in enumerate(arr[:-1], 1):
dbl_ele.append(ele) if ele == arr[i] or ele == arr[i - 2] else sin_ele.append(ele)
ele = arr[-1]
dbl_ele.append(ele) if dbl_ele and ele == dbl_ele[-1] else sin_ele.append(ele)
sin_ele.extend(dbl_ele)
return sin_ele
结果(测试 1)
# Using this as argument..
arr = sorted([1,2,4,5,6,7,7,7,7,8,8,8,9,12,12,12,14,15,15,15,19,20] * x)
# In this case, x will be 10000 or 100
# ..and time.clock() as timer:
# jamylak's:
>>> 0.134921406994 +- 0.001 # 10000*
>>> 0.00127404297442 # 100*
# Padraic's:
>>> 0.0626158414828 +- 0.001 # 10000*
>>> 0.000532060143703 # 100*
# Mine - Version 1
>>> 0.0728380523271 +- 0.002 # 10000*
>>> 0.000671155257454 # 100*
# Mine - Version 2
>>> 0.0612159302306 +- 0.001 # 10000*
>>> 0.000565767241821 # 100*
# Mine - Version A1
>>> 0.0519618384449 +- 0.001 # 10000*
>>> 0.000506459816019 # 100*
结果(测试 2)- 编辑(1)
# Using the following argument
arr = sorted([1,2,41,52,6,57,7,7,71,8,82,83,9,1244,132,1221,14,15,15,1523,19,20] * 10000
+ [1,2,2,4,5,42,23,7,1,55,21,23,34,24,26,27,6,31,32,33,61,62,70])
# Padraic's:
0.0614181728193
# Mine - Version A2
0.0403025958732
结果(测试 3)- 编辑(2)
# Using same argument as Test 2
# Mine - Version A3
0.0338009659857
在做了这些测试之后,我意识到了一件非常重要的事情。我相信我已经了解了一段时间,但忘记了。
所有这些在函数内部使用时都得到了显着的速度提升。看来比赛场地已经相当公平了。事实上,我会说它已经被夷为平地了。在发布这篇文章之前,我仍在函数中 运行ning 我的 Version A1
算法,并将其与 Padraic 的全局声明的旧算法进行比较,所以我认为我有更大的优势,直到我开始重新测试每个算法.根据我的数据,我仍然领先,虽然没有我想象的那么多。
关于函数内部的速度提升与它与我的 list comps 之间的差距如何缩小:我想这可能与 list comps 如何非常有效地自行优化有关,而算法在本地声明(在函数内部)执行类似的优化。
正如 Lambeau 在 'Good Will Hunting' 中所说: "So, let this be said: the gauntlet has been thrown down, but [I] have answered, and answered, with vigor."
那么,@Prashant Kumar,你怎么看?这是更计算效率的答案吗? :)
注意: 当然,我欢迎任何人(Padraic?)执行他们自己的测试,看看 运行s 在他们自己的机器上如何,Python版本等
如果在结果中注明python版本就好了。它可以帮助阐明速度差异。
编辑 1:
我的结论是 Version A2
可能是最快的 运行。总有进行细微调整的空间,但我怀疑它是否会得到任何显着的提升。我确实有另一个版本,使用此处提供的庞大数据集作为参数,可以再增加 0.002 秒。不幸的是,它在开始时稍微(并且,我的意思是,非常轻微)慢。因此,当与大型参数一起使用时,用较小的数据集来换取速度上的轻微损失似乎不值得。
致 OP: 我建议您尝试其中的一些算法,以便您做出判断。我也很好奇是否有人可以重现我得到的一些结果与@Padraic 的结果。关于这一点,我也很想知道这是否可能与 time.clock 和 timeit?
之间的差异有关编辑 2:
好的。我想我已经用尽了这里的所有优化途径。而且,它只使用一个列表而不是两个,这也更接近 OP 的目标(he/she 有兴趣看看是否这可以在不创建额外的空列表的情况下完成)