Improving runtime when finding nonzero streaks in a list
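I wrote a function that takes a list of integers and a value l. The return value is a tuple: the first element is the count of streaks of nonzero integers (a streak must have length l or more), and the second element is the average length of those streaks. I use this function in a bioinformatics project that requires feeding in lists containing millions of integers, and the method I wrote turns out to be too slow. How can I make my program more efficient?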
def contigs_values(sequenced_lst, l):
    """
    By splitting at zeros, can count contiguous sequenced sequences
    """
    # a list of lists storing index start and end values of each contig
    contig_indices = []
    start = 0
    end = 1
    while end != len(sequenced_lst):
        if 0 not in sequenced_lst[start:end + 1]:  # only extend window if contiguous
            end += 1  # extend contig window
            # continue
            if end == len(sequenced_lst):
                contig_indices.append([start, end])  # append final contig indices as index list
        else:  # zero is found and contig broken
            if end - start > 1:
                if end - start < l:  # debug test... TODO: no window should be less than R length, L
                    print("MISTAKE")
                contig_indices.append([start, end])  # append contig indices as index list
            start = end  # start a new contig window
            end += 1
    num_contigs = len(contig_indices)
    avg_contig_len = get_average([i[1] - i[0] for i in contig_indices])
    return num_contigs, avg_contig_len
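If I understand your goal correctly, your code runs so slowly because it re-scans the current window on every step (the slice sequenced_lst[start:end + 1] is copied and searched for 0 each time the window grows), so it ends up somewhere around O(n^2).
So rather than scanning a window again and again, take the difference between the index of the current 0 and the index of the previous 0.
This is relatively easy with enumerate: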
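def contigs_values(sequenced_lst, l):
    """
    By splitting at zeros, can count contiguous sequenced sequences
    """
    streaks = []
    prev = -1  # index of the most recent zero; -1 acts as a zero just before the list
    for index, val in enumerate(sequenced_lst):
        if val == 0:
            length = index - prev - 1  # number of nonzero values between the two zeros
            if length >= l:
                streaks.append(length)
            prev = index
    # the list may end in a streak that no zero closes
    length = len(sequenced_lst) - prev - 1
    if length >= l:
        streaks.append(length)
    num_contigs = len(streaks)
    # avoid dividing by zero when no streak is long enough
    avg_contig_len = sum(streaks) / num_contigs if num_contigs else 0.0
    return num_contigs, avg_contig_len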
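As a cross-check on the single-pass idea, here is a minimal sketch that does the same job with itertools.groupby from the standard library. The name streak_stats, the parameter name min_len, and the choice to report an average of 0.0 when no streak qualifies are my own assumptions, not something from the question.

```python
from itertools import groupby


def streak_stats(values, min_len):
    """Return (count, average length) of nonzero runs with at least min_len items.

    Hypothetical helper for illustration; the 0.0 average for "no qualifying
    run" is an assumption, pick whatever suits your pipeline.
    """
    lengths = []
    # groupby splits the input into consecutive runs of zero / nonzero values
    for is_nonzero, run in groupby(values, key=lambda v: v != 0):
        if is_nonzero:
            run_len = sum(1 for _ in run)  # count the run without copying it
            if run_len >= min_len:
                lengths.append(run_len)
    count = len(lengths)
    average = sum(lengths) / count if count else 0.0
    return count, average


print(streak_stats([1, 2, 0, 3, 4, 5, 0, 0, 7], 2))  # prints (2, 2.5)
```

Each element is visited once, so the work grows linearly with the length of the list. It also handles a streak at the very start or very end of the list without any special cases, which is where index arithmetic tends to go off by one.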