Improving a nested loop for efficiency
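I'm working on a project that analyzes PSL files. Overall, the program looks at read pairs and identifies circular molecules. It works, but because my operations are nested it is very inefficient: reading an entire PSL file takes more than 10 minutes instead of the ~15 seconds it should.
The relevant code is: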
def readPSLpairs(self):
    posread = []
    negread = []
    result = {}
    for psl in self.readPSL():
        parsed = psl.split()
        strand = parsed[9][-1]
        if strand == '1':
            posread.append(parsed)
        elif strand == '2':
            negread.append(parsed)
    for read in posread:
        posname = read[9][:-2]
        poscontig = read[13]
        for read in negread:
            negname = read[9][:-2]
            negcontig = read[13]
            if posname == negname and poscontig == negcontig:
                try:
                    result[poscontig] += 1
                    break
                except:
                    result[poscontig] = 1
                    break
    print(result)
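I've tried changing the overall approach, rather than appending the values to lists and then trying to match posname == negname and poscontig == negcontig, but it proved to be a lot harder than I expected, so I'm stuck on how to make this function faster.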
import collections

all_dict = {"pos": collections.defaultdict(int),
            "neg": collections.defaultdict(int)}
result = {}

for psl in self.readPSL():
    parsed = psl.split()
    strand = "pos" if parsed[9][-1] == '1' else "neg"
    name, contig = parsed[9][:-2], parsed[13]
    all_dict[strand][(name, contig)] += 1
# pre-process all the psl's into all_dict['pos'] or all_dict['neg']
# this is basically just a `collections.Counter` of what you're doing already!

for info, posqty in all_dict['pos'].items():
    negqty = all_dict['neg'][info]  # (defaults to zero)
    result[info] = posqty * negqty
# process all the 'pos' psl's. For every match with a 'neg', set
# result[(name, contig)] to the total (posqty * negqty)
Note that this discards the full parsed psl values and keeps only the name and contig slices.
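The approach above stores a product per (name, contig) pair, which is not quite what the function in the question prints: that function counts, per contig, how many reads ending in '1' have at least one read ending in '2' with the same name on the same contig. A minimal sketch that keeps those semantics in a single pass might look like the following. The function name `count_read_pairs`, taking a plain iterable of lines as input, and skipping lines with fewer than 14 fields are all assumptions made for illustration; none of them come from the original code.

```python
from collections import Counter


def count_read_pairs(psl_lines):
    """Count, per contig, the '1' reads that have a '2' mate on the same contig.

    Hypothetical helper: `psl_lines` is any iterable of whitespace-separated
    PSL alignment lines, with the read name in field 9 and the contig in
    field 13, as in the question's code.
    """
    pos_keys = []      # one (name, contig) entry per read ending in '1'
    neg_keys = set()   # distinct (name, contig) pairs for reads ending in '2'

    for line in psl_lines:
        fields = line.split()
        if len(fields) < 14:
            # Assumption: header or malformed lines are skipped rather than
            # raising IndexError as the original code would.
            continue
        read_name = fields[9]
        key = (read_name[:-2], fields[13])
        if read_name[-1] == '1':
            pos_keys.append(key)
        elif read_name[-1] == '2':
            neg_keys.add(key)

    # Each '1' read counts once if any matching '2' read exists, which mirrors
    # the `break` after the first match in the nested-loop version.
    return Counter(contig for name, contig in pos_keys
                   if (name, contig) in neg_keys)
```

Assuming `readPSL()` yields one alignment line at a time, as it does in the question, this could be called as `count_read_pairs(self.readPSL())`. Set membership tests take constant time on average, so the work grows with the number of lines rather than with the product of the two list lengths.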