To make this method faster
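I wrote this function in Python 3 to merge two XML files.
The merge happens at the first level only, so the function does not need to call itself recursively. The problem is that it takes a lot of time, because the XML files are large. Please help me optimize this code. Thanks.
Here is the function: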
def combine_element(one, other):
    # Collect the keys already present in `one`
    channel_ids = []
    programs_startstop = []
    for el in one:
        if el.tag == 'channel':
            channel_ids.append(el.get('id'))
        elif el.tag == 'programme':
            programs_startstop.append((el.get('start'), el.get('stop')))
    i = 0
    # printProgressBar is a progress-bar helper defined elsewhere (not shown)
    printProgressBar(i, len(other), prefix='Progress:', suffix='Complete', length=50)
    # Append every element of `other` whose key is not yet in `one`
    for el in other:
        if el.tag == 'channel':
            if el.get('id') not in channel_ids:  # linear scan of a list
                one.append(el)
                channel_ids.append(el.get('id'))
        elif el.tag == 'programme':
            if (el.get('start'), el.get('stop')) not in programs_startstop:  # linear scan
                one.append(el)
                programs_startstop.append((el.get('start'), el.get('stop')))
        i += 1
        printProgressBar(i, len(other), prefix='Progress:', suffix='Complete', length=50)
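Most of the time is likely spent in the membership tests: `x in some_list` scans the whole list, so every check gets slower as more elements are collected, and the merge becomes roughly quadratic. A minimal sketch that keeps the question's in-place structure but stores the seen keys in sets (average constant-time lookups) follows. It assumes `one` and `other` are `xml.etree.ElementTree` root elements and leaves out the `printProgressBar` helper, which is not shown in the question. Iterating over `list(other)` is a precaution: with lxml, `append` moves an element out of its old parent, which would otherwise change `other` during the loop.

```python
import xml.etree.ElementTree as ET

def combine_element(one, other):
    """Append children of `other` to `one`, skipping duplicates.

    Channels are deduplicated by `id`, programmes by their (start, stop)
    pair. Sets give average O(1) membership tests instead of list scans.
    """
    channel_ids = set()
    programs_startstop = set()
    for el in one:
        if el.tag == 'channel':
            channel_ids.add(el.get('id'))
        elif el.tag == 'programme':
            programs_startstop.add((el.get('start'), el.get('stop')))

    # Snapshot the children so appending to `one` cannot disturb the loop.
    for el in list(other):
        if el.tag == 'channel':
            key = el.get('id')
            if key not in channel_ids:
                one.append(el)
                channel_ids.add(key)
        elif el.tag == 'programme':
            key = (el.get('start'), el.get('stop'))
            if key not in programs_startstop:
                one.append(el)
                programs_startstop.add(key)
    return one

first = ET.fromstring(
    '<tv><channel id="C1"/><programme start="1" stop="2"/></tv>')
second = ET.fromstring(
    '<tv><channel id="C1"/><channel id="C3"/>'
    '<programme start="1" stop="2"/><programme start="3" stop="4"/></tv>')
combine_element(first, second)
print([el.get('id') or el.get('start') for el in first])  # ['C1', '1', 'C3', '3']
```

The result is still built in place and in the original order; only the lookup structures change.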
Here is an example of the XML files to merge:
First file:
<tv>
  <channel id="C1">
    <display-name lang="en">C1</display-name>
  </channel>
  <channel id="C2">
    <display-name lang="en">C2</display-name>
  </channel>
  <programme channel="C1" start="20190607040000 +0000" stop="20190607043000 +0000">
    <title lang="en">P1</title>
    <desc lang="en">Program 1</desc>
  </programme>
  <programme channel="C2" start="20190707040000 +0000" stop="20190707043000 +0000">
    <title lang="en">P2</title>
    <desc lang="en">Program 2</desc>
  </programme>
</tv>
Second file:
<tv>
  <channel id="C3">
    <display-name lang="en">C3</display-name>
  </channel>
  <channel id="C4">
    <display-name lang="en">C4</display-name>
  </channel>
  <programme channel="C3" start="20190607070000 +0000" stop="20190607073000 +0000">
    <title lang="en">P3</title>
    <desc lang="en">Program 3</desc>
  </programme>
  <programme channel="C4" start="20190707050000 +0000" stop="20190707063000 +0000">
    <title lang="en">P4</title>
    <desc lang="en">Program 2</desc>
  </programme>
</tv>
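The code should ignore a channel from the second file if a channel with the same id exists in the first file, and ignore a programme from the second file if a programme in the first file has the same start and stop times. The XML given here is only an example, because I can't share the actual data.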
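The duplicate rule can be expressed as a single key function: two elements count as duplicates when their keys are equal. A sketch of such a helper (the name `dedup_key` is hypothetical, not from the question), assuming `xml.etree.ElementTree` elements. Note that, as the rule is stated, a programme's `channel` attribute plays no part, so two programmes on different channels with identical times are treated as duplicates.

```python
import xml.etree.ElementTree as ET

def dedup_key(el):
    """Return the key used to detect duplicates, or None for other tags.

    Channels match on `id`; programmes match on (start, stop) only, so a
    programme's `channel` attribute is deliberately ignored.
    """
    if el.tag == 'channel':
        return ('channel', el.get('id'))
    if el.tag == 'programme':
        return ('programme', el.get('start'), el.get('stop'))
    return None

a = ET.fromstring('<programme channel="C1" start="20190607040000 +0000" '
                  'stop="20190607043000 +0000"/>')
b = ET.fromstring('<programme channel="C9" start="20190607040000 +0000" '
                  'stop="20190607043000 +0000"/>')
print(dedup_key(a) == dedup_key(b))  # True: same times, different channel
```

If programmes should only clash within the same channel, adding `el.get('channel')` to the programme key would be the change to make.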
This is the expected result of the method, only faster:
<tv>
  <channel id="C1">
    <display-name lang="en">C1</display-name>
  </channel>
  <channel id="C2">
    <display-name lang="en">C2</display-name>
  </channel>
  <programme channel="C1" start="20190607040000 +0000" stop="20190607043000 +0000">
    <title lang="en">P1</title>
    <desc lang="en">Program 1</desc>
  </programme>
  <programme channel="C2" start="20190707040000 +0000" stop="20190707043000 +0000">
    <title lang="en">P2</title>
    <desc lang="en">Program 2</desc>
  </programme>
  <channel id="C3">
    <display-name lang="en">C3</display-name>
  </channel>
  <channel id="C4">
    <display-name lang="en">C4</display-name>
  </channel>
  <programme channel="C3" start="20190607070000 +0000" stop="20190607073000 +0000">
    <title lang="en">P3</title>
    <desc lang="en">Program 3</desc>
  </programme>
  <programme channel="C4" start="20190707050000 +0000" stop="20190707063000 +0000">
    <title lang="en">P4</title>
    <desc lang="en">Program 2</desc>
  </programme>
</tv>
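You should extract the part where you retrieve the elements into a generator function that yields key-value tuple pairs.
Then build dictionaries from the results of calling the generator on both arguments and merge them. Looking up a key in a dictionary takes constant time on average, whereas `in` on a list scans the whole list, which is what makes the original membership checks roughly quadratic in the number of elements.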
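How the dictionaries are merged decides both which file wins on a duplicate key and the order of the result. Since Python 3.7, dicts preserve insertion order, and `update` overwrites the value of an existing key without moving it. The sketch below uses plain strings in place of XML elements to compare two merge orders: building from `other` and updating with `one` keeps `one`'s values but lists `other`'s keys first, while starting from `one` and adding only missing keys with `setdefault` keeps both `one`'s values and its order.

```python
one = [('C1', 'one:C1'), ('C2', 'one:C2')]
other = [('C2', 'other:C2'), ('C3', 'other:C3')]

# Approach A: dict(other), then update with one.
a = dict(other)
a.update(one)
print(list(a.values()))  # ['one:C2', 'other:C3', 'one:C1']

# Approach B: start from one, then add only the keys one does not have.
b = dict(one)
for key, value in other:
    b.setdefault(key, value)
print(list(b.values()))  # ['one:C1', 'one:C2', 'other:C3']
```

Approach B matches the expected output in the question, where all of the first file's elements come first.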
def elements(lst):
    """Yield (key, element) pairs: channels keyed by id, programmes by (start, stop)."""
    for el in lst:
        if el.tag == 'channel':
            yield el.get('id'), el
        elif el.tag == 'programme':
            yield (el.get('start'), el.get('stop')), el

def combine_element(one, other):
    # Start from `one` so its elements come first and win on duplicate keys
    merged_els = dict(elements(one))
    for key, el in elements(other):
        merged_els.setdefault(key, el)  # only adds keys `one` does not have
    result_els = []
    progressend = len(merged_els)
    # start=1 so the progress bar reaches 100% on the last element
    for i, el in enumerate(merged_els.values(), start=1):
        printProgressBar(
            i, progressend, prefix='Progress:', suffix='Complete', length=50)
        result_els.append(el)
    return result_els
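The function above returns a list of elements rather than a tree. A minimal end-to-end sketch, assuming the standard-library `xml.etree.ElementTree` and a hypothetical `merge_tv` wrapper, parses both documents, merges them with a dictionary keyed as above (first file's elements first, then only the new ones from the second), and wraps the result in a new `<tv>` root. The progress bar is left out, and the inputs are shortened versions of the question's samples; for real files, `ET.parse(path).getroot()` would take the place of `ET.fromstring`.

```python
import xml.etree.ElementTree as ET

def elements(root):
    """Yield (key, element) pairs for the children that can be deduplicated."""
    for el in root:
        if el.tag == 'channel':
            yield el.get('id'), el
        elif el.tag == 'programme':
            yield (el.get('start'), el.get('stop')), el

def merge_tv(one, other):
    """Return a new <tv> element: one's children first, then other's new ones."""
    merged = dict(elements(one))
    for key, el in elements(other):
        merged.setdefault(key, el)  # keep the element from `one` on a clash
    tv = ET.Element('tv', one.attrib)
    tv.extend(list(merged.values()))
    return tv

first = ET.fromstring(
    '<tv><channel id="C1"/><channel id="C2"/>'
    '<programme channel="C1" start="20190607040000 +0000" '
    'stop="20190607043000 +0000"/></tv>')
second = ET.fromstring(
    '<tv><channel id="C2"/><channel id="C3"/>'
    '<programme channel="C3" start="20190607070000 +0000" '
    'stop="20190607073000 +0000"/></tv>')
merged = merge_tv(first, second)
print(ET.tostring(merged, encoding='unicode'))
```

Children that are neither `channel` nor `programme` are dropped by this sketch, as they are by the generator approach; if the real files contain other elements, they would need their own handling.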