Unpacking SequenceMatcher loop results
What is the best way to unpack the results of a SequenceMatcher loop in Python so that the values can be easily accessed and processed?
from difflib import SequenceMatcher

orig = "1234567890"
commented = "123435456353453578901343154"
diff = SequenceMatcher(None, orig, commented)
match_id = []
for block in diff.get_matching_blocks():
    match_id.append(block)
print(match_id)
The digits in the strings stand in for Chinese characters.
The current loop stores the matching results in a list like this:
match_id
[Match(a=0, b=0, size=4), Match(a=4, b=7, size=2), Match(a=6, b=16, size=4), Match(a=10, b=27, size=0)]
Ultimately I want to mark the comments with "{{" and "}}", like this:
"1234{{354}}56{{3534535}}7890{{1343154}}"
That means I am interested in unpacking the SequenceMatcher results above and doing some calculations on specific b and size values to produce this sequence:
rslt = [[0+4,7],[7+2,16],[16+4,27]]
This is [b[i]+size[i], b[i+1]] repeated for each pair of consecutive matches.
1. Unpack the SequenceMatcher results to produce the sequence
You can unzip match_id and then use a list comprehension with your expression.
a, b, size = zip(*match_id)
# a = (0, 4, 6, 10)
# b = (0, 7, 16, 27)
# size = (4, 2, 4, 0)
rslt = [[b[i] + size[i], b[i+1]] for i in range(len(match_id)-1)]
# rslt = [[4, 7], [9, 16], [20, 27]]
Reference: zip, a Python built-in function: https://docs.python.org/3/library/functions.html#zip
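As a variation on the same idea, the index arithmetic can be avoided by pairing each block with its successor. This is only a sketch, assuming the attribute names a, b and size of the Match tuples shown in the question; the names blocks, cur, nxt and gaps are made up for illustration.

```python
from difflib import SequenceMatcher

orig = "1234567890"
commented = "123435456353453578901343154"

blocks = SequenceMatcher(None, orig, commented).get_matching_blocks()

# Pair each block with the one after it; the gap between them in
# `commented` runs from the end of the current match to the start
# of the next match.
gaps = [[cur.b + cur.size, nxt.b] for cur, nxt in zip(blocks, blocks[1:])]
print(gaps)  # [[4, 7], [9, 16], [20, 27]]
```

Reading the fields by name (cur.b, cur.size) keeps the expression close to the [b[i]+size[i], b[i+1]] formula from the question.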
2. Mark the comments with "{{" and "}}"
You can loop over rslt, appending the matched text seen so far and then the marked comment.
rslt_str = ""
prev_end = 0
for start, end in rslt:
    rslt_str += commented[prev_end:start]
    if start != end:
        rslt_str += "{{%s}}" % commented[start:end]
    prev_end = end
# rslt_str = "1234{{354}}56{{3534535}}7890{{1343154}}"
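The two steps can also be folded into one helper. The following is a rough sketch rather than a definitive implementation: the function name mark_comments and its open_tag and close_tag parameters are hypothetical, and it additionally wraps unmatched text that appears before the first match, a case the sample strings do not exercise.

```python
from difflib import SequenceMatcher


def mark_comments(orig, commented, open_tag="{{", close_tag="}}"):
    """Wrap every span of `commented` that has no match in `orig`."""
    blocks = SequenceMatcher(None, orig, commented).get_matching_blocks()
    pieces = []
    prev_end = 0  # end (in `commented`) of the previously copied match
    for block in blocks:
        if block.b > prev_end:
            # Unmatched text between the previous match and this one.
            pieces.append(open_tag + commented[prev_end:block.b] + close_tag)
        # Matched text is copied unchanged (empty for the final sentinel).
        pieces.append(commented[block.b:block.b + block.size])
        prev_end = block.b + block.size
    return "".join(pieces)


print(mark_comments("1234567890", "123435456353453578901343154"))
# 1234{{354}}56{{3534535}}7890{{1343154}}
```

Because get_matching_blocks() always ends with a zero-size block whose b equals len(commented), any trailing unmatched text is handled by the same loop.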
I would do it like this:
from difflib import SequenceMatcher

orig = "1234567890"
commented = "123435456353453578901343154"
diff = SequenceMatcher(None, orig, commented)
match_id = []
rslt_str = ""
for block in diff.get_matching_blocks():
    match_id.append(block)
temp = 0  # start of the text not yet copied from commented
for i, m in enumerate(match_id[:-1]):
    # matched text up to the end of this block, then open the comment
    rslt_str += commented[temp:m.b + m.size] + "{{"
    # unmatched text up to the start of the next block, then close it
    rslt_str += commented[m.b + m.size: match_id[i+1].b] + "}}"
    temp = match_id[i+1].b
So rslt_str == "1234{{354}}56{{3534535}}7890{{1343154}}"
You can try this:
from difflib import SequenceMatcher

orig = "1234567890"
commented = "123435456353453578901343154"
diff = SequenceMatcher(None, orig, commented)
a, b, size = zip(*diff.get_matching_blocks())
start = {x + y : '{{' for x, y in zip(b[:-1],size)}
end = dict.fromkeys(b[1:], '}}')
rslt = {**start, **end}
final_str = ''.join(rslt.get(ix,'') + n for ix, n in enumerate(commented)) + '}}'
print(final_str)
Output:
'1234{{354}}56{{3534535}}7890{{1343154}}'
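Explanation:
Because SequenceMatcher.get_matching_blocks() returns an iterable of Match tuples, zip(*...) can unpack it directly into a, b and size.
- Then create a dictionary with the start indices as keys and {{ as the values.
- Similarly, create a dictionary with the end indices as keys and }} as the values.
- Merge the two dictionaries into rslt by unpacking them.
Then iterate over the characters of commented and look up each index with dict.get, using '' as the default: where the index is a key in rslt, the corresponding braces are prepended to the character. Finally, join the pieces into one string. The closing '}}' is appended by hand because the last end index (27) equals len(commented), so enumerate never reaches it.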
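A different route to the same output, swapping get_matching_blocks() for SequenceMatcher.get_opcodes(), is sketched below. Each opcode is a (tag, i1, i2, j1, j2) tuple, and the "insert" and "replace" tags identify spans of commented that have no counterpart in orig. Treating "replace" spans as comments is an assumption on my part; the sample strings only produce "equal" and "insert".

```python
from difflib import SequenceMatcher

orig = "1234567890"
commented = "123435456353453578901343154"

matcher = SequenceMatcher(None, orig, commented)
pieces = []
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    segment = commented[j1:j2]
    if tag in ("insert", "replace"):
        # Text present only in `commented`: wrap it as a comment.
        pieces.append("{{" + segment + "}}")
    else:
        # "equal" copies matched text; for "delete" the segment is empty.
        pieces.append(segment)

marked = "".join(pieces)
print(marked)  # 1234{{354}}56{{3534535}}7890{{1343154}}
```

This avoids computing the gap boundaries by hand, since the opcodes already describe each stretch of commented along with how it relates to orig.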