Comparing Two Lists of Strings
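I am familiar with comparing two lists of integers or strings; however, comparing two lists of strings when one of them contains extra characters can be a bit challenging.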
Assume the output contains the following, which I split into a list of strings. In my code I call it diff.
Output
164c164
< Apples =
---
> Apples = 0
168c168
< Berries =
---
> Berries = false
218c218
< Cherries =
---
> Cherries = 20
223c223
< Bananas =
---
> Bananas = 10
233,234c233,234
< Lemons = 2
< Strawberries = 4
---
> Lemons = 4
> Strawberries = 2
264c264
< Watermelons =
---
> Watermelons = 524288
The second list contains the variable names to ignore, which I want to compare against the first list.
>>> ignore
['Apples', 'Lemons']
My code:
>>> def str_compare(ignore, output):
...     flag = 0
...     diff = output.strip().split('\n')
...     if ignore:
...         for line in diff:
...             for i in ignore:
...                 if i in line:
...                     flag = 1
...             if flag:
...                 flag = 0
...             else:
...                 print(line)
...
>>>
The code works: the Apples and Lemons lines are omitted.
>>> str_compare(ignore, output)
164c164
---
168c168
< Berries =
---
> Berries = false
218c218
< Cherries =
---
> Cherries = 20
223c223
< Bananas =
---
> Bananas = 10
233,234c233,234
< Strawberries = 4
---
> Strawberries = 2
264c264
< Watermelons =
---
> Watermelons = 524288
>>>
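There must be a better way to do this comparison than O(n^2). If the lines in my diff list did not contain extra characters such as "Apples =", comparing the two lists could be done in O(n). Are there any suggestions or ideas for doing the comparison without looping over the "ignore" variable for every element of diff?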
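One way to avoid an explicit Python loop over ignore for every line is to compile all of the ignored names into a single regular expression. This is a sketch under stated assumptions, not the asker's code: filter_diff_regex is a hypothetical name, and it assumes every changed line has the form "< Name = value" or "> Name = value". The regex engine still tries each alternative internally, so this mainly moves the inner loop into C rather than changing the worst-case complexity. It does match whole names only, so Apples no longer matches a line such as "< Pineapples = 3".

```python
import re


def filter_diff_regex(ignore, output):
    """Return the diff lines whose variable name is not in `ignore`.

    Hypothetical helper: assumes changed lines look like '< Name = value'
    or '> Name = value'; all other lines (hunk headers, '---') are kept.
    """
    lines = output.strip().split('\n')
    if not ignore:
        return lines
    # re.escape guards against names that contain regex metacharacters.
    names = '|'.join(re.escape(name) for name in ignore)
    # Anchor right after the diff marker so only the variable name is tested,
    # and use \b so that 'Apples' does not match 'Applesauce'.
    pattern = re.compile(r'^[<>] (?:' + names + r')\b')
    return [line for line in lines if not pattern.match(line)]
```

With the sample data, print('\n'.join(filter_diff_regex(ignore, output))) should print the same lines as the first str_compare.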
Update #1
To avoid confusion, and following the suggestions in the comments, I have updated the code.
>>> def str_compare(ignore, output):
...     diff = output.strip().split('\n')
...     if ignore:
...         for line in diff:
...             if not any([i in line for i in ignore]):
...                 print(line)
...     print("---")
...
>>>
Even so, it still loops over ignore twice for every element of diff.
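One reason the loop runs twice is the list comprehension: [i in line for i in ignore] evaluates every name before any() starts scanning the result. A generator expression lets any() stop at the first match. The sketch below counts the substring checks each form performs; count_checks is a hypothetical helper, and the third name Kiwis is added only so the difference is visible. In both forms the worst case is still one pass over ignore per line.

```python
def count_checks(line, ignore, lazy):
    """Return (matched, number_of_substring_checks) for one diff line."""
    checks = 0

    def contains(name):
        nonlocal checks
        checks += 1
        return name in line

    if lazy:
        # Generator expression: any() stops at the first True.
        matched = any(contains(name) for name in ignore)
    else:
        # List comprehension: every check runs before any() is called.
        matched = any([contains(name) for name in ignore])
    return matched, checks


ignore = ['Apples', 'Lemons', 'Kiwis']  # 'Kiwis' is a hypothetical extra name
print(count_checks('< Apples =', ignore, lazy=False))  # (True, 3)
print(count_checks('< Apples =', ignore, lazy=True))   # (True, 1)
```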
For efficiency, make ignore a set instead of a list, and use split() to extract the variable name from each line.
>>> def str_compare(ignore, output):
...     ignore = set(ignore)
...     diff = output.strip().split('\n')
...     for line in diff:
...         if line.startswith('<') or line.startswith('>'):
...             var = line.split()[1]
...             if var not in ignore:
...                 print(line)
...         else:
...             print(line)
...
Output
>>> str_compare (ignore, output)
164c164
---
168c168
< Berries =
---
> Berries = false
218c218
< Cherries =
---
> Cherries = 20
223c223
< Bananas =
---
> Bananas = 10
233,234c233,234
< Strawberries = 4
---
> Strawberries = 2
264c264
< Watermelons =
---
> Watermelons = 524288
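A variant of the set-based approach that returns the kept lines instead of printing them may be easier to test or reuse. The name filter_diff is hypothetical; like the function above, it assumes changed lines have the form "< Name = value" or "> Name = value", and it additionally keeps a bare marker line with no name after it instead of raising IndexError.

```python
def filter_diff(ignore, output):
    """Return diff lines, dropping '<'/'>' lines whose name is in `ignore`."""
    ignore = set(ignore)  # average O(1) membership test per line
    kept = []
    for line in output.strip().split('\n'):
        if line.startswith(('<', '>')):
            parts = line.split()
            # parts[0] is the marker, parts[1] the variable name (if present).
            if len(parts) > 1 and parts[1] in ignore:
                continue
        kept.append(line)
    return kept
```

Each line costs one split() and one hash lookup, so the pass is roughly linear in the size of the diff plus the size of ignore, no matter how many names ignore contains.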
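You can eliminate the need for the flag by splitting on, and joining with, "---\n" (a more general solution than a flag or typing out "---" by hand).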
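One possible reading of this comment: split the text on "---\n", filter the lines of each piece, and join the pieces back with "---\n", so the separator never has to be detected or printed by hand. The sketch below assumes that reading; filter_pieces is a hypothetical name, and the per-line test is the same set lookup used in the answer above.

```python
def filter_pieces(ignore, output):
    """Drop ignored '<'/'>' lines while leaving every '---' separator intact."""
    ignore = set(ignore)

    def keep(line):
        parts = line.split()
        return not (line.startswith(('<', '>'))
                    and len(parts) > 1 and parts[1] in ignore)

    # Each piece is the text between two '---' separators.
    pieces = output.strip().split('---\n')
    cleaned = ['\n'.join(line for line in piece.split('\n') if keep(line))
               for piece in pieces]
    # Re-insert the separators exactly where they were.
    return '---\n'.join(cleaned)
```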
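Note that checking whether s1 is contained in s2 should cost roughly len(s1) * len(s2) in the worst case, while checking equality costs roughly max(len(s1), len(s2)). Although the Python implementation is quite good (for the average case), linear-time algorithms do seem to exist: http://monge.univ-mlv.fr/~mac/Articles-PDF/CP-1991-jacm.pdf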
See also: Algorithm to find multiple string matches
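To make the len(s1) * len(s2) worst case concrete, here is a textbook naive substring search (an illustration only, not CPython's actual implementation, which is more refined). With haystack "a" * n and needle "a" * (m - 1) + "b", each of the n - m + 1 alignments matches m - 1 characters before failing on the last one, giving (n - m + 1) * m character comparisons. The function name naive_find is hypothetical.

```python
def naive_find(needle, haystack):
    """Textbook naive substring search.

    Returns (index_of_first_match_or_-1, character_comparisons).
    """
    n, m = len(haystack), len(needle)
    comparisons = 0
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if haystack[start + j] != needle[j]:
                break
            j += 1
        if j == m:
            return start, comparisons
    return -1, comparisons


# Worst case: every alignment matches m - 1 characters, then fails.
n, m = 100, 10
print(naive_find('a' * (m - 1) + 'b', 'a' * n))  # (-1, 910), i.e. 91 * 10
```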