How to check if strings are the same but one has repeated chars

Question

If I have some strings (example strings: ["niiiice", "niiiiiiiceee", "nice", "yummy", "shiiinee", "shine", "hello", "print", "priintering", "priinter", "Howdy", "yuup", "yup", "soooouuuuuppppp", "soup", "yeehaw"]), how do I check whether they are similar and differ only by repeated characters, and then decide which of the matching strings should come first (shortest first)? (Example output: ["nice", "niiiice", "niiiiiiiceee", "yummy", "shine", "shiiinee", "hello", "print", "priinter", "priintering", "Howdy", "yup", "yuup", "soup", "soooouuuuuppppp", "yeehaw"])

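Note:

If possible, the check should keep everything else in the same order. By this I mean that strings with no similar counterparts should stay in roughly the same place.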

Answer 1

You could squeeze out the repeated characters, so that "similar" strings become equal.

import re

a = ["niiiice", "niiiiiiiceee", "nice", "shiiinee", "shine"]

def squeeze(s):
    # Collapse each run of a repeated character into a single character
    return re.sub(r'(.)\1+', r'\1', s)

a.sort(key=lambda s: (squeeze(s), len(s)))

print(a)

Output:

['nice', 'niiiice', 'niiiiiiiceee', 'shine', 'shiiinee']
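The question also asks how to check whether two strings are similar. A minimal sketch of that check, built on the same squeezing idea, might look like the following. The name `is_similar` is a hypothetical helper, not part of the answer above. Note that squeezing also collapses legitimately doubled letters, so "hello" and "helo" compare as similar under this definition.

```python
import re

def squeeze(s):
    # Collapse each run of a repeated character into a single character.
    return re.sub(r'(.)\1+', r'\1', s)

def is_similar(a, b):
    # Hypothetical helper: two strings count as "similar" when they are
    # equal once runs of repeated characters are squeezed out.
    return squeeze(a) == squeeze(b)

print(is_similar("niiiice", "nice"))    # True
print(is_similar("print", "priinter"))  # False ("print" vs "printer")
print(is_similar("hello", "helo"))      # True (the double "l" is collapsed too)
```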

Or, if you only want to sort consecutive groups of "similar" strings:

from itertools import groupby
import re

a = ["niiiice", "niiiiiiiceee", "nice", "yummy", "shiiinee", "shine", "hello", "print", "priintering", "priinter", "Howdy", "yuup", "yup", "soooouuuuuppppp", "soup", "yeehaw"]

def squeeze(s):
    # Collapse each run of a repeated character into a single character
    return re.sub(r'(.)\1+', r'\1', s)

a = [s for _, g in groupby(a, squeeze) for s in sorted(g, key=len)]

print(a)

Output:

['nice', 'niiiice', 'niiiiiiiceee', 'yummy', 'shine', 'shiiinee', 'hello', 'print', 'priintering', 'priinter', 'Howdy', 'yup', 'yuup', 'soup', 'soooouuuuuppppp', 'yeehaw']
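The groupby approach only merges similar strings that are already adjacent. If similar strings may be scattered through the list, one possible variant (a sketch, not part of the answer above) is to bucket strings by their squeezed form in first-seen order, relying on the insertion order of Python dicts. The function name `group_similar` and the sample list `words` are assumptions made for illustration.

```python
import re

def squeeze(s):
    # Collapse each run of a repeated character into a single character.
    return re.sub(r'(.)\1+', r'\1', s)

def group_similar(strings):
    # Hypothetical helper: bucket strings by squeezed form. Dicts keep
    # insertion order, so each group sits where its first member appeared.
    buckets = {}
    for s in strings:
        buckets.setdefault(squeeze(s), []).append(s)
    # Within each group, put the shortest string first.
    return [s for group in buckets.values() for s in sorted(group, key=len)]

words = ["yup", "nice", "yuup", "niiiice", "hello", "niice"]
result = group_similar(words)
print(result)  # ['yup', 'yuup', 'nice', 'niice', 'niiiice', 'hello']
```

This moves later members of a group up to the position of the first member, so unrelated strings keep their relative order but not necessarily their exact index.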

Answer 2

Another solution, using itertools.groupby:

import itertools

a = ["niiiice", "niiiiiiiceee", "nice", "shiiinee", "shine"]

# Sort key: the characters of s with each run of repeats collapsed, then len(s)
print(sorted(a, key=lambda s: ([k for k, v in itertools.groupby(s)], len(s))))
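To make the sort key above easier to read, here is a small sketch that pulls the groupby step out into a regex-free `squeeze` function. `itertools.groupby` yields one (key, run) pair per run of equal consecutive characters, so joining the keys gives the string with repeats collapsed. The variable names are chosen for illustration only.

```python
from itertools import groupby

def squeeze(s):
    # One key per run of equal consecutive characters, joined back together.
    return "".join(k for k, _ in groupby(s))

words = ["niiiice", "niiiiiiiceee", "nice", "shiiinee", "shine"]

# Sort by squeezed form first, then by length so the shortest comes first.
result = sorted(words, key=lambda s: (squeeze(s), len(s)))
print(result)  # ['nice', 'niiiice', 'niiiiiiiceee', 'shine', 'shiiinee']
```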

Tags: python, sorting