How to check if strings are same but one has repeated chars

If I have some strings (example strings: ["niiiice", "niiiiiiiceee", "nice", "yummy", "shiiinee", "shine", "hello", "print", "priintering", "priinter", "Howdy", "yuup", "yup", "soooouuuuuppppp", "soup", "yeehaw"]), how can I check whether they are similar and differ only by repeated characters, and then find which one of each similar pair should come first (shortest first)? (Example output: ["nice", "niiiice", "niiiiiiiceee", "yummy", "shine", "shiiinee", "hello", "print", "priinter", "priintering", "Howdy", "yup", "yuup", "soup", "soooouuuuuppppp", "yeehaw"])

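Note:

If possible, the check should keep everything else in the same order. What I mean is that strings with no similar counterpart should stay in roughly the same position.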

You can squeeze out repeated characters so that "similar" strings become equal:

import re

a = ["niiiice", "niiiiiiiceee", "nice", "shiiinee", "shine"]

def squeeze(s):
    # Collapse each run of a repeated character to one copy: "niiiice" -> "nice"
    return re.sub(r'(.)\1+', r'\1', s)

a.sort(key=lambda s: (squeeze(s), len(s)))

print(a)

Output:

['nice', 'niiiice', 'niiiiiiiceee', 'shine', 'shiiinee']
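To see what the squeezed key looks like for individual strings, here is a small check. It reuses the regex from the answer: `(.)\1+` matches a character followed by one or more copies of itself, and the replacement `\1` keeps a single copy.

```python
import re

def squeeze(s):
    # (.)\1+ matches a character followed by one or more repeats of itself;
    # replacing the whole run with \1 leaves a single copy.
    return re.sub(r'(.)\1+', r'\1', s)

for word in ["niiiiiiiceee", "shiiinee", "yummy", "priinter", "priintering"]:
    print(word, "->", squeeze(word))
```

Two consequences worth knowing: legitimate double letters are collapsed as well ("yummy" -> "yumy", "hello" -> "helo"), and "priinter" and "priintering" get different keys ("printer" vs "printering"), so this approach does not treat them as a similar pair.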

Or, if you only want to sort consecutive groups of "similar" strings:

from itertools import groupby
import re

a = ["niiiice", "niiiiiiiceee", "nice", "yummy", "shiiinee", "shine", "hello", "print", "priintering", "priinter", "Howdy", "yuup", "yup", "soooouuuuuppppp", "soup", "yeehaw"]

def squeeze(s):
    # Collapse each run of a repeated character to one copy: "niiiice" -> "nice"
    return re.sub(r'(.)\1+', r'\1', s)

a = [s for _, g in groupby(a, squeeze) for s in sorted(g, key=len)]

print(a)

Output:

['nice', 'niiiice', 'niiiiiiiceee', 'yummy', 'shine', 'shiiinee', 'hello', 'print', 'priintering', 'priinter', 'Howdy', 'yup', 'yuup', 'soup', 'soooouuuuuppppp', 'yeehaw']
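`groupby` only merges similar strings that are adjacent. If similar strings can be scattered through the list, one possible variant is to collect them by their squeezed key and emit each group where its first member appeared, which keeps unmatched strings roughly in place. This is a sketch, not part of the answer above; `group_similar` is a hypothetical helper name, and it relies on dicts preserving insertion order (Python 3.7+).

```python
import re

def squeeze(s):
    # Collapse each run of a repeated character to a single character.
    return re.sub(r'(.)\1+', r'\1', s)

def group_similar(words):
    # Hypothetical helper: bucket every word under its squeezed key.
    # Dict insertion order puts each bucket at the position of its first member.
    groups = {}
    for w in words:
        groups.setdefault(squeeze(w), []).append(w)
    # Shortest first within each bucket; sorted() is stable for equal lengths.
    return [w for g in groups.values() for w in sorted(g, key=len)]

words = ["niiiice", "shine", "nice", "hello", "shiiinee"]
print(group_similar(words))  # ['nice', 'niiiice', 'shine', 'shiiinee', 'hello']
```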

Another solution, using itertools.groupby:

import itertools

# Key: the run characters (e.g. "niiiice" -> ['n', 'i', 'c', 'e']), then the length
print(sorted(["niiiice", "niiiiiiiceee", "nice", "shiiinee", "shine"], key=lambda s: ([k for k, v in itertools.groupby(s)], len(s))))
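The `groupby` key is the same idea as `squeeze`: `itertools.groupby(s)` yields one `(char, run)` pair per run of equal characters, so keeping only the keys drops the repeats. The sketch below joins those keys into a string and checks that it matches the regex version on the question's words; the function names are illustrative only.

```python
from itertools import groupby
import re

def squeeze_re(s):
    # Regex version from the first answer.
    return re.sub(r'(.)\1+', r'\1', s)

def squeeze_groupby(s):
    # One key per run of equal characters; the runs themselves are discarded.
    return ''.join(k for k, _ in groupby(s))

words = ["niiiice", "niiiiiiiceee", "nice", "yummy", "shiiinee", "shine",
         "hello", "print", "priintering", "priinter", "Howdy", "yuup", "yup",
         "soooouuuuuppppp", "soup", "yeehaw"]
print(all(squeeze_re(w) == squeeze_groupby(w) for w in words))  # True
```

Comparing the list of run characters, as the one-liner does, orders strings the same way as comparing the joined string, so both keys give the same sort.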