Sort string by multiple separated numbers

Question

I have a list of paths, which I have simplified here into similar but simpler strings:

paths = ['apple10/banana2/carrot1', 'apple10/banana1/carrot2', 'apple2/banana1', 'apple2/banana2', 'apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1']

These paths need to be sorted by the numbers they contain. The first number (apple) matters most in the sort, followed by the second (banana).

One complication, which may be obvious, is that some paths have a third directory where the data lives, while others do not.

An MWE of the path structure looks like this:

parent 
|-----apple1 
          |------banana1 
                   |----- data*
          |------banana2 
                   |----- data*
|-----apple2
          |------banana1 
                   |----- data*
          |------banana2 
                   |----- data*
|-----apple10
          |------banana1 
                   |-----carrot1
                            |-----data*
                   |-----carrot2
                            |-----data*
          |------banana2 
                   |----- carrot1
                             |-----data*

The desired output is:

paths = ['apple1/banana1', 'apple1/banana2', 'apple2/banana1', 'apple2/banana2', 'apple10/banana1/carrot1', 'apple10/banana1/carrot2', 'apple10/banana2/carrot1']

I'm struggling to figure out how to do this. A plain sort won't work, especially because the numbers become two digits and 10 sorts before 2.
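To make the problem concrete: a plain sorted() compares the strings character by character, so every 'apple10...' path lands ahead of the 'apple2...' paths because '1' < '2'. A quick check using the paths list from the question:

```python
paths = ['apple10/banana2/carrot1', 'apple10/banana1/carrot2', 'apple2/banana1',
         'apple2/banana2', 'apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1']

# Default string ordering is character by character, so 'apple10' < 'apple2'.
plain = sorted(paths)
print(plain)
# ['apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1',
#  'apple10/banana1/carrot2', 'apple10/banana2/carrot1',
#  'apple2/banana1', 'apple2/banana2']
```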

I saw another answer that works for strings containing a single number: How to correctly sort a string with a number inside? I haven't managed to adapt it to my problem.

Any help would be appreciated.

Answer 1

Try sorted with a custom key that uses re to extract all the numbers from the path:

>>> import re
>>> sorted(paths, key=lambda x: list(map(int, re.findall(r"\d+", x))))
['apple1/banana1',
 'apple1/banana2',
 'apple2/banana1',
 'apple2/banana2',
 'apple10/banana1/carrot1',
 'apple10/banana1/carrot2',
 'apple10/banana2/carrot1']
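The key turns each path into the list of its integers, for example [10, 2, 1]. Lists compare element by element, and a list that is a prefix of a longer one sorts first, which is why paths with and without a carrot level interleave correctly. A small sketch with the lambda pulled out into a named function (num_key is a hypothetical name, not from the answer):

```python
import re

def num_key(path):
    # Every run of digits, converted to int: 'apple10/banana2/carrot1' -> [10, 2, 1]
    return [int(n) for n in re.findall(r"\d+", path)]

print(num_key('apple10/banana2/carrot1'))  # [10, 2, 1]
print(num_key('apple2/banana1'))           # [2, 1]

# The first differing element decides: 2 < 10, so length never comes into play here.
print(num_key('apple2/banana1') < num_key('apple10/banana1/carrot1'))  # True

# When one list is a prefix of the other, the shorter one sorts first.
print([1, 1] < [1, 1, 1])  # True
```

Note that this key ignores the words entirely, so two paths with the same numbers but different names compare equal; since sorted is stable, they keep their original relative order.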

Answer 2

If you can represent the data as tuples instead of strings, things get easier:

paths = [('apple', 10, 'banana', 2, 'carrot', 1),
         ('apple', 10, 'banana', 1, 'carrot', 2),
         ('apple', 2, 'banana', 1),
         ('apple', 2, 'banana', 2),
         ('apple', 1, 'banana', 1),
         ('apple', 1, 'banana', 2),
         ('apple', 10, 'banana', 1, 'carrot', 1)
         ]

paths.sort(key=lambda item: (len(item), item))
print(paths)

I think the output is what you want:

[('apple', 1, 'banana', 1), ('apple', 1, 'banana', 2), ('apple', 2, 'banana', 1), ('apple', 2, 'banana', 2), ('apple', 10, 'banana', 1, 'carrot', 1), ('apple', 10, 'banana', 1, 'carrot', 2), ('apple', 10, 'banana', 2, 'carrot', 1)]
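One caveat with the (len(item), item) key: it ranks every shorter tuple ahead of every longer one before it looks at the numbers. That gives the right answer here only because the apple10 paths are the sole ones with a carrot level. The sketch below adds a hypothetical path ('apple', 1, 'banana', 1, 'carrot', 1), which is not in the question's data, to show the difference; a plain sort with no key already handles mixed lengths, because a tuple that is a prefix of a longer one compares as smaller:

```python
paths = [('apple', 10, 'banana', 1),
         ('apple', 2, 'banana', 1),
         ('apple', 1, 'banana', 1, 'carrot', 1)]  # hypothetical extra path

# Length first: both 4-tuples precede the 6-tuple, whatever the apple number.
by_len = sorted(paths, key=lambda item: (len(item), item))
print(by_len)
# [('apple', 2, 'banana', 1), ('apple', 10, 'banana', 1), ('apple', 1, 'banana', 1, 'carrot', 1)]

# Plain tuple comparison: element by element, shorter prefix first.
plain = sorted(paths)
print(plain)
# [('apple', 1, 'banana', 1, 'carrot', 1), ('apple', 2, 'banana', 1), ('apple', 10, 'banana', 1)]
```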

Answer 3

An addition to @not_speshal's answer:

Based on the answer to the question you linked, if the first word in your paths is not necessarily "apple", you can do this:

import re

def atoi(text):
    return int(text) if text.isdigit() else text

def word_and_num_as_tuple(text):
    return tuple(atoi(c) for c in re.split(r'(\d+)', text))

def path_as_sortable_tuple(path, sep='/'):
    return tuple(word_and_num_as_tuple(word_in_path) for word_in_path in path.split(sep))

paths = [
    'apple10/banana2/carrot1',
    'apple10/banana1/carrot2',
    'apple2/banana1',
    'apple2/banana2',
    'apple1/banana1',
    'apple1/banana2',
    'apple10/banana1/carrot1'
]


paths.sort(key=path_as_sortable_tuple)
print(paths)

# And, of course, as a lambda one-liner (wrapped for readability):
paths.sort(key=lambda path: tuple(
    tuple(int(char_seq) if char_seq.isdigit() else char_seq
          for char_seq in re.split(r'(\d+)', subpath))
    for subpath in path.split('/')))

This does exactly what @MarcinCuprjak suggested, but builds the tuples automatically.
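To see what this key actually produces, here is path_as_sortable_tuple from the answer applied to a single path. Each component becomes a (text, number, text, ...) tuple. re.split with a capturing group always puts text pieces at even positions and digit runs at odd positions, so a str is only ever compared with a str and an int with an int. The path 'pear1/banana1' below is a made-up example showing that the leading word now takes part in the ordering:

```python
import re

def atoi(text):
    return int(text) if text.isdigit() else text

def word_and_num_as_tuple(text):
    # The capturing group keeps the digit runs: 'apple10' -> ['apple', '10', '']
    return tuple(atoi(c) for c in re.split(r'(\d+)', text))

def path_as_sortable_tuple(path, sep='/'):
    return tuple(word_and_num_as_tuple(part) for part in path.split(sep))

print(path_as_sortable_tuple('apple10/banana2/carrot1'))
# (('apple', 10, ''), ('banana', 2, ''), ('carrot', 1, ''))

# Hypothetical 'pear1' path: 'apple' < 'pear', so it sorts after every apple path.
result = sorted(['pear1/banana1', 'apple10/banana1', 'apple2/banana1'],
                key=path_as_sortable_tuple)
print(result)
# ['apple2/banana1', 'apple10/banana1', 'pear1/banana1']
```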

Answer 4

Use the following tools:

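itertools.groupby with str.isdigit to group the characters into consecutive runs of digits or non-digits;
''.join to build a word from each group of characters;
a generator expression to iterate over the groups and filter out the non-digits;
int to convert each word made of digits into an integer.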
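For a feel of what groupby yields, this small sketch prints each run of one path together with its str.isdigit flag, then applies the same filter-and-convert steps listed above:

```python
from itertools import groupby

path = 'apple10/banana2/carrot1'

# groupby starts a new group whenever str.isdigit flips between True and False.
runs = [(are_digits, ''.join(group)) for are_digits, group in groupby(path, key=str.isdigit)]
print(runs)
# [(False, 'apple'), (True, '10'), (False, '/banana'), (True, '2'), (False, '/carrot'), (True, '1')]

# Keep only the digit runs and convert them to int.
key = tuple(int(text) for are_digits, text in runs if are_digits)
print(key)  # (10, 2, 1)
```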

Combine these tools into a tuple key for sorted:

from itertools import groupby

paths = ['apple10/banana2/carrot1', 'apple10/banana1/carrot2', 'apple2/banana1', 'apple2/banana2', 'apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1']

sorted(paths,
       key=lambda s: tuple(int(''.join(group))
                           for are_digits,group in groupby(s, key=str.isdigit)
                           if are_digits))
# ['apple1/banana1', 'apple1/banana2', 'apple2/banana1', 'apple2/banana2', 'apple10/banana1/carrot1', 'apple10/banana1/carrot2', 'apple10/banana2/carrot1']
