How do I use regex to sort this string

Question

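I am trying to sort this string of data by the date at the start of each line, but I'm not sure how to use this regex to split, join, and sort it. Yes, I am using re.MULTILINE.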

The regex that matches the date at the start of a line:

^ [0-9]{4}

An example of the string I need to sort:

 string = ''' 
 2013 this is data 3 (more data from 3)
 2016 this is data 6 (more data from 6)
 2011 this is data 1 (more data from 1)
 2012 this is data 2 (more data from 2)
 2014 this is data 4 (more data from 4)'''

What I want it to look like:

 string = ''' 
 2016 this is data 6 (more data from 6)
 2014 this is data 4 (more data from 4)
 2013 this is data 3 (more data from 3)
 2012 this is data 2 (more data from 2)
 2011 this is data 1 (more data from 1)'''

Answer 1

If the year is always 4 digits, split each line into the year and everything else with something like this:

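# data is assumed to be the list of lines, e.g. from str.splitlines()
# lstrip() drops the separating space so the later ' '.join does not double it
result = [(x[0:4], x[4:].lstrip()) for x in data]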

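This simply turns the strings into a list of tuples, where the first item is the year and the second is everything else. It avoids firing up the regex parser, which is overkill for something this basic.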

Then sort:

result.sort(key = lambda x: x[0], reverse=True)

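This sorts the list in place and tells it to use the first element of each tuple as the sort key. reverse=True gives descending order, since that is the order in your example.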

To put it back the way it appears in your example output:

output = [' '.join(x) for x in result]

This takes the tuples and glues each one back into a single string, using a space as the separator.

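If you really just want to take this input and print the sorted version, @samusa's answer is the way to go. If you want to do any kind of additional manipulation of the data that needs it broken into separate strings, with the year kept apart from the rest, go the way I described.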
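The split, sort, and join steps above can be strung together end to end. This is only a minimal sketch: it assumes every non-blank line starts with a 4-digit year once stripped, and the function name sort_by_year is made up for illustration.

```python
def sort_by_year(text):
    """Return the lines of text sorted by their 4-digit year prefix, newest first."""
    # Assumption: every non-blank line starts with a 4-digit year once stripped.
    data = [line.strip() for line in text.splitlines() if line.strip()]
    # Split each line into (year, rest); lstrip() drops the separating space.
    result = [(x[:4], x[4:].lstrip()) for x in data]
    # 4-digit years compare correctly as strings, so no int() conversion is needed.
    result.sort(key=lambda x: x[0], reverse=True)
    # Glue each tuple back into a single line.
    return [' '.join(x) for x in result]


string = '''
2013 this is data 3 (more data from 3)
2016 this is data 6 (more data from 6)
2011 this is data 1 (more data from 1)
2012 this is data 2 (more data from 2)
2014 this is data 4 (more data from 4)'''

output = sort_by_year(string)
print('\n'.join(output))
```

Keeping the intermediate list of tuples is what makes this route useful when the year and the remaining text need to be processed separately before they are joined again.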

Answer 2

If I understand correctly, your input data is one long string. You can call str.splitlines() on the string to get it line by line.

multiline_str = """ 2013 this is data 3 (blah blah blah)
 2016 this is data 6 (blah blah blah)
 2011 this is data 1 (blah blah blah)
 2012 this is data 2 (blah blah blah)
 2014 this is data 4 (blah blah blah)
"""
sorted(multiline_str.splitlines(), key=lambda x: x[:5], reverse=True)

This produces:

[' 2016 this is data 6 (blah blah blah)',
 ' 2014 this is data 4 (blah blah blah)',
 ' 2013 this is data 3 (blah blah blah)',
 ' 2012 this is data 2 (blah blah blah)',
 ' 2011 this is data 1 (blah blah blah)']
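If the lines may carry leading whitespace, or the string contains blank lines, slicing a fixed number of characters becomes fragile. A hedged variation follows, assuming each non-blank line begins with a numeric year after stripping; year_of is a hypothetical helper name, and the blank line in the sample data is added only to show the filtering.

```python
multiline_str = """ 2013 this is data 3 (blah blah blah)
 2016 this is data 6 (blah blah blah)
 2011 this is data 1 (blah blah blah)

 2012 this is data 2 (blah blah blah)
 2014 this is data 4 (blah blah blah)
"""


def year_of(line):
    # First whitespace-separated token, converted to int so years compare numerically.
    return int(line.split()[0])


# Strip each line and drop the ones that are empty after stripping.
lines = [line.strip() for line in multiline_str.splitlines() if line.strip()]
sorted_lines = sorted(lines, key=year_of, reverse=True)
print(sorted_lines)
```

Converting the key to int also keeps the order right if the years ever stop being exactly four digits long.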

Answer 3

Posting this at the request of a comment asking for a split-based approach instead of a regex.

You can split the string on newlines and sort by the first word in reverse order:

strings = '2011 blah blah\n2016 blah blah'
parts = strings.split('\n')

result = sorted(parts, key=lambda a: a[:a.index(' ')], reverse=True)
print('\n'.join(result))

2016 blah blah
2011 blah blah
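One caveat with a.index(' ') is that it raises ValueError for a line that contains no space. A small sketch of the same idea using str.partition, which does not raise in that case; the third line of the sample data is invented to show the difference.

```python
strings = '2011 blah blah\n2016 blah blah\n2014'

# partition(' ') returns (head, sep, tail); head is the whole line when there is
# no space, whereas a.index(' ') would raise ValueError for the line '2014'.
result = sorted(strings.split('\n'), key=lambda a: a.partition(' ')[0], reverse=True)
print('\n'.join(result))
```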

Answer 4

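If you really need to use a regex in your code to sort the string, you have to go through the following steps:

Split the string on newlines.
Use the sorted() function to sort the list by the number captured from each string.
Use the string .join() method to turn the list back into a string. For example: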

import re

longtext = """2013 this is data 3 (more data from 3)
2016 this is data 6 (more data from 6)
2011 this is data 1 (more data from 1)
2012 this is data 2 (more data from 2)
2014 this is data 4 (more data from 4)
"""

data = longtext.splitlines()
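# Capture only the digits (group 1) and compare them as integers, so leading
# whitespace and years longer than 4 digits do not distort the order
data = sorted(data, key=lambda v: int(re.search(r'^[\t ]*(\d{4,})', v)[1]), reverse=True)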

new = '\n'.join(data)
print(new)

Output:

2016 this is data 6 (more data from 6)
2014 this is data 4 (more data from 4)
2013 this is data 3 (more data from 3)
2012 this is data 2 (more data from 2)
2011 this is data 1 (more data from 1)
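Since the question mentions re.MULTILINE, the same result can also be reached without splitting first, by letting the regex find each line in the whole string. This is a sketch under the assumption that every line of interest starts with optional spaces or tabs followed by a 4-digit year; the name line_re is illustrative only.

```python
import re

longtext = """2013 this is data 3 (more data from 3)
2016 this is data 6 (more data from 6)
2011 this is data 1 (more data from 1)
2012 this is data 2 (more data from 2)
2014 this is data 4 (more data from 4)
"""

# Under re.MULTILINE, ^ and $ match at every line boundary.
# Group 1 captures the year; group 0 is the whole line.
line_re = re.compile(r'^[ \t]*(\d{4})\b.*$', re.MULTILINE)

# Lines that do not start with a year are simply not matched and are dropped.
matches = list(line_re.finditer(longtext))
matches.sort(key=lambda m: int(m.group(1)), reverse=True)

new = '\n'.join(m.group(0) for m in matches)
print(new)
```

Note that this version silently discards lines that do not match, while the splitlines() version above would raise an error on them, so which one fits depends on how clean the input is.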

Answer 5

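Use a regex to find the year, make the year a dictionary key, then sort the dictionary keys in descending order. Print each value by looking it up with get while iterating over the sorted keys. Note that lines sharing the same year overwrite one another in the dictionary.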

import re

longtext = """2013 this is data 3 (more data from 3)
2016 this is data 6 (more data from 6)
2011 this is data 1 (more data from 1)
2012 this is data 2 (more data from 2)
2014 this is data 4 (more data from 4)"""

data = longtext.splitlines()
dct = {}
for x in data:
    # Key each line by the first 4-digit run found in it
    dct[re.search(r'\d{4}', x)[0]] = x

# Print the lines by descending year
for i in sorted(dct.keys(), reverse=True):
    print(dct.get(i))

Output:

2016 this is data 6 (more data from 6)
2014 this is data 4 (more data from 4)
2013 this is data 3 (more data from 3)
2012 this is data 2 (more data from 2)
2011 this is data 1 (more data from 1)
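A plain dictionary keeps only one line per year. If several lines can share a year, one possible variation groups them in lists instead. This is a sketch, not part of the original answer; the duplicate 2013 line in the sample data and the names by_year and sorted_lines are invented for illustration.

```python
import re
from collections import defaultdict

longtext = """2013 this is data 3 (more data from 3)
2016 this is data 6 (more data from 6)
2013 another entry for 2013
2011 this is data 1 (more data from 1)"""

# Map each year to every line that carries it, in input order.
by_year = defaultdict(list)
for line in longtext.splitlines():
    m = re.search(r'\d{4}', line)
    if m:  # skip lines that contain no 4-digit run
        by_year[m[0]].append(line)

# Walk the years in descending order and flatten the grouped lines.
sorted_lines = [line for year in sorted(by_year, reverse=True) for line in by_year[year]]
print('\n'.join(sorted_lines))
```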

python

regex

python-re