How do I use regex to sort this string?

I'm trying to sort this data by the date at the start of each line, but I'm not sure how to split, join, and sort it using this regex. And yes, I am using re.MULTILINE.

Regex that matches the date at the start of a line:

^ [0-9]{4}

Example of the string I need to sort:

 string = ''' 
 2013 this is data 3 (more data from 3)
 2016 this is data 6 (more data from 6)
 2011 this is data 1 (more data from 1)
 2012 this is data 2 (more data from 2)
 2014 this is data 4 (more data from 4)'''

What I want it to look like:

 string = ''' 
 2016 this is data 6 (more data from 6)
 2014 this is data 4 (more data from 4)
 2013 this is data 3 (more data from 3)
 2012 this is data 2 (more data from 2)
 2011 this is data 1 (more data from 1)'''

Split it into the year and everything else with something like this, provided the year is always 4 digits:

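data = [line.strip() for line in string.strip().splitlines()]  # one entry per line, leading space removed
result = [(x[:4], x[5:]) for x in data]  # x[4] is the separating space, so skip it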

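This just splits each string into a tuple, giving a list of tuples where the first item is the year and the second is everything else. It avoids firing up the regex parser, which is overkill for something this basic.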

Then sort:

result.sort(key = lambda x: x[0], reverse=True)

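This sorts the list in place and tells it to use the first element of each tuple as the sort key. reverse=True sorts in descending order, since that is the order in your example.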

To put it back the way it appears in your example output:

output = [' '.join(x) for x in result]

This takes the tuples and glues each one back into a single string, using a space as the separator.
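Putting the three steps together, a minimal sketch might look like the following. It assumes the input has the same shape as the question's data (one record per line, each starting with a 4-digit year after optional whitespace); the names `text`, `lines`, and `pairs` are illustrative and not from the original answer.

```python
# Sample input in the same shape as the question's data (assumed: one record per line).
text = '''
 2013 this is data 3 (more data from 3)
 2016 this is data 6 (more data from 6)
 2011 this is data 1 (more data from 1)
 2012 this is data 2 (more data from 2)
 2014 this is data 4 (more data from 4)'''

# Strip each line and drop empty ones so the year sits at indexes 0..3.
lines = [line.strip() for line in text.splitlines() if line.strip()]

# Split into (year, rest); index 4 is the separating space, so skip it.
pairs = [(line[:4], line[5:]) for line in lines]

# Sort in place by year, newest first.
pairs.sort(key=lambda pair: pair[0], reverse=True)

# Glue each tuple back together and rebuild one string.
output = '\n'.join(' '.join(pair) for pair in pairs)
print(output)
```

Comparing the years as strings works here only because they all have the same number of digits; converting with `int()` would be the safer choice if that could vary.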

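If you really just want to take this input and print a sorted version, @samusa's answer is the way to go. If you want to do any further processing on the data and need it broken into separate strings, with the year split from the rest, do it the way I described.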

If I understand correctly, your input data is one long string. You can call str.splitlines() on the string to get it line by line.

multiline_str = """ 2013 this is data 3 (blah blah blah)
 2016 this is data 6 (blah blah blah)
 2011 this is data 1 (blah blah blah)
 2012 this is data 2 (blah blah blah)
 2014 this is data 4 (blah blah blah)
"""
sorted(multiline_str.splitlines(), key=lambda x: x[:5], reverse=True)

This produces:

[' 2016 this is data 6 (blah blah blah)',
 ' 2014 this is data 4 (blah blah blah)',
 ' 2013 this is data 3 (blah blah blah)',
 ' 2012 this is data 2 (blah blah blah)',
 ' 2011 this is data 1 (blah blah blah)']
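If the goal is a single string rather than a list, the sorted lines can be joined back with newlines. The sketch below is one hedged way to do that; it also skips blank lines and sorts on the stripped text so indentation does not influence the order. The names `non_blank`, `ordered`, and `sorted_str` are illustrative.

```python
multiline_str = """ 2013 this is data 3 (blah blah blah)
 2016 this is data 6 (blah blah blah)

 2011 this is data 1 (blah blah blah)
"""

# Keep only lines that contain something other than whitespace.
non_blank = [line for line in multiline_str.splitlines() if line.strip()]

# Sort on the first four characters of the stripped line (the year), newest first.
ordered = sorted(non_blank, key=lambda line: line.strip()[:4], reverse=True)

# Rebuild a single string; the original leading spaces are preserved.
sorted_str = "\n".join(ordered)
print(sorted_str)
```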

Per the request in the comments, here is a version that uses split instead of regex.

You can split the string on newlines and sort in reverse order by the first word:

strings = '2011 blah blah\n2016 blah blah'
parts = strings.split('\n')

result = sorted(parts, key=lambda a: a[:a.index(' ')], reverse=True)
print('\n'.join(result))

Output:

2016 blah blah
2011 blah blah
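One caveat with `a.index(' ')` is that it raises ValueError for any line that contains no space. A possible variant, sketched here with `str.partition` (a substitution, not part of the original answer), falls back to the whole line in that case:

```python
# The last line deliberately has no space after the year.
strings = '2011 blah blah\n2016 blah blah\n2014'

parts = strings.split('\n')

# partition(' ') returns (head, sep, tail); head is the whole line when there is no space.
result = sorted(parts, key=lambda a: a.partition(' ')[0], reverse=True)
print('\n'.join(result))
```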

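If you really need to use regex in your code to sort the string, you have to go through the following steps:

  • Split the string on newlines.
  • Use sorted() to sort the list by the number captured from each string.
  • Use the string .join() method to turn the list back into a string.

For example:
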
import re

longtext = """2013 this is data 3 (more data from 3)
2016 this is data 6 (more data from 6)
2011 this is data 1 (more data from 1)
2012 this is data 2 (more data from 2)
2014 this is data 4 (more data from 4)
"""

data = longtext.splitlines()
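# Sort on the captured digits as an integer so leading whitespace and digit count do not affect the order.
data = sorted(data, key=lambda v: int(re.search(r'^[\t ]*(\d{4,})', v).group(1)), reverse=True)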

new = '\n'.join(data)
print(new)

Output:

2016 this is data 6 (more data from 6)
2014 this is data 4 (more data from 4)
2013 this is data 3 (more data from 3)
2012 this is data 2 (more data from 2)
2011 this is data 1 (more data from 1)
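Since the question specifically mentions re.MULTILINE, here is a hedged sketch of how that flag could be used to pull the year and the rest of each line straight out of the whole string, without splitting it first. The pattern and the names `pattern` and `records` are assumptions made for illustration.

```python
import re

longtext = """ 2013 this is data 3 (more data from 3)
 2016 this is data 6 (more data from 6)
 2011 this is data 1 (more data from 1)
"""

# With re.MULTILINE, ^ and $ match at the start and end of every line.
# Group 1 is the year, group 2 is the rest of the line.
pattern = re.compile(r'^[ \t]*(\d{4})[ \t]*(.*)$', re.MULTILINE)

# findall returns one (year, rest) tuple per matching line.
records = pattern.findall(longtext)

# Sort numerically by year, newest first, then rebuild the string.
records.sort(key=lambda rec: int(rec[0]), reverse=True)
new = '\n'.join(f'{year} {rest}' for year, rest in records)
print(new)
```

Lines that do not start with a 4-digit number are simply not matched, so they are dropped from the result rather than raising an error.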

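Use regex to find the year, make the year a dictionary key, then sort the dictionary keys in descending order. Finally, look up each value with the sorted keys and print it.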

import re

longtext = """2013 this is data 3 (more data from 3)
2016 this is data 6 (more data from 6)
2011 this is data 1 (more data from 1)
2012 this is data 2 (more data from 2)
2014 this is data 4 (more data from 4)"""

data = longtext.splitlines()
dct = {}
for x in data:
    # Use the first 4-digit run on the line (the year) as the dictionary key.
    dct[re.search(r'\d{4}', x)[0]] = x

# Print the lines in descending order of year.
for i in sorted(dct.keys(), reverse=True):
    print(dct[i])

Output:

2016 this is data 6 (more data from 6)
2014 this is data 4 (more data from 4)
2013 this is data 3 (more data from 3)
2012 this is data 2 (more data from 2)
2011 this is data 1 (more data from 1)
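A plain dictionary keeps only one line per year, so two lines with the same year would overwrite each other. If that can happen in your data, one possible variation (a sketch using `collections.defaultdict`, which the answer above does not use) is to store a list of lines per year:

```python
import re
from collections import defaultdict

# Hypothetical sample data in which the year 2013 appears twice.
longtext = """2013 first entry
2016 only entry
2013 second entry"""

# Map each year to a list of lines so repeated years are not overwritten.
by_year = defaultdict(list)
for line in longtext.splitlines():
    match = re.search(r'\d{4}', line)
    if match:  # skip lines without a 4-digit number
        by_year[match[0]].append(line)

# Walk the years in descending order, keeping the original order within a year.
ordered = [line for year in sorted(by_year, reverse=True) for line in by_year[year]]
print('\n'.join(ordered))
```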