How do I use regex to sort this string?
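I'm trying to sort this block of data by the date at the start of each line, but I'm not sure how to split, join, and sort it using this regex. Yes, I'm using re.MULTILINE.
Regex that matches the date at the start of a line:
^ [0-9]{4}
Example of the string I need to sort: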
string = '''
2013 this is data 3 (more data from 3)
2016 this is data 6 (more data from 6)
2011 this is data 1 (more data from 1)
2012 this is data 2 (more data from 2)
2014 this is data 4 (more data from 4)'''
What I want it to look like:
string = '''
2016 this is data 6 (more data from 6)
2014 this is data 4 (more data from 4)
2013 this is data 3 (more data from 3)
2012 this is data 2 (more data from 2)
2011 this is data 1 (more data from 1)'''
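Assuming the year is always four digits, and assuming data is the list of lines, split each line into the year and everything else with something like this:
result = [(x[0:4], x[5:]) for x in data]
This turns the strings into a list of tuples where the first item is the year and the second is everything after the separating space. It avoids starting up the regex parser, which is overkill for something this basic.
Then sort:
result.sort(key=lambda x: x[0], reverse=True)
This sorts the list in place, using the first element of each tuple as the sort key. reverse=True gives descending order, which is the order in your example.
To put it back into the form of your example output:
output = [' '.join(x) for x in result]
This takes each tuple and joins it back into a single string, using a space as the separator.
If you really just want to take this input and print a sorted version, @samusa's answer is the way to go. If you want to do any further manipulation of the data that requires breaking it into separate strings, with the year separated from the rest, do it the way I described.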
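Put together, the three steps above might look like the following minimal sketch. It assumes that `data` comes from calling `str.splitlines()` on the question's string and dropping blank lines (the answer does not show where `data` comes from), and that every line starts with a four-digit year followed by exactly one space. The rest of each line is sliced from index 5 so the separating space is not duplicated when the pieces are rejoined.

```python
# The multi-line string from the question.
string = '''
2013 this is data 3 (more data from 3)
2016 this is data 6 (more data from 6)
2011 this is data 1 (more data from 1)
2012 this is data 2 (more data from 2)
2014 this is data 4 (more data from 4)'''

# Assumption: `data` is the list of non-blank lines.
data = [line for line in string.splitlines() if line.strip()]

# Split each line into (year, rest), skipping the space after the year.
result = [(x[:4], x[5:]) for x in data]

# Sort in place by the year, newest first.
result.sort(key=lambda x: x[0], reverse=True)

# Rejoin each tuple with a single space, then print one line per entry.
output = [' '.join(x) for x in result]
print('\n'.join(output))
```

Comparing the years as strings works here only because they all have the same number of digits.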
If I understand correctly, your input data is one long string. You can use str.splitlines() on the string to get it line by line.
multiline_str = """ 2013 this is data 3 (blah blah blah)
 2016 this is data 6 (blah blah blah)
 2011 this is data 1 (blah blah blah)
 2012 this is data 2 (blah blah blah)
 2014 this is data 4 (blah blah blah)
 """
sorted(multiline_str.splitlines(), key=lambda x: x[:5], reverse=True)
This produces the following (the last entry is the whitespace-only line before the closing quotes):
[' 2016 this is data 6 (blah blah blah)',
' 2014 this is data 4 (blah blah blah)',
' 2013 this is data 3 (blah blah blah)',
' 2012 this is data 2 (blah blah blah)',
' 2011 this is data 1 (blah blah blah)',
' ']
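If the leading spaces and the trailing whitespace-only entry are not wanted, one option is to strip each line and drop the blank ones before sorting. This is only a sketch under that assumption. The names `lines` and `sorted_lines` are introduced here for illustration and are not part of the answer above.

```python
multiline_str = """ 2013 this is data 3 (blah blah blah)
 2016 this is data 6 (blah blah blah)
 2011 this is data 1 (blah blah blah)
 2012 this is data 2 (blah blah blah)
 2014 this is data 4 (blah blah blah)
 """

# Strip surrounding whitespace and discard lines that are empty afterwards.
lines = [line.strip() for line in multiline_str.splitlines() if line.strip()]

# After stripping, the year is the first four characters of each line.
sorted_lines = sorted(lines, key=lambda x: x[:4], reverse=True)
print('\n'.join(sorted_lines))
```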
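As requested in the comments, here is a version that uses split instead of regex.
You can split the string on newlines and sort by the first word in reverse order: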
strings = '2011 blah blah\n2016 blah blah'
parts = strings.split('\n')
result = sorted(parts, key=lambda a: a[:a.index(' ')], reverse=True)
print('\n'.join(result))
2016 blah blah
2011 blah blah
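One thing to watch for: `a.index(' ')` raises `ValueError` on a line that contains no space. A slightly more tolerant variant, sketched below, takes the first whitespace-separated word with `str.split(maxsplit=1)` and compares the years as integers. The helper name `year_key` and the extra `2014` line are made up for illustration. The sketch still assumes every line is non-empty and starts with an integer.

```python
# Sample input; the last line has a year but nothing after it.
strings = '2011 blah blah\n2016 blah blah\n2014'

def year_key(line):
    # split(maxsplit=1) returns at most two pieces and also works
    # when the line has no space at all.
    return int(line.split(maxsplit=1)[0])

result = sorted(strings.split('\n'), key=year_key, reverse=True)
print('\n'.join(result))
```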
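If you really need to use a regex in your code to sort the string, you have to go through the following steps:
- Split the string on newlines.
- Use the sorted() function to sort the list by the number captured from each string.
- Use the string .join() method to turn the list back into a string.
For example: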
import re
longtext = """2013 this is data 3 (more data from 3)
2016 this is data 6 (more data from 6)
2011 this is data 1 (more data from 1)
2012 this is data 2 (more data from 2)
2014 this is data 4 (more data from 4)
"""
data = longtext.splitlines()
data = sorted(data, key=lambda v: re.search(r'^[\t ]*\d{4,}', v)[0], reverse=True)
new = '\n'.join(data)
print(new)
Output:
2016 this is data 6 (more data from 6)
2014 this is data 4 (more data from 4)
2013 this is data 3 (more data from 3)
2012 this is data 2 (more data from 2)
2011 this is data 1 (more data from 1)
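Since the question mentions re.MULTILINE, here is a sketch of a variant that skips the explicit split and lets the regex pick out whole lines. It is an illustration rather than part of the answer above. The pattern and the name `line_re` are assumptions, and any line that does not start with a four-digit year is silently dropped.

```python
import re

longtext = """2013 this is data 3 (more data from 3)
2016 this is data 6 (more data from 6)
2011 this is data 1 (more data from 1)
2012 this is data 2 (more data from 2)
2014 this is data 4 (more data from 4)
"""

# With re.MULTILINE, ^ and $ match at the start and end of every line.
# Group 1 captures the year; group 0 is the whole line.
line_re = re.compile(r'^[ \t]*(\d{4})\b.*$', re.MULTILINE)

matches = list(line_re.finditer(longtext))

# Sort the match objects by the captured year, newest first.
matches.sort(key=lambda m: int(m.group(1)), reverse=True)

new = '\n'.join(m.group(0) for m in matches)
print(new)
```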
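Use a regex to find the year, make the year a dictionary key, then sort the dictionary keys in descending order and use get to print the value for each sorted key. Note that lines sharing a year would overwrite each other in the dictionary, so this assumes each year appears only once.
import re
longtext = """2013 this is data 3 (more data from 3)
2016 this is data 6 (more data from 6)
2011 this is data 1 (more data from 1)
2012 this is data 2 (more data from 2)
2014 this is data 4 (more data from 4)"""
data = longtext.splitlines()
dct = {}
for x in data:
    dct[re.search(r'\d{4}', x)[0]] = x
for i in sorted(dct.keys(), reverse=True):
    print(dct.get(i))
Output: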
2016 this is data 6 (more data from 6)
2014 this is data 4 (more data from 4)
2013 this is data 3 (more data from 3)
2012 this is data 2 (more data from 2)
2011 this is data 1 (more data from 1)
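A plain dictionary keeps only one line per year. If the same year can appear more than once, one possible workaround is to collect the lines for each year in a list, as in the sketch below. The sample text and the names `by_year` and `ordered` are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical input in which the year 2013 appears twice.
longtext = """2013 first entry
2016 only entry
2013 second entry"""

# Map each year to the list of lines that start with it.
by_year = defaultdict(list)
for line in longtext.splitlines():
    match = re.search(r'\d{4}', line)
    if match:  # skip lines without a four-digit run
        by_year[match[0]].append(line)

# Walk the years newest first, keeping the original order within a year.
ordered = [line for year in sorted(by_year, reverse=True) for line in by_year[year]]
print('\n'.join(ordered))
```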