Python: sorting negative and/or decimal alphanumeric strings
I'm having trouble sorting a list of strings that contain negative and/or decimal alphanumeric substrings. Here is what I have so far:
import re
format_ids = ["synopsys_SS_2v_-40c_SS.lib",
              "synopsys_SS_1v_-40c_SS.lib",
              "synopsys_SS_1.2v_-40c_SS.lib",
              "synopsys_SS_1.4v_-40c_SS.lib",
              "synopsys_SS_2v_-40c_TT.lib",
              "synopsys_FF_3v_25c_FF.lib",
              "synopsys_TT_4v_125c_TT.lib",
              "synopsys_TT_1v_85c_TT.lib",
              "synopsys_TT_10v_85c_TT.lib",
              "synopsys_FF_3v_-40c_SS.lib",
              "synopsys_FF_3v_-40c_TT.lib"]
selector = r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*'
#key = [2,1,3]
key = 2
produce_groups = False
if isinstance(key, int):
    key = [key]
convert = lambda text: float(text) if text.isdigit() else text
alphanum_key = lambda k: [convert(c) for c in re.split('([-.\d]+)', k)]
split_list = lambda name: tuple(alphanum_key(re.findall(selector,name)[0][i]) for i in key)
format_ids.sort(key=split_list)
print "\n".join(format_ids)
I expect the following output (sorted by the 3rd key):
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_1v_-40c_SS.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_TT_1v_85c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_TT_4v_125c_TT.lib
But I get the following (all the negative values are listed last):
synopsys_FF_3v_25c_FF.lib
synopsys_TT_1v_85c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_TT_4v_125c_TT.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_1v_-40c_SS.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
Now, for the decimals in the 2nd key (changing the key variable to 1, i.e. key=1), I get:
synopsys_SS_1v_-40c_SS.lib
synopsys_TT_1v_85c_TT.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_TT_4v_125c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
Expected:
synopsys_SS_1v_-40c_SS.lib
synopsys_TT_1v_85c_TT.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_TT_4v_125c_TT.lib
synopsys_TT_10v_85c_TT.lib
Any suggestions are greatly appreciated.
EDIT: I ended up using what @StephenRauch described:
import re
def sort_names(format_ids, selector, key=1):
    if isinstance(key, int):
        key = [key]
    SELECTOR_RE = re.compile(selector)
    def convert(x):
        try:
            return float(x[:-1])
        except ValueError:
            return x
    def sort_keys(key):
        def split_fid(x):
            x = SELECTOR_RE.split(x)
            return tuple([convert(x[i]) for i in key])
        return split_fid
    format_ids.sort(key=sort_keys(key))
format_ids = ["synopsys_SS_2v_-40c_SS.lib",
              "synopsys_SS_1v_-40c_SS.lib",
              "synopsys_SS_1.2v_-40c_SS.lib",
              "synopsys_SS_1.4v_-40c_SS.lib",
              "synopsys_SS_2v_-40c_TT.lib",
              "synopsys_FF_3v_25c_FF.lib",
              "synopsys_TT_4v_125c_TT.lib",
              "synopsys_TT_1v_85c_TT.lib",
              "synopsys_TT_10v_85c_TT.lib",
              "synopsys_FF_3v_-40c_SS.lib",
              "synopsys_FF_3v_-40c_TT.lib"]
selector = r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*'
key = [2,1,3]
sort_names(format_ids,selector,key)
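A large part of your problem is that only actual digits are considered digits, not dashes and periods. So in your code, things like "-40".isdigit() or "1.4".isdigit() are False, and those values are left as text instead of being converted to float.
The numbers need to be tested differently. In addition, re.split() was returning a leading '', which was causing the conversion routine to break.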
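To see both points in isolation, here is a small check. It is only a sketch using the standard library; the names checks and parts are made up for illustration:

```python
import re

# str.isdigit() is True only when every character is a digit,
# so a minus sign or a decimal point makes it False.
checks = {s: s.isdigit() for s in ("40", "-40", "1.4")}
print(checks)  # {'40': True, '-40': False, '1.4': False}

# When the string starts with a match, re.split() returns a leading ''.
parts = re.split(r'([-.\d]+)', "-40c")
print(parts)  # ['', '-40', 'c']
```

In CPython 2, a float compares as smaller than any str, so the entries whose value was converted (25.0, 85.0, 125.0) sort ahead of the ones left as text ("-40"). That is why the negatives end up last. In Python 3 the same mixed comparison raises a TypeError instead.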
Fixed code:
key = [2,1,3]
def convert(x):
    try:
        return float(x)
    except ValueError:
        return x
alphanum_keys = lambda k: (convert(c) for c in re.split(r'([-.\d]+)', k))
alphanum_key = lambda k: [i for i in alphanum_keys(k) if i != ''][0]
split_list = lambda name: [
    alphanum_key(re.findall(selector, name)[0][i]) for i in key]
format_ids.sort(key=split_list)
Alternate (simpler) solution:
But... all of those lambdas and regexes are a lot more complicated than you need to solve this problem. How about:
def sort_key(keys):
    def convert(x):
        try:
            return float(x[:-1])
        except ValueError:
            return x
    def f(x):
        x = x.split('_')
        return tuple([convert(x[i]) for i in keys])
    return f
format_ids.sort(key=sort_key([3, 2, 4]))
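How?
sort_key() returns the function f(). This is the one-argument function that is passed to sort() to evaluate the sort order. f() uses the value of keys that was passed to sort_key(), because that is the value available at the time f() is defined. This is called a closure.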
Results:
synopsys_SS_1v_-40c_SS.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_TT_1v_85c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_TT_4v_125c_TT.lib
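To illustrate the closure on its own, the sketch below builds two key functions from the same sort_key() factory; each one remembers the keys it was created with. The names by_temperature, by_voltage and sample are made up for this example, and the field indexes assume the underscore-separated layout used in the question.

```python
def sort_key(keys):
    """Return a one-argument key function that remembers `keys` (a closure)."""
    def convert(field):
        # Drop the trailing unit letter ('v' or 'c') and try to read a number;
        # fall back to the original text if that fails.
        try:
            return float(field[:-1])
        except ValueError:
            return field

    def f(name):
        fields = name.split('_')
        return tuple(convert(fields[i]) for i in keys)
    return f

# Two independent key functions, each closed over its own `keys` list.
by_temperature = sort_key([3])  # field 3 is the temperature, e.g. '-40c'
by_voltage = sort_key([2])      # field 2 is the voltage, e.g. '1.2v'

sample = ["synopsys_TT_1v_85c_TT.lib",
          "synopsys_SS_1.2v_-40c_SS.lib",
          "synopsys_FF_3v_25c_FF.lib"]

print("\n".join(sorted(sample, key=by_temperature)))
print("\n".join(sorted(sample, key=by_voltage)))
```

Sorting by temperature puts the -40c entry first, while sorting by voltage puts the 1v entry first, even though both key functions came from the same sort_key() definition.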