Ruby alphanumeric sort not working as expected
Given the following array:
y = %w[A1 A2 B5 B12 A6 A8 B10 B3 B4 B8]
=> ["A1", "A2", "B5", "B12", "A6", "A8", "B10", "B3", "B4", "B8"]
The expected sorted array is:
=> ["A1", "A2", "A6", "A8", "B3", "B4", "B5", "B8", "B10", "B12"]
Using the following (naive) sort, I get:
irb(main):2557:0> y.sort{|a,b| puts "%s <=> %s = %s\n" % [a, b, a <=> b]; a <=> b}
A1 <=> A8 = -1
A8 <=> B8 = -1
A2 <=> A8 = -1
B5 <=> A8 = 1
B4 <=> A8 = 1
B3 <=> A8 = 1
B10 <=> A8 = 1
B12 <=> A8 = 1
A6 <=> A8 = -1
A1 <=> A2 = -1
A2 <=> A6 = -1
B12 <=> B3 = -1
B3 <=> B8 = -1
B5 <=> B3 = 1
B4 <=> B3 = 1
B10 <=> B3 = -1 # this appears to be wrong, looks like 1 is being compared, not 10.
B12 <=> B10 = 1
B5 <=> B4 = 1
B4 <=> B8 = -1
B5 <=> B8 = -1
=> ["A1", "A2", "A6", "A8", "B10", "B12", "B3", "B4", "B5", "B8"]
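...which is obviously not what I want. I know I could split off the alpha part first and then sort on the numeric part, but it seems like I shouldn't have to do that.
Possibly important caveat: we are stuck on Ruby 1.8.7 for now :( But even Ruby 2.0.0 does the same thing. What am I missing here?
Suggestions?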
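You want a natural sort, not the standard character-value-based sort. Gems like these would be a starting point: https://github.com/dogweather/naturally, https://github.com/johnnyshields/naturalsort
Humans see a string like "A2" as "A" followed by the number 2, and sort it using string ordering for the string part and numeric ordering for the numeric part. The standard sort() uses character-value ordering and treats a string as a sequence of characters, regardless of what those characters are. So to sort(), "A10" and "A2" look like [ 'A', '1', '0' ] and [ 'A', '2' ]. Since '1' sorts before '2', and the characters that follow cannot change that order, "A10" sorts before "A2". To a human, the same strings look like [ "A", 10 ] and [ "A", 2 ]; 10 sorts after 2, so we get the opposite result. You can manipulate the strings so that the character-value-based sort() produces the expected result: make the numeric part fixed-width and pad it on the left with zeros (rather than spaces, to avoid embedded spaces), so that "A2" becomes "A02", which sorts before "A10" with the standard sort().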
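To make the difference concrete, here is a small sketch of the three comparisons described above. The variable names are only illustrative.

```ruby
# String#<=> compares character by character, so '1' vs '2' decides the
# result before the trailing '0' is ever looked at.
string_result = "A10" <=> "A2"
puts string_result   # -1: "A10" sorts before "A2"

# Array#<=> compares element by element; with the number held as an Integer,
# 10 vs 2 decides the result.
array_result = ["A", 10] <=> ["A", 2]
puts array_result    # 1: ["A", 10] sorts after ["A", 2]

# Zero-padding makes the plain string comparison agree with numeric order.
padded_result = "A02" <=> "A10"
puts padded_result   # -1: "A02" sorts before "A10"
```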
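You are sorting strings. Strings sort like strings, not like numbers. If you want to sort like numbers, then you should sort numbers, not strings. The string 'B10' is lexicographically smaller than the string 'B3'. That is not unique to Ruby, or even to programming: it is how text is sorted lexicographically almost everywhere, in programming, databases, lexicons, dictionaries, phone books, and so on.
You should split your strings into their numeric and non-numeric parts, and convert the numeric parts to numbers. Array comparison is lexicographic, so this ends up sorting exactly right: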
y.sort_by {|s|      # use `sort_by` for a keyed sort, not `sort`
  s.
    split(/(\d+)/). # split numeric parts from non-numeric
    map {|part|     # parse numeric parts as base-10 integers, leave the rest as strings
      begin Integer(part, 10); rescue ArgumentError; part end }}
#=> ["A1", "A2", "A6", "A8", "B3", "B4", "B5", "B8", "B10", "B12"]
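If the same key is needed in several places, it can be pulled out into a helper. This is only a sketch: `natural_key` is a made-up name, and instead of rescuing `ArgumentError` it relies on the fact that `split` with a capturing group always puts the digit runs at odd indexes.

```ruby
# Hypothetical helper (the name is illustrative, not from any library).
# split(/(\d+)/) alternates non-digit and digit parts, starting with a
# (possibly empty) non-digit part, so digit runs always sit at odd indexes.
# That keeps Integers lined up with Integers when two keys are compared.
def natural_key(str)
  str.split(/(\d+)/).each_with_index.map do |part, index|
    index.odd? ? part.to_i : part
  end
end

files = %w[file10b2 file2b10 file2b9 file1]
sorted_files = files.sort_by { |name| natural_key(name) }
p sorted_files
#=> ["file1", "file2b9", "file2b10", "file10b2"]
```

Unlike a key that only looks at the first number, this one handles strings with several numeric runs.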
If you know the maximum number of digits in your numbers, you can also prefix the numbers with 0 during the comparison.
y.sort_by { |string| string.gsub(/\d+/) { |digits| format('%02d', digits.to_i) } }
#=> ["A1", "A2", "A6", "A8", "B3", "B4", "B5", "B8", "B10", "B12"]
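Here '%02d' is read as follows: % marks the formatting of a value, 0 says to pad the number with leading zeros, 2 gives the total width of the number, and d says you want the output in decimal (base 10). You can find more information in the documentation for Kernel#format.
This means 'A1' is converted to 'A01', 'B8' becomes 'B08', and 'B12' stays 'B12', since it already has 2 digits. The padded form is only used during the comparison.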
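The fixed width of 2 is the weak spot here. As a sketch, assuming you would rather derive the width from the data than hard-code it, you could measure the longest digit run first. The sample array below adds a made-up "B100" to show the difference.

```ruby
y = %w[A1 A2 B5 B12 A6 A8 B100 B3]

# With a fixed '%02d', "B100" is left as is and sorts before "B12" as a string.
fixed = y.sort_by { |s| s.gsub(/\d+/) { |digits| format('%02d', digits.to_i) } }
p fixed
#=> ["A1", "A2", "A6", "A8", "B3", "B5", "B100", "B12"]

# Assumption: take the width from the longest digit run found in the data.
width = y.flat_map { |s| s.scan(/\d+/) }.map(&:size).max
padded = y.sort_by { |s| s.gsub(/\d+/) { |digits| digits.rjust(width, '0') } }
p padded
#=> ["A1", "A2", "A6", "A8", "B3", "B5", "B12", "B100"]
```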
Here are a couple of ways to do this.
arr = ["A1", "A2", "B5", "B12", "A6", "AB12", "A8", "B10", "B3", "B4",
"B8", "AB2"]
Sort by 2-element arrays
arr.sort_by { |s| [s[/\D+/], s[/\d+/].to_i] }
#=> ["A1", "A2", "A6", "A8", "AB2", "AB12", "B3", "B4", "B5", "B8",
# "B10", "B12"]
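This is similar to @Jorg's solution, except that I compute the two elements of the comparison array separately, rather than splitting the string into two parts and converting the latter to an integer.
Enumerable#sort_by compares each pair of elements of arr with the spaceship method, <=>. As the elements being compared are arrays, the method Array#<=> is used. See especially the third paragraph of that doc.
Here sort_by compares the following 2-element arrays: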
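One thing to watch with this key: a string with no non-digit part produces nil as the first element. Here is a sketch of the failure and one possible guard. The sample data and the `to_s` guard are my own assumptions, not part of the original approach.

```ruby
codes = %w[A2 12 A10 B1]

# "12" has no non-digit part, so its key is [nil, 12]. Comparing nil with a
# String returns nil, and sort_by raises ArgumentError.
begin
  codes.sort_by { |s| [s[/\D+/], s[/\d+/].to_i] }
  unguarded_failed = false
rescue ArgumentError
  unguarded_failed = true
end
p unguarded_failed
#=> true

# Assumed guard: to_s turns a missing prefix into "", which sorts first.
guarded = codes.sort_by { |s| [s[/\D+/].to_s, s[/\d+/].to_i] }
p guarded
#=> ["12", "A2", "A10", "B1"]
```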
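For reference, a short sketch of how Array#<=> behaves on keys like these. The variable names are only for illustration.

```ruby
# Elements are compared pairwise with <=>; the first non-zero result wins.
by_number = ["B", 3] <=> ["B", 10]
p by_number      #=> -1, decided by 3 <=> 10

by_prefix = ["A", 8] <=> ["AB", 2]
p by_prefix      #=> -1, decided by "A" <=> "AB"

# If every shared element is equal, the shorter array sorts first.
by_length = ["A"] <=> ["A", 1]
p by_length      #=> -1

# An Integer and a String are not comparable, so the result is nil.
mixed = ["A", 1] <=> ["A", "1"]
p mixed          #=> nil
```

The last case is why the numeric part has to be converted consistently: a nil comparison makes sort_by raise ArgumentError.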
arr.each { |s| puts "%s-> [%s, %d]" %
  ["\"#{s}\"".ljust(7), "\"#{s[/\D+/]}\"".ljust(4), s[/\d+/].to_i] }
"A1"   -> ["A" , 1]
"A2"   -> ["A" , 2]
"B5"   -> ["B" , 5]
"B12"  -> ["B" , 12]
"A6"   -> ["A" , 6]
"AB12" -> ["AB", 12]
"A8"   -> ["A" , 8]
"B10"  -> ["B" , 10]
"B3"   -> ["B" , 3]
"B4"   -> ["B" , 4]
"B8"   -> ["B" , 8]
"AB2"  -> ["AB", 2]
Insert spaces between the alphabetic and numeric parts of the string
max_len = arr.max_by(&:size).size
#=> 4
arr.sort_by { |s| "%s%s%d" % [s[/\D+/], " "*(max_len-s.size), s[/\d+/].to_i] }
#=> ["A1", "A2", "A6", "A8", "AB2", "AB12", "B3", "B4", "B5", "B8",
# "B10", "B12"]
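As a rough illustration of why space padding sorts correctly: a space has a lower character value than any digit, and digits have lower values than letters.

```ruby
# Character values: space is 32, digits are 48..57, "A" is 65.
ords = [" ".ord, "0".ord, "9".ord, "A".ord]
p ords
#=> [32, 48, 57, 65]

# So a key with more padding compares lower than one with less padding.
number_order = "B  3" <=> "B 10"
p number_order   #=> -1, the space at index 2 is lower than "1"

# A shorter prefix followed by padding also sorts before a longer prefix.
prefix_order = "A  8" <=> "AB 2"
p prefix_order   #=> -1, the space at index 1 is lower than "B"
```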
Here sort_by compares the following strings:
arr.each { |s| puts "%s-> \"%s\"" %
  ["\"#{s}\"".ljust(7), s[/\D+/] + " "*(max_len-s.size) + s[/\d+/]] }
"A1"   -> "A  1"
"A2"   -> "A  2"
"B5"   -> "B  5"
"B12"  -> "B 12"
"A6"   -> "A  6"
"AB12" -> "AB12"
"A8"   -> "A  8"
"B10"  -> "B 10"
"B3"   -> "B  3"
"B4"   -> "B  4"
"B8"   -> "B  8"
"AB2"  -> "AB 2"