Search string returning indexes of all overlapping occurrences for multiple variable-length Ruby regular expressions
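I am using the following code in Interactive Ruby (IRB, started with $ irb) to search a string (evidence) and return an array of tuples (guilty_terms_and_indexes). The second element of each tuple is the character index in the evidence string at which the first character of a guilty term (stored in the first element of the tuple) was found.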
guilty_terms = [/danger/i, /hack/i, /ckdd/i]
regex_guilty_terms = Regexp.union(guilty_terms)
evidence = "hackddangerhackdanger"
guilty_terms_and_indexes = []
evidence.scan(regex_guilty_terms) do |guilty_term|
  # scan yields the matched text; the start offset comes from the last match
  index = Regexp.last_match.offset(0)[0]
  guilty_terms_and_indexes << [guilty_term, index]
end
p guilty_terms_and_indexes
I expected guilty_terms_and_indexes to return:
[["hack", 0], ["ckdd", 2], ["danger", 5], ["hack", 11], ["ckdd", 13], ["danger", 15]]
But instead it returns:
[["hack", 0], ["danger", 5], ["hack", 11], ["danger", 15]]
How can I get the expected result?
System:
$ ruby -v
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin15]
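Matches cannot overlap: scan resumes after the end of each match, so a term that begins inside a previous match is never seen. Use a zero-length assertion instead: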
guilty_terms = [/danger/i, /hack/i, /ckdd/i]
# the positive lookahead is where the magic happens
regex_guilty_terms = /(?=(#{Regexp.union(guilty_terms)}))/
evidence = "hackddangerhackdanger"
# just a squeezin'
[].tap { |arr| evidence.scan(regex_guilty_terms) { |x| arr << [$1, $~.begin(1)] } }
# => [["hack", 0], ["ckdd", 2], ["danger", 5], ["hack", 11], ["danger", 15]]
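Position 13 is not printed because nothing actually matches there (the text at index 13 is "ckda", not "ckdd"), so... I'm not sure how you arrived at your expected result :)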
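For reuse, the lookahead idea above could be wrapped in a small method. This is only a sketch: the name overlapping_matches and its parameters are hypothetical and not part of the original answer. It assumes that at most one term needs to be reported per starting index, because the alternation inside the lookahead stops at the first alternative that matches at a given position.

```ruby
# Hypothetical helper (not from the original answer): collects
# [matched_text, start_index] pairs, including overlapping matches.
def overlapping_matches(text, patterns)
  # The lookahead consumes no characters, so scan advances one
  # character at a time; group 1 captures the text that would match.
  lookahead = /(?=(#{Regexp.union(patterns)}))/
  results = []
  text.scan(lookahead) do
    match = Regexp.last_match
    results << [match[1], match.begin(1)]
  end
  results
end

guilty_terms = [/danger/i, /hack/i, /ckdd/i]
found = overlapping_matches("hackddangerhackdanger", guilty_terms)
p found
# => [["hack", 0], ["ckdd", 2], ["danger", 5], ["hack", 11], ["danger", 15]]
```

If two terms can start at the same index and both should be reported, this sketch would need to scan once per pattern instead of using a single union.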
You don't have to use a regular expression.
terms = %w| Danger hack ckdd |
#=> ["Danger", "hack", "ckdd"]
evidence = "Hackddangerhackddanger"
down_terms = terms.map(&:downcase)
#=> ["danger", "hack", "ckdd"]
down_evidence = evidence.downcase
#=> "hackddangerhackddanger"
down_evidence.size.times.with_object([]) do |i,a|
  # pick the first term (if any) that begins at index i
  w = down_terms.find { |term| down_evidence[i..-1].start_with?(term) }
  a << [w,i] unless w.nil?
end
# => [["hack",0], ["ckdd",2], ["danger",5], ["hack",11], ["ckdd",13], ["danger",16]]
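A possible variation on the regex-free approach, offered as a sketch rather than as the answerer's method: String#index accepts a start offset, so each term can be searched repeatedly, advancing by one character after every hit so that overlapping occurrences are kept. The method name term_positions is hypothetical. Unlike the find-based loop above, this version would also report several terms that start at the same index.

```ruby
# Hypothetical helper (not from the original answer): case-insensitive
# search for every occurrence of every term, overlaps included.
def term_positions(text, terms)
  haystack = text.downcase
  hits = []
  terms.map(&:downcase).each do |term|
    start = 0
    # index returns nil once no further occurrence exists
    while (pos = haystack.index(term, start))
      hits << [term, pos]
      start = pos + 1 # step one character so overlapping hits are found
    end
  end
  # order by position, then by term, so the result is deterministic
  hits.sort_by { |term, pos| [pos, term] }
end

positions = term_positions("Hackddangerhackddanger", %w[Danger hack ckdd])
p positions
# => [["hack", 0], ["ckdd", 2], ["danger", 5], ["hack", 11], ["ckdd", 13], ["danger", 16]]
```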