Regular Expressions to parse addresses
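I'm trying to learn how to use regular expressions to parse location/address strings.
Unfortunately, the data I've been given is inconsistent and doesn't follow the way most addresses are written. Below is what I have so far. The problem is that I need to parse the string several times to whittle it down to the proper format.
Take the following string as an example: 102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649
The end result I want is 110 Spruce, Greenwood, SC 29649
Code: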
l = nil
location_str = "102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649"
1.upto(4).each do |attempt|
  l = Location.from_string(location_str)
  puts "TRYING: #{location_str}"
  break if !l.nil?
  location_str.gsub!(/^[^,:\-]+\s*/, '')
end
Output:
TRYING: 102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649
TRYING: , 108 Spruce, 110 Spruce, Greenwood, SC 29649
TRYING: , 108 Spruce, 110 Spruce, Greenwood, SC 29649
TRYING: , 108 Spruce, 110 Spruce, Greenwood, SC 29649
Expected:
TRYING: 102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649
TRYING: 108 Spruce, 110 Spruce, Greenwood, SC 29649
TRYING: 110 Spruce, Greenwood, SC 29649
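As far as I can tell, the loop above stalls because the pattern removes "102 Spruce" but leaves its comma behind. On the next pass the string starts with a comma, so `[^,:\-]+` has nothing to match and the string never changes again. A minimal sketch of one possible fix is to consume the delimiter and the whitespace after it as well. Note that `parse` below is only a hypothetical stand-in for `Location.from_string`, which isn't shown in the question; I'm assuming it returns nil until exactly three comma-separated parts remain.

```ruby
# Hypothetical stand-in for Location.from_string (an assumption, not the real method):
# it accepts only strings shaped like "house, city, state zip".
parse = ->(str) { str.split(",").length == 3 ? str : nil }

location_str = "102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649"
attempts = []
result = nil

4.times do
  attempts << location_str
  result = parse.call(location_str)
  break unless result.nil?

  # Drop the leading segment together with its delimiter and any whitespace after it.
  location_str = location_str.sub(/\A[^,:\-]+[,:\-]\s*/, "")
end

attempts.each { |s| puts "TRYING: #{s}" }
```

Using `sub` rather than `gsub!` also keeps the original string untouched, which makes it easier to log what was tried.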
Assuming the format is:
"Stuff you aren't interested in, more stuff, more stuff, etc., house, city, state zip"
Then you can simply anchor to the end of the string with a dollar sign to grab the last 3 parts:
location_str[/[^,]*,[^,]*,[^,]*$/]
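One detail worth checking with the pattern above: the match keeps the space that follows the preceding comma, and in Ruby `$` anchors to the end of a line rather than the end of the whole string. A small sketch of a variant, assuming the input is a single-line string with at least three comma-separated parts (the variable names `raw` and `tail` are just for illustration):

```ruby
location_str = "102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649"

# The pattern as written keeps the space after the preceding comma.
raw = location_str[/[^,]*,[^,]*,[^,]*$/]

# Variant: start on a non-space, non-comma character and anchor with \z
# (end of string) instead of $ (end of line).
tail = location_str[/[^,\s][^,]*,[^,]*,[^,]*\z/]

p raw
p tail
```

Calling `strip` on the first result would work just as well if you'd rather keep the original pattern.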
An attempt without regular expressions:
address = "102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649"
elements = address.split(",").map(&:strip)
city, state_and_zip = elements.last(2)
addresses = elements[0...-2]
p [addresses.last, city, state_and_zip].join(", ")
Output:
"110 Spruce, Greenwood, SC 29649"
This is one of those things with more than one way to do it. Here's another:
def address_from_location_string(location)
  *_, address, city, state_zip = location.split(/\s*,\s*/)
  "#{address}, #{city}, #{state_zip}"
end
address_from_location_string("102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649")
# => "110 Spruce, Greenwood, SC 29649"
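Since the data is described as inconsistent, it may be worth guarding against strings with fewer than three comma-separated parts. With the destructuring above, at least one of the three variables would end up nil in that case and the result would be malformed. A rough sketch, where `last_three_parts` is a hypothetical helper name and returning nil for short input is my own assumption about what the caller wants:

```ruby
# Hypothetical helper: returns the last three comma-separated parts,
# or nil when the string has fewer than three parts.
def last_three_parts(location)
  parts = location.split(",").map(&:strip)
  return nil if parts.length < 3

  parts.last(3).join(", ")
end

p last_three_parts("102 Spruce, 108 Spruce, 110 Spruce, Greenwood, SC 29649")
p last_three_parts("Greenwood, SC 29649")
```

Returning nil lines up with the nil check in the question's loop, but raising an error would be an equally reasonable choice.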