Why doesn't my regex capture the group on this Japanese web page?
I want to extract the content of the og:image property from this Japanese web page, which uses UTF-8 text encoding.
The expected result is:
http://www.macotakara.jp//blog/archives/001/201701/5871de9fb4929.jpg
But I get:
jp//blog/archives/001/201701/5871bd1be125c.jpg" />
I think the problem is related to how the range is used.
The regex can be tried here: https://regex101.com/r/F29INt/1
The HTML snippet is:
<meta name="description" content="CES2017において、OtterBoxが、様々なモジュールを装着出来るモジュール式iPhoneケース「uniVERSE」の展示を行っていました。 背面にあるスライド式「uniVERSEケースシステム」を使用して、背面の下半分を変更す..." />
<meta property="og:image" name="og:image" content="http://www.macotakara.jp//blog/archives/001/201701/5871de9fb4929.jpg" />
<meta name="twitter:image"
My regex class is:
public class Regex {
    let regex: NSRegularExpression
    let pattern: String

    public init(_ pattern: String) {
        self.pattern = pattern
        regex = try! NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    }

    public func matches(_ input: String) -> [NSTextCheckingResult] {
        let matches = regex.matches(in: input, options: [], range: NSRange(location: 0, length: input.characters.count))
        return matches
    }
}
And the code I use is:
let pattern = "<meta[^>]+property=[\"']\(property)[\"'][^>]+content=[\"']([^\"']*)[\"'][^>]*>"
let regex = Regex(pattern)
let matches = regex.matches(html)
for match in matches {
    // range at index 0: full match
    // range at index 1: first capture group
    var text = ""
    text += "+++StoryPreviewCache.getMetaPropertyContent(): with pattern=\(pattern) for prop=\(property)"
    for j in 1..<match.numberOfRanges {
        text += "+++StoryPreviewCache.getMetaPropertyContent(): Groups \(j), range=\(match.rangeAt(j)), is \(html[match.rangeAt(j)])"
    }
    // text is declared inside the loop, so it has to be printed here
    print(text)
}
I get:
+++StoryPreviewCache.getMetaPropertyContent():
with pattern=<meta[^>]+property=["']og:image["'][^>]+content=["']([^"']*)["'][^>]*>
for prop=og:image
+++StoryPreviewCache.getMetaPropertyContent():
Groups 1,
range=__C._NSRange,
is jp//blog/archives/001/201701/5871bd1be125c.jpg" />
Based on the SO question that Martin R pointed to, I wrote this extension:
extension NSTextCheckingResult {
    public func capture(group: Int, in text: String) -> String {
        let range = self.rangeAt(group)
        let content = (text as NSString).substring(with: range)
        return content as String
    }
}
And I changed my code in Regex as follows:
public func matches(_ input: String) -> [NSTextCheckingResult] {
    let nsString = input as NSString
    let matches = regex.matches(in: input, range: NSRange(location: 0, length: nsString.length))
    // former code as follows
    //let matches = regex.matches(in: input, options: [], range:NSRange(location:0, length:input.characters.count))
    return matches
}
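The switch from input.characters.count to nsString.length matters because NSRange values are measured in UTF-16 code units, while characters.count counts Characters (grapheme clusters). The two agree for plain ASCII and for most Japanese text, but they diverge as soon as the input contains, for example, a CRLF line break or an emoji. A minimal sketch, assuming Swift 4 or later (where input.characters.count is written input.count) and a made-up sample string rather than the real page:

```swift
import Foundation

// Hypothetical sample: the CRLF line break and the emoji each count as
// one Character but as two UTF-16 code units.
let sample = "日本語\r\n😀 og:image"

let characterCount = sample.count             // grapheme clusters
let utf16Count = sample.utf16.count           // UTF-16 code units
let nsLength = (sample as NSString).length    // the unit NSRange uses

print("characters: \(characterCount), utf16: \(utf16Count), NSString length: \(nsLength)")
```

Whenever the Character count is smaller than the UTF-16 length, a range built from it stops short of the end of the input, and offsets taken from a match no longer line up with Character positions in the Swift string.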
Now I use it like this:
for match in matches {
    var text = ""
    text += "+++StoryPreviewCache.getMetaPropertyContent(): with pattern=\(pattern) for prop=\(property)"
    for j in 1..<match.numberOfRanges {
        text += "+++StoryPreviewCache.getMetaPropertyContent(): Groups \(j), is \(match.capture(group:j, in: html))"
    }
}
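For comparison, the whole extraction can also be sketched with the NSRange and Range<String.Index> conversions that Foundation provides, assuming Swift 4 or later. firstCapture(of:in:) is a hypothetical helper, not part of the code above, and the html string is a shortened stand-in for the page, with a CRLF and an emoji added on purpose so that Character and UTF-16 offsets diverge:

```swift
import Foundation

// Hypothetical helper: returns the first capture group of `pattern`
// in `text`, or nil when the pattern is invalid or nothing matches.
func firstCapture(of pattern: String, in text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return nil
    }
    // Build the search range from String indices; the resulting NSRange
    // is expressed in UTF-16 code units, as NSRegularExpression expects.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: fullRange),
          match.numberOfRanges > 1,
          // Convert the UTF-16 based NSRange back to String indices.
          let captureRange = Range(match.range(at: 1), in: text) else {
        return nil
    }
    return String(text[captureRange])
}

let property = "og:image"
let pattern = "<meta[^>]+property=[\"']\(property)[\"'][^>]+content=[\"']([^\"']*)[\"'][^>]*>"

// Shortened stand-in for the page (assumption, not the real HTML).
let html = "<meta name=\"description\" content=\"様々なモジュール 😀\" />\r\n"
    + "<meta property=\"og:image\" name=\"og:image\" content=\"http://www.macotakara.jp//blog/archives/001/201701/5871de9fb4929.jpg\" />"

let imageURL = firstCapture(of: pattern, in: html)
print(imageURL ?? "no match")
```

Converting in both directions through the `in:` initializers keeps the UTF-16 arithmetic out of the calling code, so the Swift string is never subscripted with offsets counted in a different unit.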