RegexpError in Ruby when parsing \p{IsBasicLatin} character property

Question

I am using JRuby 1.7.18, and I have also tried this in JRuby 9000 (the latest version), but I get the same error. I am using the soap4r and nokogiri libraries to parse a WSDL XML file.

When the following part of the WSDL is parsed

<xs:pattern value="[\p{IsBasicLatin}]*"/>

I get the following error

RegexpError: (RegexpError) invalid character property name <IsBasicLatin>: /\A[\p{IsBasicLatin}]*\z/n
nokogiri/XmlSaxParserContext.java:252:in `parse_with'
nokogiri/XmlSaxParserContext.java:252:in `parse_with'
nokogiri/XmlSaxParserContext.java:252:in `parse_with'

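Ruby 1.9 is one of the Ruby versions that JRuby 1.7.18 is compatible with, and I have read that it does not support character blocks such as \p{IsBasicLatin}, although it does support scripts such as \p{Latin}. I tried changing IsBasicLatin to Latin, and also tried a few other names such as InBasicLatin and InBasic_Latin, but they all produce the same error.

This happens in both JRuby 1.7.18 and the latest JRuby 9000.

What is going wrong here, and how can I fix it?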

Answer 1

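As noted in the comments, the character property name is actually In_Basic_Latin, not IsBasicLatin. Modern versions of Ruby (specifically MRI, also known as CRuby) use the Onigmo regular expression library. The official Ruby documentation does not list all Unicode properties, but fortunately Onigmo does.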
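To check which property names a given interpreter accepts, a small probe like the following may help. It is only a sketch: `property_supported?` is a hypothetical helper name, and the results depend on the regex engine behind the Ruby you run it on.

```ruby
# Hypothetical helper: returns true when the running regex engine
# accepts \p{<name>} and false when it raises RegexpError.
def property_supported?(name)
  Regexp.new("\\p{#{name}}")
  true
rescue RegexpError
  false
end

# Print the result for the names discussed in this thread.
# Output differs between MRI and JRuby, so none is shown here.
%w[IsBasicLatin In_Basic_Latin Latin].each do |name|
  puts "#{name}: #{property_supported?(name)}"
end
```

Running this under both MRI and JRuby should show whether the In_ block names are available on each.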

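Apparently JRuby does not implement Unicode blocks (at least). However, the information about the blocks (names and ranges) is publicly accessible. \p{In_Basic_Latin} is therefore equivalent to [\u0000-\u007F], or [[:ascii:]].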
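One possible workaround is to rewrite the XSD block escape into an explicit code point range before the pattern is compiled. The sketch below assumes the escape always appears inside a character class, as in the WSDL above. `XSD_BLOCK_RANGES`, `expand_xsd_blocks`, and `xsd_pattern_to_regexp` are hypothetical names, not part of soap4r or nokogiri, and where to hook this into soap4r is not shown.

```ruby
# Hypothetical lookup table: XSD block escape name => explicit range.
# Only Basic Latin is taken from the answer; other blocks would have to
# be added from the published Unicode block ranges.
XSD_BLOCK_RANGES = {
  "IsBasicLatin" => '\u0000-\u007F'
}.freeze

# Replace each \p{IsXxx} escape with its explicit range.
# Assumes the escape sits inside a character class [...].
def expand_xsd_blocks(pattern)
  pattern.gsub(/\\p\{(Is\w+)\}/) do
    name = Regexp.last_match(1)
    XSD_BLOCK_RANGES.fetch(name) do
      raise ArgumentError, "no range known for block #{name}"
    end
  end
end

# Anchor the expanded pattern so it must match the whole string,
# mirroring the \A...\z wrapping seen in the error message.
def xsd_pattern_to_regexp(pattern)
  Regexp.new("\\A(?:#{expand_xsd_blocks(pattern)})\\z")
end

re = xsd_pattern_to_regexp('[\p{IsBasicLatin}]*')
puts !!(re =~ "hello")        # => true
puts !!(re =~ "h\u00E9llo")   # => false, U+00E9 is outside Basic Latin
```

Raising on an unknown block name is a deliberate choice in this sketch, so that an unsupported pattern fails loudly instead of being silently compiled into something else.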
