Is there a regex to grab all quotation marks?

I know that in regex, \s matches all whitespace (spaces, tabs, ...), \d matches any digit, and so on.

Is there a similar shortcut to match all the different quotation marks: ' " “ ” ‘ ’ „ ” « »?

And more on Wikipedia ...

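I can write my own regex, but I will probably miss some quotation marks from other languages, so I would like a generic way to match all quotation marks.

But maybe they are considered different characters, so this is not possible?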

You can use the regex

['"“”‘’„”«»]

See the regex101 demo.

Approach:

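If you are not sure you know every quotation mark, write a regex for the text you do want, that is, everything except the quotes. Otherwise, list all possible quotation marks in this character class: ['"“”‘’„”«»].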
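As a rough illustration, the explicit character class can be used from Java along the following lines. This is only a sketch: the class name `QuoteClassDemo` and the helper `normalizeQuotes` are made up for the example, and the non-ASCII marks are written as `\u` escapes so the source compiles regardless of file encoding.

```java
import java.util.regex.Pattern;

final class QuoteClassDemo {
    // The character class from the answer: ' " “ ” ‘ ’ „ « »
    // (the duplicated ” is listed only once here).
    static final Pattern QUOTES =
            Pattern.compile("['\"\u201C\u201D\u2018\u2019\u201E\u00AB\u00BB]");

    // Hypothetical helper: replace every listed mark with a plain ASCII double quote.
    static String normalizeQuotes(String text) {
        return QUOTES.matcher(text).replaceAll("\"");
    }

    public static void main(String[] args) {
        // Input is «Bonjour», „Hallo“ and ‘hi’
        String input = "\u00ABBonjour\u00BB, \u201EHallo\u201C and \u2018hi\u2019";
        System.out.println(normalizeQuotes(input)); // prints "Bonjour", "Hallo" and "hi"
    }
}
```

Any mark that is not listed in the class passes through unchanged, which is exactly the gap the question is worried about.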

Is there the same shortcut to match all different quotation marks

There is no such shortcut, in Java ... or (AFAIK) in any other dialect of regexes.

I can write my own regex, but I will probably miss some quotation marks from other languages, so I like to have a generic way to match all the quotation marks.

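Unfortunately, there is no Unicode character class in Java regexes that includes all "quotation" characters.

And there is no simple or guaranteed heuristic based on character names either.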
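To make the point concrete, here is a small sketch showing that the closest built-in general categories in Java, `\p{Pi}` (initial quote punctuation) and `\p{Pf}` (final quote punctuation), do not cover every quotation mark. The class name `QuoteCategoryDemo` and the helper `isInitialOrFinalQuote` are invented for this example.

```java
import java.util.regex.Pattern;

final class QuoteCategoryDemo {
    // Union of the two quote-related general categories.
    static final Pattern INITIAL_OR_FINAL = Pattern.compile("[\\p{Pi}\\p{Pf}]");

    // Hypothetical helper: true if the code point is in category Pi or Pf.
    static boolean isInitialOrFinalQuote(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return INITIAL_OR_FINAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        // " « » “ ” „
        int[] samples = {0x0022, 0x00AB, 0x00BB, 0x201C, 0x201D, 0x201E};
        for (int cp : samples) {
            System.out.printf("U+%04X %s -> %b%n",
                    cp, Character.getName(cp), isInitialOrFinalQuote(cp));
        }
    }
}
```

The plain ASCII quotation mark (U+0022) and the low-9 mark „ (U+201E) are reported as `false`, so a class built only from these two categories would miss them.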

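Java's Unicode support is quite detailed; even punctuation is categorized. But there is no category for quotation marks as a whole, and some quotation marks are classified as neither initial nor final quotes. You can, however, collect them yourself and generate code from the result. Advantage: completeness.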

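    public class QuotationMarks {
        public static void main(String[] args) {
            // Scan the Basic Multilingual Plane for characters whose name contains QUOTATION
            for (int cp = 32; cp <= 0xFFFF; ++cp) {
                String name = Character.getName(cp);
                if (name != null && name.contains("QUOTATION")) {
                    // The backslash is doubled so the format string is not read as a Unicode escape
                    System.out.printf("\\u%04x = %s (%s %s)%n",
                            cp, name,
                            Character.getType(cp) == Character.INITIAL_QUOTE_PUNCTUATION,
                            Character.getType(cp) == Character.FINAL_QUOTE_PUNCTUATION);
                }
            }
        }
    }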

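This treats code points as if they were chars, so the loop stops at U+FFFF and does not cover anything beyond that (for example, some Asian scripts). It prints: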

\u0022 = QUOTATION MARK (false false)
\u00ab = LEFT-POINTING DOUBLE ANGLE QUOTATION MARK (true false)
\u00bb = RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK (false true)
\u2018 = LEFT SINGLE QUOTATION MARK (true false)
\u2019 = RIGHT SINGLE QUOTATION MARK (false true)
\u201a = SINGLE LOW-9 QUOTATION MARK (false false)
\u201b = SINGLE HIGH-REVERSED-9 QUOTATION MARK (true false)
\u201c = LEFT DOUBLE QUOTATION MARK (true false)
\u201d = RIGHT DOUBLE QUOTATION MARK (false true)
\u201e = DOUBLE LOW-9 QUOTATION MARK (false false)
\u201f = DOUBLE HIGH-REVERSED-9 QUOTATION MARK (true false)
\u2039 = SINGLE LEFT-POINTING ANGLE QUOTATION MARK (true false)
\u203a = SINGLE RIGHT-POINTING ANGLE QUOTATION MARK (false true)
\u275b = HEAVY SINGLE TURNED COMMA QUOTATION MARK ORNAMENT (false false)
\u275c = HEAVY SINGLE COMMA QUOTATION MARK ORNAMENT (false false)
\u275d = HEAVY DOUBLE TURNED COMMA QUOTATION MARK ORNAMENT (false false)
\u275e = HEAVY DOUBLE COMMA QUOTATION MARK ORNAMENT (false false)
\u275f = HEAVY LOW SINGLE COMMA QUOTATION MARK ORNAMENT (false false)
\u2760 = HEAVY LOW DOUBLE COMMA QUOTATION MARK ORNAMENT (false false)
\u276e = HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT (false false)
\u276f = HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT (false false)
\u301d = REVERSED DOUBLE PRIME QUOTATION MARK (false false)
\u301e = DOUBLE PRIME QUOTATION MARK (false false)
\u301f = LOW DOUBLE PRIME QUOTATION MARK (false false)
\uff02 = FULLWIDTH QUOTATION MARK (false false)
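The "collect them and generate code" step could look roughly like the sketch below, which turns the same name-based scan into a regex character class. The class name `QuoteClassGenerator` and the method `buildQuoteClass` are hypothetical, and the exact set of characters found depends on the Unicode version of the Java runtime.

```java
final class QuoteClassGenerator {
    // Build a regex character class from every code point whose Unicode name
    // contains "QUOTATION", scanning from U+0020 up to maxCodePoint inclusive.
    static String buildQuoteClass(int maxCodePoint) {
        StringBuilder sb = new StringBuilder("[");
        for (int cp = 32; cp <= maxCodePoint; ++cp) {
            String name = Character.getName(cp);
            if (name != null && name.contains("QUOTATION")) {
                // \x{h...h} is Java's regex syntax for a code point given in hex
                sb.append(String.format("\\x{%X}", cp));
            }
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // Scan all planes, not just the BMP, so supplementary characters are included.
        System.out.println(buildQuoteClass(Character.MAX_CODE_POINT));
    }
}
```

The resulting string can be passed to `Pattern.compile`. Note that the name filter skips U+0027 APOSTROPHE, because its name does not contain "QUOTATION", so it has to be added by hand if single straight quotes should match too.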