正则表达式获取所有字母字符

Question

我想要像 [A-z] 这样的东西，它可以计算所有字母字符以及 ö、ä、ü 等东西

如果我这样做 [A-ü] 我得到可能拉丁语言使用的所有特殊字符，但它也允许其他东西，如 ¿¿]|{}[¢§øæ¬°µ©¥

编辑： 我在 python2.

中需要这个

Answer 1

根据您使用的正则表达式引擎，您可以使用 ^\p{L}+$ 正则表达式。 \p{L} 表示一个 unicode 字母：

In addition to complications, Unicode also brings new possibilities. One is that each Unicode character belongs to a certain category. You can match a single character belonging to the "letter" category with \p{L}

Source

This 例子应该能说明我在说什么。看起来 Regex101 上的正则表达式引擎确实支持这个，你只需要从左上角 select PCRE (PHP)。

Answer 2

当你使用 [A-z] 时，你不仅捕获了从 "A" 到 "z" 的字母，你还捕获了一些更多的非字母字符：[ \ ] ^ _ `.

在 Python 中，您可以使用 [^\W\d_] 和 re.U 选项来匹配 Unicode 字符（参见 this post）。

Here is a sample 基于您输入的字符串。

Python 示例：

import re
r = re.search(
    r'(?P<unicode_word>[^\W\d_]*)',
    u'TestöäüéàèÉÀÈéàè',
    re.U
)

print r.group('unicode_word')
>>> TestöäüéàèÉÀÈéàè

正则表达式获取所有字母字符

Regex Get All Alphabetic characters

python

regex

special-characters