Remove most non-alphabetic characters from a string in elisp

Question

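I need to remove every character from a string that is not a letter or a digit, except for - and _.

A popular solution in many languages is something like [^\w\-_]. For some reason, when used with replace-regexp-in-string, this expression removes everything, whereas \W removes everything except letters and digits, as expected: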

(message (replace-regexp-in-string "\\W" "" "Set AA053 Лыв № foo_bar (设）"))

This outputs: SetAA053Лывfoobar设

a-zA-Z0-9 does not solve my problem, because I need to keep non-Latin characters.

Thanks!

Answer 1

With help from @wiktorstribiżew, I found the correct regular expression:

[^[:alnum:]-_]

See Character Classes for details.
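As a minimal sketch of the difference, the snippet below runs both patterns over the sample string from the question. It relies on the fact that, in Emacs regexps, a backslash inside [...] is an ordinary character, so \w is not a word class there. The constant name my/sample is hypothetical and used only for illustration.

```elisp
(defconst my/sample "Set AA053 Лыв № foo_bar (设）"
  "Sample string from the question (the name is hypothetical).")

;; Inside [...] a backslash is literal, so this set only protects
;; the characters \ and w plus the range from \ to _ (\ ] ^ _).
;; Almost everything else is deleted.
(replace-regexp-in-string "[^\\w\\-_]" "" my/sample)
;; => "_"

;; A POSIX class does work inside brackets, so letters, digits,
;; "-" and "_" all survive.
(replace-regexp-in-string "[^[:alnum:]-_]" "" my/sample)
;; => "SetAA053Лывfoo_bar设"
```

Note that the backslashes are doubled only because the pattern is written as a Lisp string.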

Answer 2

POSIX character classes are locale-specific, and according to the documentation:

‘[:alnum:]’
This matches any letter or digit. (At present, for multibyte characters, it matches anything that has word syntax.)
‘[:alpha:]’
This matches any letter. (At present, for multibyte characters, it matches anything that has word syntax.)
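To make the quoted definitions concrete, here is a small sketch that probes single characters with string-match-p, which returns the match position (0 here) or nil. The test characters are arbitrary choices for illustration.

```elisp
;; [:alpha:] accepts letters, including non-Latin ones, but not digits.
(string-match-p "[[:alpha:]]" "Л")   ; => 0
(string-match-p "[[:alpha:]]" "5")   ; => nil

;; [:alnum:] accepts letters and digits, but not the underscore.
(string-match-p "[[:alnum:]]" "5")   ; => 0
(string-match-p "[[:alnum:]]" "_")   ; => nil
```

This is why the underscore and the hyphen have to be listed explicitly next to the class.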

That is why, to match any character that is not a letter, digit, or underscore/hyphen, you can use a negated character class solution:

Typing a caret after the opening square bracket negates the character class. The result is that the character class matches any character that is not in the character class.

So, yes, you can use

"[^[:alnum:]_-]"
 ^^           ^

or

"[^[:alpha:][:digit:]_-]"

A hyphen at the end of a character class is treated by the regex engine as a literal hyphen, not as a range operator.
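The effect of the hyphen position can be checked with a short sketch; the sets [a-c] and [ac-] and the input strings are made up for the illustration.

```elisp
;; Hyphen between two characters: a range from a to c.
(string-match-p "[a-c]" "b")   ; => 0, "b" lies inside the range

;; Hyphen in last position: a literal member of the set.
(string-match-p "[ac-]" "b")   ; => nil, only a, c and - are listed
(string-match-p "[ac-]" "-")   ; => 0

;; With the trailing hyphen, both "-" and "_" survive the replacement.
(replace-regexp-in-string "[^[:alnum:]_-]" "" "a-b_c d!")
;; => "a-b_cd"
```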

If you do not care about _ and want it replaced as well, remove it from the character class.
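Both variants can be wrapped in one small helper. This is only a sketch: the function name my/strip-non-alnum and its optional argument are hypothetical, not part of Emacs.

```elisp
(defun my/strip-non-alnum (string &optional keep-underscore)
  "Delete every character in STRING that is not a letter, digit or hyphen.
With non-nil KEEP-UNDERSCORE, underscores are kept as well.
The function name and its argument are hypothetical."
  (replace-regexp-in-string
   ;; The hyphen stays last in the set so that it is read literally.
   (if keep-underscore "[^[:alnum:]_-]" "[^[:alnum:]-]")
   "" string))

(my/strip-non-alnum "Set AA053 Лыв № foo_bar (设）")
;; => "SetAA053Лывfoobar设"
(my/strip-non-alnum "Set AA053 Лыв № foo_bar (设）" t)
;; => "SetAA053Лывfoo_bar设"
```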
