Pattern match on any single UTF-8 character

Question

I want a function clause that matches any single UTF-8 character.

I can match a specific character like this:

def foo("a") do
  "It's an a"
end

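But I can't work out whether the same can be done for any single UTF-8 character.

My current solution is to split the string into a list of characters and pattern match on that, but I'm curious whether that step can be skipped.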

Answer 1

From the Regex docs:

The modifiers available when creating a Regex are: ...

unicode (u) - enables Unicode specific patterns like \p and changes modifiers like \w, \W, \s and friends to also match on Unicode. It expects valid Unicode strings to be given on match

dotall (s) - causes dot to match newlines and also set newline to anycrlf; the new line setting can be overridden by setting (*CR) or (*LF) or (*CRLF) or (*ANY) according to :re documentation

So you could try: ~r/./us

From http://elixir-lang.org/crash-course.html:

In Elixir, the word string means a UTF-8 binary and there is a String module that works on such data

So I think you should be good to go.
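Note that a regex cannot appear in a function head, so this approach only works inside the function body. Also, an unanchored ~r/./us matches any string that contains at least one character. A minimal sketch of how the regex could be used to test for exactly one character, assuming a hypothetical module RegexSketch and helper single_char?/1 (neither is from the answer above):

```elixir
defmodule RegexSketch do
  # Hypothetical helper, not part of the original answer.
  # \A and \z anchor the match so the whole string must be exactly one
  # code point; `u` turns on Unicode mode and `s` lets `.` match newlines.
  def single_char?(str) when is_binary(str) do
    Regex.match?(~r/\A.\z/us, str)
  end
end

IO.inspect(RegexSketch.single_char?("λ"))  # true
IO.inspect(RegexSketch.single_char?("ab")) # false
```

Because the check happens in the body, it cannot be used to choose between function clauses the way a pattern in the head can.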

Answer 2

TL;DR:

# Iterate by UTF-8 code point (not by byte) so non-ASCII characters work too
for <<char::utf8 <- "abc">> do
  def foo(unquote(<<char::utf8>>)), do: "It's an #{unquote(<<char::utf8>>)}"
end

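Take a look at https://github.com/elixir-lang/elixir/blob/3eb938a0ba7db5c6cc13d390e6242f66fdc9ef00/lib/elixir/unicode/unicode.ex#L48-L52. You can generate a function clause at compile time for each character in a binary ("abc" in my example). This is how Elixir's Unicode support works; read through the whole module to understand it better.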
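For context, the comprehension has to live inside a module body. A minimal sketch of the compile-time approach, assuming a hypothetical module name GeneratedClauses and a fallback clause that is not part of the answer:

```elixir
defmodule GeneratedClauses do
  # One foo/1 clause is generated at compile time for each code point
  # in the literal "abc".
  for <<char::utf8 <- "abc">> do
    def foo(unquote(<<char::utf8>>)), do: "It's an #{unquote(<<char::utf8>>)}"
  end

  # Fallback clause, added for this sketch only, so that other inputs
  # do not raise a FunctionClauseError.
  def foo(_other), do: :no_match
end

IO.inspect(GeneratedClauses.foo("a")) # "It's an a"
IO.inspect(GeneratedClauses.foo("d")) # :no_match
```

This only covers the characters listed in the literal, so it suits a known, finite set of characters rather than "any" character.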

Answer 3

You can use:

# The underscore prefix avoids an unused-variable warning
def char?(<<_c::utf8>>), do: true
def char?(_), do: false

Note that this only matches a binary containing exactly one character. To match the next character in a string, you can do:

def char?(<<_c::utf8, _rest::binary>>), do: true
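Putting both patterns together, here is a minimal sketch, assuming a hypothetical module CharMatch with helpers single_char?/1 and next_char/1 (names chosen for illustration). Note that ::utf8 matches one Unicode code point, which is not always one user-perceived character (grapheme):

```elixir
defmodule CharMatch do
  # Matches a binary that is exactly one UTF-8 encoded code point.
  def single_char?(<<_c::utf8>>), do: true
  def single_char?(_other), do: false

  # Splits off the first code point and returns it, as a string,
  # together with the rest of the binary.
  def next_char(<<c::utf8, rest::binary>>), do: {<<c::utf8>>, rest}
  def next_char(_other), do: :error
end

IO.inspect(CharMatch.single_char?("λ"))        # true
IO.inspect(CharMatch.single_char?("ab"))       # false
# "e" followed by a combining acute accent is two code points
IO.inspect(CharMatch.single_char?("e\u0301"))  # false
IO.inspect(CharMatch.next_char("héllo"))       # {"h", "éllo"}
```

If grapheme-level matching is needed, String.next_grapheme/1 or String.graphemes/1 in the function body is likely the better fit, since a grapheme cannot be expressed as a binary pattern.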


Tags: utf-8, binary, pattern-matching, elixir