Regular expression to match unicode block, or index range

Question

I'm trying to create a regular expression that matches any character in a unicode block, specifically the Mathematical Alphanumeric Symbols block.

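The goal is to detect content that uses Unicode characters to get different formatting on its text, such as bold or italic text, in places where that formatting is normally not supported. There are many websites, like this one, that help users convert text this way.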

I tried using the shorthand property code, but it doesn't seem to match all of the characters in the block that I expect it to.

preg_match('/\p{Sm}/i', '') === 1; // false

It also seems that PHP doesn't support the named variants, so I can't do something like \p{Math}.

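I think I need to target the block range, from U+1D400 to U+1D7FF, but I don't know how to construct this regular expression correctly. This is how I thought I could get it to work, but it doesn't seem to.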

preg_match('/\x{1D400}-\x{1D7FF}/i', '') === 1; // false

I'd expect none of these characters to match (typed directly on my keyboard):

abcdefghijklmnopqrstuvwxyz0123456789

I'd expect every one of these characters to match (the same as above, converted to math bold using the link above):

Answer 1

My guess is that this expression might work, but I'm not sure:

$re = '/[\x{1D400}-\x{1D7FF}]+/su';
$str = '';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
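The two differences from the attempt in the question are the square brackets, which turn the two endpoints into a character-class range, and the u modifier, which makes PCRE treat the pattern and the subject as UTF-8 so that \x{...} can name code points above 0xFF. A minimal sketch of a yes/no check built on that expression follows; the function name containsMathAlphanumeric and the sample strings are hypothetical, not from the original post, and the "\u{...}" string escapes assume PHP 7.0 or later.

```php
<?php
// Hypothetical helper (not part of the original answer): returns true when
// $text contains at least one code point in the range U+1D400..U+1D7FF.
function containsMathAlphanumeric(string $text): bool
{
    // Single quotes keep \x{...} intact so that PCRE, not PHP, interprets it.
    // The u modifier switches PCRE to UTF-8 mode; the brackets form a range.
    return preg_match('/[\x{1D400}-\x{1D7FF}]/u', $text) === 1;
}

// Sample inputs, built with code point escapes instead of pasted symbols.
$plain = 'abcdefghijklmnopqrstuvwxyz0123456789';
$bold  = "\u{1D41A}\u{1D41B}\u{1D41C}"; // MATHEMATICAL BOLD SMALL A, B, C

var_dump(containsMathAlphanumeric($plain)); // bool(false)
var_dump(containsMathAlphanumeric($bold));  // bool(true)
```

One caveat: a few letters of these styled alphabets were encoded earlier in other blocks (the italic small h, for example, is U+210E PLANCK CONSTANT), so a range check on this block alone would miss them.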

\p{S} or \p{Symbol}: math symbols, currency signs, dingbats, box-drawing characters, etc.
\p{Sm} or \p{Math_Symbol}: any mathematical symbol.
\p{Sc} or \p{Currency_Symbol}: any currency sign.
\p{Sk} or \p{Modifier_Symbol}: a combining character (mark) as a full character on its own.
\p{So} or \p{Other_Symbol}: various symbols that are not math symbols, currency signs, or combining characters.
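These categories may also explain why \p{Sm} did not behave as the question expected: the styled letters in the Mathematical Alphanumeric Symbols block carry letter categories (Lu, Ll) rather than Sm, and only a few operator-like characters in the block, such as the bold nabla, are math symbols. The sketch below probes two sample code points; the variable names are illustrative, the u modifier is assumed to be required for properties to apply to characters outside ASCII, and only short property names are used because the question reports that PHP rejects the named variants.

```php
<?php
// Sample code points from the block, written as escapes (PHP 7.0+).
$boldA     = "\u{1D41A}"; // MATHEMATICAL BOLD SMALL A
$boldNabla = "\u{1D6C1}"; // MATHEMATICAL BOLD NABLA

// The bold letter is a lowercase letter (Ll), not a math symbol (Sm).
var_dump(preg_match('/\p{Sm}/u', $boldA));     // int(0)
var_dump(preg_match('/\p{Ll}/u', $boldA));     // int(1)

// The bold nabla, from the same block, is a math symbol.
var_dump(preg_match('/\p{Sm}/u', $boldNabla)); // int(1)
```

Because the block mixes several general categories, the explicit code point range is the more direct way to target it.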

The expression is explained on the top right panel of regex101.com, if you wish to explore/simplify/modify it, and in this link you can watch how it matches against some sample inputs, if you like.

Reference

Unicode Regular Expressions
