How can I replace '\U' using regular expressions?

Question

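The question is simple. I am trying to replace "\U" throughout a character vector, and I am using the {stringr} package for that, but I am having trouble matching the pattern.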

text <- "\U0001f517"

stringr::str_detect(text, "\U")
#> Error: '\U' used without hex digits in character string starting ""\U"

stringr::str_detect(text, "\\U")
#> Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) : 
#>   Unrecognized backslash escape sequence in pattern. (U_REGEX_BAD_ESCAPE_SEQUENCE, context=`\U`)

stringr::str_detect(text, "\\\U")
#> Error: '\U' used without hex digits in character string starting ""\\\U"

stringr::str_detect(text, "\\\\U")
#> FALSE

stringr::str_detect(text, "\\\\\U")
#> Error: '\U' used without hex digits in character string starting ""\\\\\U"

stringr::str_detect(text, "\\\\\\U")
#> Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) : 
#>   Unrecognized backslash escape sequence in pattern. (U_REGEX_BAD_ESCAPE_SEQUENCE, context=`\\\U`)

stringr::str_detect(text, "\\\\\\\U")
#> Error: '\U' used without hex digits in character string starting ""\\\\\\\U"

# ... you get the idea

As far as I can tell, the problem is that the regex engine treats "\U" as the start of a new hex code, as the first error shows. Other characters work fine:

text <- "\a0001f517"

stringr::str_detect(text, "\a")
#> TRUE

I have seen other questions about this problem, but I still cannot get it to work. Can anyone give me a regular expression that works?

Answer 1

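The \U in your text <- "\U0001f517" is not a separate character sequence; it is part of the Unicode code point notation for a single character. The literal text inside the text variable is actually 🔗, which you can easily check with cat(text).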
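To make that point concrete, here is a minimal base R sketch (not part of the original answer) that inspects the string. The variable names n_chars, code_point and has_backslash are illustrative only.

```r
text <- "\U0001f517"

# The escape is resolved when R parses the literal, so the string
# holds a single character rather than a backslash plus "U0001f517".
n_chars <- nchar(text)

# The code point of that character, as an integer (0x1F517 in hex).
code_point <- utf8ToInt(text)

# A fixed (non-regex) search for a literal backslash finds nothing,
# which is why patterns that look for "\U" as text cannot match.
has_backslash <- grepl("\\", text, fixed = TRUE)

print(n_chars)
print(code_point == 0x1F517)
print(has_backslash)
```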

In contrast, "\a" is a single character (the "Bell" character) that can also be written as "\u0007" or "\x07" (run "\a" == '\x07' and you will see that the output is TRUE). R's ?Quotes help page describes the string escape sequence syntax.

In R, to get the escaped form of the string as literal text, you can use

text <- "\U0001f517"
cat(text)
## => 🔗

library("utf8")
text <- utf8_encode(text)
cat(text)
## => \U0001f517
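Building on the above, the following is a hedged sketch of how detection and replacement might look in {stringr} for the two situations. It is not taken from the original answer: the sample vectors x and y and the replacement "[link]" are made up for illustration.

```r
library(stringr)

# Case 1: the vector holds the real character U+1F517.
# Writing the same escape in the pattern makes R build the same
# character, so the regex engine just matches that character.
x <- c("see \U0001f517 here", "no symbol")
has_link <- str_detect(x, "\U0001f517")
replaced <- str_replace_all(x, "\U0001f517", "[link]")

# Case 2: the vector holds the escaped text, i.e. a real backslash
# followed by "U" and eight hex digits.
y <- "see \\U0001f517 here"

# fixed() treats the pattern as plain text, so only R's own string
# escape ("\\" for one backslash) is needed.
has_escape <- str_detect(y, fixed("\\U"))

# As a regex, the backslash is escaped twice: once for the R string
# parser and once for the regex engine.
stripped <- str_replace_all(y, "\\\\U[0-9A-Fa-f]{8}", "[link]")

print(has_link)
print(replaced)
print(has_escape)
print(stripped)
```

Which case applies depends on what the vector really contains; cat() on an element shows whether it prints a symbol or the backslash sequence.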


Tags: regex, string, r, emoji, stringr