Removing characters after a EURO symbol in R

Question

I have a euro symbol saved in the variable "euro":

euro <- "\u20AC"
euro
#[1] "€"

and the variable "eurosearch" contains "services as defined in this SOW at a price of € 15,896.80 (if executed fro".

eurosearch
[1] "services as defined in this SOW at a price of € 15,896.80 (if executed fro"

I want the characters after the euro symbol, "15,896.80 (if executed fro". I am using this code:

gsub("^.*[euro]","",eurosearch)

But the result I get is empty. How can I obtain the expected output?

Answer 1

Use regmatches, which is available in base R, or str_extract from stringr, etc.

> x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
> regmatches(x, regexpr("(?<=€ )\\S+", x, perl=TRUE))
[1] "15,896.80"

or

> gsub("€ (\\S+)|.", "\\1", x)
[1] "15,896.80"

or

using the variable:

euro <- "\u20AC"
gsub(paste(euro , "(\\S+)|."), "\\1", x)

If the variable-based version does not work for you, then you need to set the encoding:

gsub(paste(euro , "(\\S+)|."), "\\1", `Encoding<-`(x, "UTF-8"))
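As background to why the attempt in the question returns an empty string, here is a minimal sketch. Inside square brackets, `euro` is not the variable but a character class matching any one of the letters e, u, r or o, so the greedy `^.*` runs to the last such letter (the final "o" of "fro") and nothing is left. Building the pattern with paste0 keeps everything after the symbol, which is the output the question asks for. The names `broken` and `fixed` are only illustrative, and the sketch assumes the strings are UTF-8 (as they are when written with `\u20AC`).

```r
euro <- "\u20AC"
eurosearch <- "services as defined in this SOW at a price of \u20AC 15,896.80 (if executed fro"

# "[euro]" is a character class (e, u, r or o), not the euro variable.
# The greedy "^.*" consumes up to the last matching letter, the final "o".
broken <- gsub("^.*[euro]", "", eurosearch)
print(broken)

# Concatenate the variable into the pattern and drop everything up to and
# including the symbol and the whitespace that follows it.
fixed <- sub(paste0("^.*", euro, "\\s*"), "", eurosearch)
print(fixed)
```

Unlike the snippets above, which extract only the number, this variant keeps the rest of the string after the symbol.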

Answer 2

You can use variables in the pattern by simply concatenating the strings with paste0:

euro <- "€"
eurosearch <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
sub(paste0("^.*", gsub("([^A-Za-z_0-9])", "\\\\\\1", euro), "\\s*(\\S+).*"), "\\1", eurosearch)

euro <- "$"
eurosearch <- "services as defined in this SOW at a price of $ 25,196.4 (if executed fro"
sub(paste0("^.*", gsub("([^A-Za-z_0-9])", "\\\\\\1", euro), "\\s*(\\S+).*"), "\\1", eurosearch)

See the CodingGround demo.

Note that with gsub("([^A-Za-z_0-9])", "\\\\\\1", euro) I escape any non-word symbol so that $ is treated as a literal rather than as a special regex metacharacter (taken from this SO post).
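The escaping step and the extraction can be wrapped into two small functions, sketched below. The names `escape_regex` and `amount_after` are hypothetical helpers introduced here for illustration and are not part of base R or stringr. The sketch assumes the input strings are UTF-8.

```r
# Hypothetical helper: put a backslash in front of every non-word character,
# so that symbols such as "$" are matched literally in a regex.
escape_regex <- function(s) {
  gsub("([^A-Za-z_0-9])", "\\\\\\1", s)
}

# Hypothetical helper: return the first run of non-space characters that
# follows `symbol` (and any whitespace after it) in `text`.
amount_after <- function(text, symbol) {
  pattern <- paste0("^.*", escape_regex(symbol), "\\s*(\\S+).*")
  sub(pattern, "\\1", text)
}

dollar_amount <- amount_after("at a price of $ 25,196.4 (if executed fro", "$")
euro_amount <- amount_after("at a price of \u20AC 15,896.80 (if executed fro", "\u20AC")
print(dollar_amount)
print(euro_amount)
```

If the symbol does not occur in the text, sub finds no match and returns the input unchanged, so callers may want to check for that case.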

regex

r

gsub

stringr