Find and replace numbers in an R txt file

I am trying to find all sentences in a text file in R that contain numbers in any format, and wrap those numbers in hashtags.

For example, take the input below:

ex <- c("I have $5.78 in my account","Hello my name is blank","do you want 1,785 puppies?", 
        "I love stack overflow!","My favorite numbers are 3, 14,568, and 78")

As the output of the function, I am looking for:

 > "I have #$5.78# in my account" 
 > "do you want #1,785# puppies?"
 > "My favorite numbers are #3#, #14,568#, and #78#"

Wrapping the numbers is straightforward, assuming that anything made up of digits, periods, commas, and dollar signs should be included.

gsub("\\b([-$0-9.,]+)\\b", "#\\1#", ex)
# [1] "I have $#5.78# in my account"                   
# [2] "Hello my name is blank"                         
# [3] "do you want #1,785# puppies?"                   
# [4] "I love stack overflow!"                         
# [5] "My favorite numbers are #3#, #14,568#, and #78#"
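
Note that in the output above the dollar sign ends up outside the hashes ("$#5.78#"): \\b only matches between a word character and a non-word character, and there is no such boundary between the space and the $. If the $ should sit inside the tags, as in the desired output in the question, one possible variation is to drop \\b and require the match to end in a digit. This is only a sketch and not part of the original answer; the name tagged is illustrative, the "-" is dropped from the character class, and digits embedded in a word (e.g. "abc123") would be tagged as well.

```r
ex <- c("I have $5.78 in my account", "Hello my name is blank",
        "do you want 1,785 puppies?", "I love stack overflow!",
        "My favorite numbers are 3, 14,568, and 78")

# Optional "$", then any run of digits/periods/commas, ending in a digit.
# Ending on a digit keeps trailing punctuation (such as the comma after "3")
# outside the hashes.
tagged <- gsub("([$]?[0-9.,]*[0-9])", "#\\1#", ex)
tagged[1]
# [1] "I have #$5.78# in my account"
```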

To keep only the entries that contain numbers:

grep("\\d", gsub("\\b([-$0-9.,]+)\\b", "#\\1#", ex), value = TRUE)
# [1] "I have $#5.78# in my account"                   
# [2] "do you want #1,785# puppies?"                   
# [3] "My favorite numbers are #3#, #14,568#, and #78#"

We can use gsub:

gsub("(?<=\\s)(?=[$0-9])|(?<=[0-9])(?=,?[ ]|$)", "#", ex, perl = TRUE)
#[1] "I have #$5.78# in my account"                   "Hello my name is blank"                        
#[3] "do you want #1,785# puppies?"                   "I love stack overflow!"                        
#[5] "My favorite numbers are #3#, #14,568#, and #78#"
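
The pattern here is an alternation of two zero-width assertions, so gsub inserts a "#" at matching positions rather than replacing any characters. To return only the sentences that contain numbers, as the question asks, the call can be combined with grepl. A minimal sketch, where open_tag, close_tag, has_number, and tagged are illustrative names that do not appear in the original answer:

```r
ex <- c("I have $5.78 in my account", "Hello my name is blank",
        "do you want 1,785 puppies?", "I love stack overflow!",
        "My favorite numbers are 3, 14,568, and 78")

# Position preceded by whitespace and followed by "$" or a digit
# (the start of a number).
open_tag <- "(?<=\\s)(?=[$0-9])"
# Position preceded by a digit and followed by an optional comma plus a
# space, or by the end of the string (the end of a number).
close_tag <- "(?<=[0-9])(?=,?[ ]|$)"

has_number <- grepl("[0-9]", ex)   # TRUE for elements containing a digit
tagged <- gsub(paste(open_tag, close_tag, sep = "|"), "#",
               ex[has_number], perl = TRUE)
tagged
```

Because the opening assertion requires preceding whitespace, a number at the very start of a string would not get an opening "#", and a number followed directly by punctuation other than a comma would not get a closing one. The pattern would need adjusting if the real file contains such cases.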

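Another step-by-step approach is to use grep to identify the elements of the text that match the pattern "[0-9]", subset those elements with ex[....], pass the subset to gsub using the pipe operator %>% from library(dplyr), and then apply @r2evans's logic to place hashtags around the numeric entries, as shown below: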

library(dplyr) 
ex[do.call(grep,list("[0-9]",ex))] %>% gsub("\\b([-$0-9.,]+)\\b", "#\\1#",.)

The do.call(grep,list("[0-9]",ex)) portion of the code returns the indices for the text elements in ex with numeric entries.

Output

[1] "I have $#5.78# in my account"   "do you want #1,785# puppies?"                   
[3] "My favorite numbers are #3#, #14,568#, and #78#"
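
Since do.call(grep, list("[0-9]", ex)) is the same call as grep("[0-9]", ex), the same result can presumably be obtained in base R without loading dplyr. A sketch under that assumption, with idx and res as illustrative names:

```r
ex <- c("I have $5.78 in my account", "Hello my name is blank",
        "do you want 1,785 puppies?", "I love stack overflow!",
        "My favorite numbers are 3, 14,568, and 78")

# Indices of the elements that contain at least one digit.
idx <- grep("[0-9]", ex)

# Apply the same substitution as above to the subset only.
res <- gsub("\\b([-$0-9.,]+)\\b", "#\\1#", ex[idx])
res
```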