Replace multiple spaces in a string, but leave single spaces alone
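I am reading PDF files with R. I want to transform the extracted text so that whenever a run of multiple spaces is detected, it is replaced with some value (for example "_"). I have come across questions showing how to replace every run of one or more spaces using "\\s+" (Merge Multiple spaces to single space; remove trailing/leading spaces), but that does not work for me. I have a string that looks like this:
"[1]This is the first address       This is the second one
[2]This is the third one       
[3]This is the fourth one       This is the fifth"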
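When I apply the answer I found, which replaces every run of one or more spaces with a single space, I can no longer tell the separate addresses apart, because the result looks like this:
gsub("\\s+", " ", str_trim(PDF))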
"[1]This is the first address This is the second one
[2]This is the third one
[3]This is the fourth one This is the fifth"
So what I am looking for is something like this:
"[1]This is the first address_This is the second one
[2]This is the third one_
[3]This is the fourth one_This is the fifth"
However, if I adapt the code from that example, I get the following:
gsub("\\s+", "_", str_trim(PDF))
"[1]This_is_the_first_address_This_is_the_second_one
[2]This_is_the_third_one_
[3]This_is_the_fourth_one_This_is_the_fifth"
Does anyone know a way to solve this? Any help would be greatly appreciated.
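Whenever I run into string and regex problems, I like to refer to the stringr cheat sheet: https://raw.githubusercontent.com/rstudio/cheatsheets/master/strings.pdf
On the second page there is a section titled "Quantifiers", which tells us how to solve this: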
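library(tidyverse)
s <- "This is the first address       This is the second one"
# \\s{2,} matches a run of two or more whitespace characters;
# str_replace_all() replaces every such run, str_replace() only the first
str_replace_all(s, "\\s{2,}", "_")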
(Out of habit, I am loading the full tidyverse here rather than just stringr.)
Any run of 2 or more whitespace characters will now be replaced with _.
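One caveat when applying this to the multi-line string from the question: \s also matches line breaks, so trailing spaces followed by a newline would collapse into a single _ and join two lines. Below is a minimal base R sketch that matches only literal spaces instead. The variable name pdf_text and the lengths of the space runs are assumptions made for illustration, not taken from the original PDF.

```r
# Hypothetical stand-in for the extracted PDF text; the run lengths are assumed.
pdf_text <- paste(
  "[1]This is the first address     This is the second one",
  "[2]This is the third one     ",
  "[3]This is the fourth one     This is the fifth",
  sep = "\n"
)

# " {2,}" matches two or more literal spaces only, so single spaces
# and line breaks are left untouched.
result <- gsub(" {2,}", "_", pdf_text)
cat(result, "\n", sep = "")
# [1]This is the first address_This is the second one
# [2]This is the third one_
# [3]This is the fourth one_This is the fifth

# For comparison: "\\s{2,}" treats the trailing spaces plus the newline
# after "third one" as one run, so lines [2] and [3] are joined.
joined <- gsub("\\s{2,}", "_", pdf_text)
```

If the text may contain tabs as well as spaces, a character class such as "[ \t]{2,}" would be one way to extend the pattern while still leaving newlines alone.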