How do I match/remove numbers at the beginning of comments in R?
I have a list of comments imported into R. Here are some examples of how the comments come in:
9. This is some string number 1
9This is some string number 2
9 This is some string number 3
9-This is some string number 4
67-68 This is some string number 5
Note that I saved the comments in a variable named some_str.
My goal is to print each line without the number at its start, like this:
This is some string number 1
This is some string number 2
This is some string number 3
This is some string number 4
This is some string number 5
I have used the code below to handle the first line above (9. This is some string number 1):
pattern = "([0-9][.][ ])"
str_replace(some_str, pattern, "")
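Output: This is some string number 1
But I am having trouble matching/deleting the other lines. For example, if I create the pattern ([0-9][A-Z]) to match the "9T" in the second line, how do I remove only the digit 9?
Finally, note that I am only trying to remove numbers at the beginning of a comment. For example, suppose line 3 were the following comment: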
"9 This is some string number 2. 2 dogs came to town"
I only want to remove the 9 at the beginning of the comment. I do not want to remove the 2 after the period.
stringr::str_extract("9. This is some string number 1 2. 2 dogs came to town", "^([0-9][.][ ])")
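This should work: the ^ anchors the match to the start of the string, so only the leading "9. " is matched.
Just change your pattern to: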
^([0-9][.][ ])
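As a rough illustration in base R (the vector name `comments` is made up for this sketch), the anchored pattern above only strips the "9. " form, and the other lines pass through unchanged. For the "9T" case raised in the question, one option is to capture the letter and put it back with a backreference, so that only the digits are dropped:

```r
# Hypothetical sample data mirroring three of the question's examples
comments <- c("9. This is some string number 1",
              "9This is some string number 2",
              "9 This is some string number 3")

# Anchored pattern from this answer: only the "9. " prefix is removed
anchored_only <- sub("^[0-9][.][ ]", "", comments)
print(anchored_only)

# Match leading digits followed by a capital letter, but keep the letter:
# the parentheses capture it and \\1 re-inserts it in the replacement
keep_letter <- sub("^[0-9]+([A-Z])", "\\1", comments)
print(keep_letter)
```

Neither pattern covers every format on its own, which is why a character class over digits and separators is usually the more practical route.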
We can use sub:
sub("^[-0-9. ]+", "", v1)
#[1] "This is some string number 1" "This is some string number 2" "This is some string number 3" "This is some string number 4"
#[5] "This is some string number 5"
Data
v1 <- c("9. This is some string number 1", "9This is some string number 2",
"9 This is some string number 3", "9-This is some string number 4",
"67-68 This is some string number 5")
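To check this against the edge case from the question, here is a small sketch (the variable names `edge_case`, `cleaned` and `unanchored` are only illustrative). It suggests why both the ^ anchor and sub(), rather than gsub(), matter here:

```r
# Edge case quoted in the question
edge_case <- "9 This is some string number 2. 2 dogs came to town"

# sub() replaces the first match only, and ^ pins it to the start,
# so the "2. 2" later in the string is left alone
cleaned <- sub("^[-0-9. ]+", "", edge_case)
print(cleaned)

# For contrast: without the anchor, gsub() strips every digit, period,
# hyphen and space anywhere in the string
unanchored <- gsub("[-0-9. ]+", "", edge_case)
print(unanchored)
```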
Here is a base R solution.
The pattern used is
pattern <- "^[-[:digit:][:punct:][:space:]]*"
It works for all the posted test cases.
sub(pattern, "", x)
#[1] "This is some string number 1" "This is some string number 2"
#[3] "This is some string number 3" "This is some string number 4"
#[5] "This is some string number 5"
The same regex works for the last string:
sub(pattern, "", y)
#[1] "This is some string number 2. 2 dogs came to town"
A solution with the stringr package could be
library(stringr)
str_remove(x, pattern)
str_remove(y, pattern)
Data
x <- scan(what = character(), text = "
9. This is some string number 1
9This is some string number 2
9 This is some string number 3
9-This is some string number 4
67-68 This is some string number 5
", sep = "\n")
y <- "9 This is some string number 2. 2 dogs came to town"
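One thing to keep in mind with the pattern above: because it accepts any leading punctuation, it will also strip an opening parenthesis or quote from a comment that does not start with a number at all. If that matters for your data, a possible variant (the names `loose`, `strict` and `samples` are assumptions for this sketch, not part of the answer) requires the prefix to begin with a digit:

```r
# Pattern from the answer above: any mix of digits, punctuation and spaces
loose  <- "^[-[:digit:][:punct:][:space:]]*"
# Hypothetical stricter variant: the prefix must begin with a digit
strict <- "^[[:digit:]][[:digit:][:punct:][:space:]]*"

samples <- c("67-68 This is some string number 5",
             "(Note) 3 items were missing")

# The loose pattern also removes the leading "(" of the second comment
loose_out <- sub(loose, "", samples)
print(loose_out)

# The strict pattern leaves the second comment untouched
strict_out <- sub(strict, "", samples)
print(strict_out)
```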
Another solution:
library(tidyverse)
dat <- data.frame(x = c("67,68 This is my test",
"67-68 This is my test",
"8 This is my test"))
dat %>%
mutate(x2 = str_replace(x, pattern = "^[^A-Z]*", ""))
Gives:
                      x              x2
1 67,68 This is my test This is my test
2 67-68 This is my test This is my test
3     8 This is my test This is my test
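Note that "^[^A-Z]*" removes everything up to the first capital letter, so it relies on each comment starting with one. The sketch below uses base R's sub() in place of str_replace() so that it runs without extra packages, with perl = TRUE so that the A-Z range is interpreted by PCRE. The sample strings and the [[:alpha:]] variant are assumptions added for illustration:

```r
samples <- c("67-68 This is my test",
             "8 this one starts in lower case")

# Pattern from the answer: drops everything before the first capital letter.
# The second string has no capital, so the whole string is consumed.
upper_out <- sub("^[^A-Z]*", "", samples, perl = TRUE)
print(upper_out)

# Hypothetical alternative: drop everything before the first letter of any case
alpha_out <- sub("^[^[:alpha:]]*", "", samples, perl = TRUE)
print(alpha_out)
```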