Splitting a string into a list of substrings

Question

I have a string id <- "Hello these are words N12345678 hooray how fun".

I want to extract only N12345678 from this string.

So far I have used strsplit(id, " "). Now I have

>id
>[[1]]
>[1] "Hello" "these" "are" "words" "N12345678" "hooray" "how"
>[8] "fun"

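which is of type list and has length 1 (even though it apparently has 8 elements?).

If I then use id <- id[grep("^[N][0-9]",id)], id is an empty list.

I think what I need to do is split the string into a list of length 8, with each element a substring, so that grep can pick out the pattern, but I am not sure how to do that.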

Answer 1

Use regmatches:

> regmatches(id, regexpr("N[0-9]+", id))
[1] "N12345678"
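
To make the mechanics a little more explicit, here is a small sketch of how regexpr and gregexpr differ when paired with regmatches. The second input string (two_ids) is a made-up example, not part of the question.

```r
id <- "Hello these are words N12345678 hooray how fun"

# regexpr() locates only the first match; regmatches() extracts it.
first_match <- regmatches(id, regexpr("N[0-9]+", id))

# gregexpr() locates every match. The result is a list with one
# character vector per input string, so [[1]] picks the vector
# belonging to the single input string.
two_ids <- "N111 and N222"  # hypothetical input with two IDs
all_matches <- regmatches(two_ids, gregexpr("N[0-9]+", two_ids))[[1]]

print(first_match)
print(all_matches)
```

If the string might contain several IDs, the gregexpr form is probably the safer choice; for a single ID the regexpr form above is enough.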

Answer 2

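Do you know strtok? It parses your input line on certain characters. In my example, I break the string every time I hit a space.

tempVar = strtok(string, " ");
// tempVar holds everything up to the first space
while (tempVar != NULL)
{
     // evaluate tempVar here, before fetching the next token
     tempVar = strtok(NULL, " ");
     // now tempVar has picked up the next word; the loop keeps picking up the next word until the end of the string
}

Using this, your "Hello these are words N123456789 Hooray" would go like this: tempVar will be "Hello", then "these", and so on.

Each time through the loop tempVar gets a new value. So I suggest evaluating tempVar inside the loop (before fetching the next one), so that you can stop when you have N123456789.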
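
Since the question is about R rather than C, the same scan-and-stop idea can be sketched in R roughly as follows. This is only an illustration of the approach, assuming the ID is a whole word made of N followed by digits; the names tokens and found are made up for the example.

```r
id <- "Hello these are words N12345678 hooray how fun"

# Split on single spaces; [[1]] takes the character vector for `id`
# out of the list that strsplit() returns.
tokens <- strsplit(id, " ", fixed = TRUE)[[1]]

found <- NA_character_  # stays NA if no token matches
for (tok in tokens) {
  # Check the current token before moving on to the next one.
  if (grepl("^N[0-9]+$", tok)) {
    found <- tok
    break  # stop as soon as the ID is found
  }
}

print(found)
```

In practice a vectorised call such as grep on the token vector does the same job without an explicit loop.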

Answer 3

Try:

gsub('\\b[a-zA-Z]+\\b','',id)
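
Two details are worth keeping in mind with this approach in R: the backslashes of the word-boundary escape have to be doubled inside a string literal, and the substitution removes the words but not the spaces between them. A minimal sketch, where perl = TRUE and the trimws cleanup step are my own additions rather than part of the answer:

```r
id <- "Hello these are words N12345678 hooray how fun"

# "\\b" in an R string literal is the regex word boundary \b.
# Purely alphabetic words are deleted; N12345678 survives because
# there is no word boundary between "N" and the digit after it.
stripped <- gsub("\\b[a-zA-Z]+\\b", "", id, perl = TRUE)

# The separating spaces are still there, so trim them off.
result <- trimws(stripped)

print(result)
```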

Answer 4

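If you insist on using strsplit, I think this does the trick:

id <- "Hello these are words N12345678 hooray how fun"
id <- strsplit(id, " ")
id[[1]][grep("^N[0-9]", id[[1]])]

Note that I have not changed your regular expression. It could be a more precise expression, such as ^N\d+$.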
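
The reason for the length-1 list is that strsplit returns one list element per input string, and each element is the character vector of pieces. A short sketch of flattening it and applying the stricter pattern follows; the variable names words and matches are just for illustration.

```r
id <- "Hello these are words N12345678 hooray how fun"

# unlist() flattens the one-element list into a plain character vector.
words <- unlist(strsplit(id, " "))
print(length(words))

# The stricter pattern, with the backslash doubled for the R string
# literal; value = TRUE returns the matching strings, not their indices.
matches <- grep("^N\\d+$", words, value = TRUE)
print(matches)
```

With value = TRUE there is no need to index back into the vector, which avoids repeating id[[1]] twice.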

regex

string

substring

r

strsplit