Locate position of first number in string [R]

Question

How can I create a function in R that locates the word position of the first number in a string?

For example:

string1 <- "Hello I'd like to extract where the first 1010 is in this string"
#desired_output for string1
9

string2 <- "80111 is in this string"
#desired_output for string2
1

string3 <- "extract where the first 97865 is in this string"
#desired_output for string3
5

Answer 1

Here is a way to return your desired output:

library(stringr)
min(which(!is.na(suppressWarnings(as.numeric(str_split(string, " ", simplify = TRUE))))))

Here is how it works:

str_split(string, " ", simplify = TRUE) # converts your string to a vector/matrix, splitting at space

as.numeric(...) # tries to convert each element to a number, returning NA when it fails

suppressWarnings(...) # suppresses the warnings generated by as.numeric

!is.na(...) # returns true for the values that are not NA (i.e. the numbers)

which(...) # returns the position for each TRUE values

min(...) # returns the first position

Output:

min(which(!is.na(suppressWarnings(as.numeric(str_split(string1, " ", simplify = TRUE))))))
[1] 9
min(which(!is.na(suppressWarnings(as.numeric(str_split(string2, " ", simplify = TRUE))))))
[1] 1
min(which(!is.na(suppressWarnings(as.numeric(str_split(string3, " ", simplify = TRUE))))))
[1] 5
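The steps above can also be wrapped in a small function. The following is only a sketch in base R: the name first_number_word is hypothetical (it does not come from the answer), and it swaps str_split() for strsplit(). It also returns NA when no word is numeric, whereas min(which(...)) on an empty result returns Inf with a warning.

```r
# Hypothetical helper; the name first_number_word is an assumption for illustration.
first_number_word <- function(string) {
  # Split the single input string on literal spaces
  words <- strsplit(string, " ", fixed = TRUE)[[1]]
  # TRUE for every word that converts cleanly to a number
  is_num <- !is.na(suppressWarnings(as.numeric(words)))
  hits <- which(is_num)
  # Return NA instead of Inf when the string contains no numeric word
  if (length(hits) == 0L) NA_integer_ else hits[1]
}

first_number_word("80111 is in this string")
# [1] 1
first_number_word("no digits here")
# [1] NA
```

Note that this version handles one string at a time; for a vector of strings it could be called through sapply().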

Answer 2

Try the following:

library(stringr)

position_first_number <- function(string) {
  min(which(str_detect(str_split(string, "\\s+", simplify = TRUE), "[0-9]+")))
}

Using your example strings:

> string1 <- "Hello I'd like to extract where the first 1010 is in this string"
> position_first_number(string1)
[1] 9
 
> string2 <- "80111 is in this string"
> position_first_number(string2)
[1] 1
 
> string3 <- "extract where the first 97865 is in this string"
> position_first_number(string3)
[1] 5

Answer 3

I would just use grep and strsplit here for a base R option:

sapply(input, function(x) grep("\\d+", strsplit(x, " ")[[1]]))

Hello I'd like to extract where the first 1010 is in this string
                                                               9
                                         80111 is in this string
                                                               1
                 extract where the first 97865 is in this string
                                                               5

Data:

input <- c("Hello I'd like to extract where the first 1010 is in this string",
           "80111 is in this string",
           "extract where the first 97865 is in this string")

Answer 4

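Here is a base R solution that uses rapply() with grep() to recurse over the result of strsplit(), so it works on a vector of strings.

Note: if you want to split the strings on any whitespace rather than on a literal space, replace " " and fixed = TRUE with "\\s+" and fixed = FALSE (the default).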

rapply(strsplit(strings, " ", fixed = TRUE), function(x) grep("[0-9]+", x))
[1] 9 1 5

Data:

strings = c("Hello I'd like to extract where the first 1010 is in this string", 
            "80111 is in this string", "extract where the first 97865 is in this string")
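The whitespace variant mentioned in the note could look like the sketch below. The vector spaced is a hypothetical input with irregular spacing, and the trailing [1] is an addition that keeps only the first match in case a string contains more than one number.

```r
# Hypothetical input with runs of several spaces between some words
spaced <- c("Hello I'd like to extract where  the first 1010 is in this string",
            "80111 is in this string",
            "extract where the   first  97865 is in this string")

# Split on runs of whitespace (regex, fixed = FALSE is the default),
# then keep the position of the first word that contains a digit
positions <- rapply(strsplit(spaced, "\\s+"),
                    function(x) grep("[0-9]+", x)[1])
positions
# [1] 9 1 5
```

A string that starts with whitespace would still produce an empty first element after splitting, so trimming with trimws() first may be worthwhile in that case.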

Answer 5

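Here is another approach. We can trim off everything after the first digit of the first number, then count the words in what remains; the position of the last word is the answer. \b matches a word boundary, while \S+ matches one or more non-whitespace characters.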

first_numeric_word <- function(x) {
  x <- substr(x, 1L, regexpr("\\b\\d+\\b", x))
  lengths(gregexpr("\\b\\S+\\b", x))
}

Output

> first_numeric_word(x)
[1] 9 1 5

Data

x <- c(
  "Hello I'd like to extract where  the first 1010 is in this string", 
  "80111 is in this string", 
  "extract where the   first  97865 is in this string"
)
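To make the intermediate steps of this approach visible, here is a sketch that names each one. The function name first_numeric_word_steps is hypothetical, perl = TRUE is an added assumption to pin down the regex flavor, and the NA handling for strings without a number is an addition that is not part of the answer.

```r
# Hypothetical variant of the function above, with named intermediate steps
first_numeric_word_steps <- function(x) {
  # Character index where the first standalone number starts (-1 if there is none)
  start <- regexpr("\\b\\d+\\b", x, perl = TRUE)
  # Keep the text up to and including the first digit of that number
  head <- substr(x, 1L, start)
  # Count the words in the kept text; the last one is the number itself
  count <- lengths(gregexpr("\\b\\S+\\b", head, perl = TRUE))
  # Without this step a string with no number would report position 1
  count[start < 0L] <- NA_integer_
  count
}

first_numeric_word_steps(c("a b 12 c", "no digits"))
# [1]  3 NA
```

For "a b 12 c" the number starts at character 5, the kept text is "a b 1", and that text contains three words.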

Answer 6

Here I'll leave a full tidyverse approach:

library(purrr)
library(stringr)

map_dbl(str_split(strings, " "), str_which, "\\d+")
#> [1] 9 1 5

map_dbl(str_split(strings[1], " "), str_which, "\\d+")
#> [1] 9

Note that it works for both a single string and multiple strings.

Where strings is:

strings <- c("Hello I'd like to extract where the first 1010 is in this string",
             "80111 is in this string",
             "extract where the first 97865 is in this string")
