Parsing 'Table of Contents' to get the correct page numbers

Question

Here are the contents of the table:

df <- tibble(ToC=
             c("3.1 texta.............. 22",
             "3.2 textb     25",
             "section 6 ................. 50",
             "section 10.2       65"))

I want to extract the contents and their respective page numbers as two variables. I tried the following, but it does not work correctly.

library(tidyverse); library(stringr)
df_toc <- df %>%
  mutate(page = as.numeric(str_extract(ToC, "[0-9]+")))

The correct page numbers should be 22, 25, 50 and 65. How can I fix this?

Answer 1

Try this (it matches the digits at the end of each string):

df %>%
  mutate(page = as.numeric(str_extract(ToC, "\\d+$")))
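The original attempt fails because `str_extract()` returns only the first match, so `"[0-9]+"` picks up the leading section digits (3, 3, 6 and 10) rather than the page. Anchoring the pattern with `$` forces the match to the trailing digits. Since the question also asks for the contents as a second variable, here is a minimal sketch of one way to get both. The column name `content` and the leader pattern `[.\\s]+` are my own assumptions: the pattern treats any run of dots and whitespace directly before the page number as filler, which holds for the four sample rows but may need adjusting for other layouts.

```r
library(tibble)
library(dplyr)
library(stringr)

df <- tibble(ToC = c("3.1 texta.............. 22",
                     "3.2 textb     25",
                     "section 6 ................. 50",
                     "section 10.2       65"))

df_toc <- df %>%
  mutate(
    # Trailing run of digits is taken to be the page number
    page = as.numeric(str_extract(ToC, "\\d+$")),
    # Assumed leader: dots and/or whitespace right before the page number;
    # removing leader plus page leaves the entry text
    content = str_remove(ToC, "[.\\s]+\\d+$")
  )

df_toc
```

Because the leader must be followed immediately by the final digits, the dot inside `3.1` or `10.2` is left alone and stays part of `content`.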

