Regex and SharePoint names in R

Question

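I am trying to extract names from a list generated by SharePoint.

Each item in the list contains at least one name and a numeric ID of varying length.

The list is formatted as follows: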

all_projects %>% 
  select(contact_names)

# A tibble: 116 x 1
                                                contact_names
                                                       <chr>
 1 last_name, first_name;#6903;#last_name, first_name;#36606
 2                               last_name, first_name;#8585
 3                                                       ...
 4                              last_name, first_name;#14801

Using stringr, I have managed to match the numeric IDs with:

str_replace_all(string, pattern = ";#?\\d*", ";")

But the result is:

\"last_name, first_name;;last_name, first_name;\",

That would be fine if it weren't for the doubled ;;. Substituting an empty string ("") instead, str_replace_all(string, pattern = ";#?\\d*", ""), returns:

\"last_name, first_namelast_name, first_name;\",

Ideally, I would like to split the first and last names into two columns.

Any help is much appreciated.

Answer 1

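We can use separate_rows and separate:

library(tidyverse)
# Split each entry on ";" so every name and every ID gets its own row
separate_rows(df1, contact_names, sep = ";") %>%
        # Drop the rows that hold only an ID such as "#6903"
        filter(!grepl("#\\d+", contact_names)) %>% 
        # Remove the leading "#" left on the second and later names
        mutate(contact_names = str_replace_all(contact_names, "#", "")) %>%
        # Split "last, first" into two columns, keeping the original column
        separate(contact_names, into = c("last", "first"), sep=",", remove = FALSE)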
# A tibble: 4 x 3
#          contact_names      last       first
#*                 <chr>     <chr>       <chr>
#1 last_name, first_name last_name  first_name
#2 last_name, first_name last_name  first_name
#3 last_name, first_name last_name  first_name
#4 last_name, first_name last_name  first_name
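
As a variation on the pipeline above, here is a minimal sketch that splits on the two-character delimiter ";#" instead of ";" and consumes the whitespace after the comma, so the first column carries no leading space. It assumes every name and ID is separated by ";#", as in the sample data; the name result is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data, copied from the Data section of this answer
df1 <- tibble(contact_names = c(
  "last_name, first_name;#6903;#last_name, first_name;#36606",
  "last_name, first_name;#8585",
  "last_name, first_name;#14801"
))

result <- df1 %>%
  # Split on ";#" so names and IDs land on separate rows without a stray "#"
  separate_rows(contact_names, sep = ";#") %>%
  # Drop the rows that consist only of digits (the IDs)
  filter(!grepl("^\\d+$", contact_names)) %>%
  # Split on the comma plus any whitespace that follows it
  separate(contact_names, into = c("last", "first"), sep = ",\\s*")

print(result)
```

Because remove = FALSE is not set here, the original contact_names column is dropped and only last and first remain.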

Data

df1 <- tribble(
  ~contact_names,
  "last_name, first_name;#6903;#last_name, first_name;#36606",
  "last_name, first_name;#8585",
  "last_name, first_name;#14801"
)

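On the doubled ;; in the question: in the pattern ";#?\\d*" both the "#" and the digits are optional, so it also matches the ";#" that precedes the second name, and each match is replaced by ";". A minimal stringr-only sketch, assuming every ID has at least one digit, is to require that digit with "\\d+" and then split on the ";#" that remains. The variable names x, no_ids, contacts and parts are only illustrative.

```r
library(stringr)

x <- "last_name, first_name;#6903;#last_name, first_name;#36606"

# Require at least one digit so only the ";#<id>" pieces are removed
no_ids <- str_remove_all(x, ";#\\d+")

# The ";#" that is left separates one contact from the next
contacts <- str_split(no_ids, fixed(";#"))[[1]]

# Split each contact into last and first name on the comma plus whitespace
parts <- str_split_fixed(contacts, ",\\s*", n = 2)
colnames(parts) <- c("last", "first")

print(parts)
```

Here parts is a character matrix with one row per contact; as_tibble(parts) would turn it into a data frame if the rest of the workflow is tidyverse-based.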