Use str_detect() to extract information from a column and then create a new column

Question

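I am working with a data.frame that contains a column whose values are named like this: D1_open, D9_shrub, D10_open, etc.

I want to create a new column whose values are just "open" or "shrub". That is, I want to extract the words "open" and "shrub" from "ID_SubPlot" and put them in a new column. I believe str_detect() could be useful, but I can't figure out how to use it.

Example data: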

test <- structure(list(ID_Plant = c(243, 370, 789, 143, 559, 588, 746, 
618, 910, 898), ID_SubPlot = c("D1_open", "D9_shrub", "D8_open", 
"E4_shrub", "U5_shrub", "U10_open", "S10_shrub", "U10_shrub", 
"S9_shrub", "S9_shrub")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

Answer 1

Regex (see also the regex cheatsheet for R)

Simply use ".*_(.*)" to capture everything after the _ in the first group, and replace each string with that first captured group.

# The backreference must be written "\\1" inside an R string literal
test$col = gsub(".*_(.*)", "\\1", test$ID_SubPlot)

test
   ID_Plant ID_SubPlot   col
1       243    D1_open  open
2       370   D9_shrub shrub
3       789    D8_open  open
4       143   E4_shrub shrub
5       559   U5_shrub shrub
6       588   U10_open  open
7       746  S10_shrub shrub
8       618  U10_shrub shrub
9       910   S9_shrub shrub
10      898   S9_shrub shrub
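To see how the capture group behaves, here is a minimal base R sketch on a short vector. The vector `ids` and the variable `suffix` are made up for illustration and are not part of the original data; the pattern is the same one used above.

```r
# Hypothetical sample values mirroring the ID_SubPlot format
ids <- c("D1_open", "D9_shrub", "U10_open")

# ".*_" greedily consumes everything up to the last underscore,
# "(.*)" captures the remainder, and "\\1" substitutes that capture
suffix <- sub(".*_(.*)", "\\1", ids)
print(suffix)

# A value without an underscore does not match, so it is returned unchanged
no_underscore <- sub(".*_(.*)", "\\1", "open")
print(no_underscore)
```

Because each string is matched as a whole, sub() and gsub() should give the same result here. Values that lack an underscore pass through untouched, which may or may not be what you want for messy data.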

Data

test=structure(list(ID_Plant = c(243, 370, 789, 143, 559, 588, 746, 618, 910, 898), 
ID_SubPlot = c("D1_open", "D9_shrub", "D8_open", "E4_shrub", "U5_shrub", "U10_open", "S10_shrub", "U10_shrub", "S9_shrub", "S9_shrub")), 
row.names = c(NA, -10L), class = c("data.frame"))

Answer 2

Here is an approach using separate from tidyr:

library(tidyr)

separate(test, ID_SubPlot, into = c("Code", "NewCol"), sep = "_")

Output

   ID_Plant Code NewCol
1       243   D1   open
2       370   D9  shrub
3       789   D8   open
4       143   E4  shrub
5       559   U5  shrub
6       588  U10   open
7       746  S10  shrub
8       618  U10  shrub
9       910   S9  shrub
10      898   S9  shrub
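Note that the call above drops ID_SubPlot from the result. If the original column should stay, one option is the `remove = FALSE` argument of separate(). The sketch below uses a small made-up data frame `df` in the same shape as `test`, not the full example data.

```r
library(tidyr)

# Small hypothetical data frame in the same shape as `test`
df <- data.frame(
  ID_Plant = c(243, 370),
  ID_SubPlot = c("D1_open", "D9_shrub")
)

# remove = FALSE keeps ID_SubPlot alongside the two new columns
res <- separate(df, ID_SubPlot, into = c("Code", "NewCol"),
                sep = "_", remove = FALSE)
print(res)
```

This assumes every value contains exactly one underscore; with more or fewer pieces, separate() warns and fills or drops according to its `extra` and `fill` arguments.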

Answer 3

This may also help. I assume you want to remove the ID part plus the underscore:

library(dplyr)
library(stringr)

test %>%
  mutate(result = str_remove(ID_SubPlot, "^[A-Za-z]\\d+(_)"))

# A tibble: 10 x 3
   ID_Plant ID_SubPlot result
      <dbl> <chr>      <chr> 
 1      243 D1_open    open  
 2      370 D9_shrub   shrub 
 3      789 D8_open    open  
 4      143 E4_shrub   shrub 
 5      559 U5_shrub   shrub 
 6      588 U10_open   open  
 7      746 S10_shrub  shrub 
 8      618 U10_shrub  shrub 
 9      910 S9_shrub   shrub 
10      898 S9_shrub   shrub
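
Since the question asks about str_detect() specifically, here is a rough sketch of how it could be used. str_detect() only returns TRUE or FALSE, so it has to be paired with something like if_else() to produce labels. This assumes the column holds only the two suffixes "open" and "shrub"; the tibble `df` and the column name `habitat` are made up for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical sample values mirroring the ID_SubPlot format
df <- tibble(ID_SubPlot = c("D1_open", "D9_shrub", "U10_open"))

# str_detect() flags rows ending in "_open"; every other row is labelled
# "shrub", which is only valid under the two-category assumption
res <- df %>%
  mutate(habitat = if_else(str_detect(ID_SubPlot, "_open$"), "open", "shrub"))
print(res)
```

With more than two categories, the extraction approaches in the answers above are likely the simpler choice, because they do not need one condition per category.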
