Use R to add strings to a destination field based on a match in other columns

I have a data frame with three populated columns (Submitted_Name, Status, Accepted_Name) and one empty column (Flag).

data.pre <- data.frame(
'Submitted_Name' = c('Aa achalensis Schltr. hort.','Aa argyrolepis Rchb.f.','Aa aurantiaca D.Trujillo pre Herbarium Practice','Aa brevis Schltr.','Aa calceata (Rchb.f.) Schltr.'),
'Status' = c('accepted','accepted','accepted','synonym','accepted'),
'Accepted_Name' = c('Aa achalensis var. alba Schltr.','Aa argyrolepis forma beta Rchb.f.','Aa aurantiaca_x D.Trujillo','Myrosmodes breve (Schltr.) Garay','Aa calceata (Rchb.f.) Schltr. comb.ined.'),
'Flag' = c('','','','','')
)

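I want to populate "Flag" with certain strings based on the presence of specific phrases in the Submitted_Name and Accepted_Name fields. If "hort." or "pre Herbarium Practice" appears in Submitted_Name, then I want "submitted name is horticultural" or "submitted name is pre herbarium practice" to appear in "Flag". If "var.", "forma", "_x", or "comb.ined" appears in Accepted_Name, then "variety", "form", "hybrid", or "accepted name is comb.ined" should be added to "Flag". If there is no trigger phrase, "Flag" stays blank.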

To summarize:

From Submitted_Name

hort. = submitted name is horticultural

pre Herbarium Practice = submitted name is pre herbarium practice

From Accepted_Name

var. = variety

forma = form

_x = hybrid

comb.ined = accepted name is comb.ined

The desired result is:

data.post <- data.frame(
'Submitted_Name' = c('Aa achalensis Schltr. hort.','Aa argyrolepis Rchb.f.','Aa aurantiaca D.Trujillo pre Herbarium Practice','Aa brevis Schltr.','Aa calceata (Rchb.f.) Schltr.'),
'Status' = c('accepted','accepted','accepted','synonym','accepted'),
'Accepted_Name' = c('Aa achalensis var. alba Schltr.','Aa argyrolepis forma beta Rchb.f.','Aa aurantiaca_x D.Trujillo','Myrosmodes breve (Schltr.) Garay','Aa calceata (Rchb.f.) Schltr. comb.ined.'),
'Flag' = c('variety; submitted name is horticultural','form','hybrid; submitted name is pre herbarium practice','','accepted name is comb.ined.')
)

For rows where only one value needs to be added to "Flag", I can manage with the laborious, repetitive code below (I am happy to keep it in this form):

Master.Taxonomy$Flag <- ifelse(grepl("var.", Master.Taxonomy$Accepted_Name), "variety", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("comb.ined.", Master.Taxonomy$Accepted_Name), "accepted name is comb.ined.", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("_x", Master.Taxonomy$Accepted_Name), "hybrid", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("pre Herbarium Practice", Master.Taxonomy$Submitted_Name), "submitted name is pre herbarium practice", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("hort.", Master.Taxonomy$Submitted_Name), "submitted name is horticultural", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("forma", Master.Taxonomy$Accepted_Name), "form", Master.Taxonomy$Flag)

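However, where two or more values need to be added, the later one overwrites the earlier one, and I am left with whatever was added to "Flag" last. I have tried fiddling with it but tied myself in knots. Note that the order in which the phrases appear in "Flag" does not matter. Any help is appreciated!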

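You can use case_when and str_detect for this. Instead of doing everything in a single column, create two separate columns, one for the submitted-name flag and one for the accepted-name flag, and then use unite to combine them into the desired result.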

library(tidyverse)

data.pre <- data.frame(
  'Submitted_Name' = c('Aa achalensis Schltr. hort.','Aa argyrolepis Rchb.f.','Aa aurantiaca D.Trujillo pre Herbarium Practice','Aa brevis Schltr.','Aa calceata (Rchb.f.) Schltr.'),
  'Status' = c('accepted','accepted','accepted','synonym','accepted'),
  'Accepted_Name' = c('Aa achalensis var. alba Schltr.','Aa argyrolepis forma beta Rchb.f.','Aa aurantiaca_x D.Trujillo','Myrosmodes breve (Schltr.) Garay','Aa calceata (Rchb.f.) Schltr. comb.ined.'),
  'Flag' = c('','','','','')
)

data.pre %>% 
  # fixed() matches the pattern literally, so "." is not treated as a regex wildcard
  mutate(f1 = case_when(Submitted_Name %>% str_detect(fixed("hort.")) ~ "submitted name is horticultural",
                        Submitted_Name %>% str_detect(fixed("pre Herbarium Practice")) ~ "submitted name is pre herbarium practice"),
         f2 = case_when(Accepted_Name %>% str_detect(fixed("var.")) ~ "variety",
                        Accepted_Name %>% str_detect(fixed("comb.ined.")) ~ "accepted name is comb.ined.",
                        Accepted_Name %>% str_detect(fixed("_x")) ~ "hybrid",
                        Accepted_Name %>% str_detect(fixed("forma")) ~ "form")) %>% 
  # na.rm = TRUE drops the NA produced when no phrase matched, leaving an empty Flag
  unite("Flag", c(f2, f1), na.rm = TRUE, sep = "; ")
#>                                    Submitted_Name   Status
#> 1                     Aa achalensis Schltr. hort. accepted
#> 2                          Aa argyrolepis Rchb.f. accepted
#> 3 Aa aurantiaca D.Trujillo pre Herbarium Practice accepted
#> 4                               Aa brevis Schltr.  synonym
#> 5                   Aa calceata (Rchb.f.) Schltr. accepted
#>                              Accepted_Name
#> 1          Aa achalensis var. alba Schltr.
#> 2        Aa argyrolepis forma beta Rchb.f.
#> 3               Aa aurantiaca_x D.Trujillo
#> 4         Myrosmodes breve (Schltr.) Garay
#> 5 Aa calceata (Rchb.f.) Schltr. comb.ined.
#>                                               Flag
#> 1         variety; submitted name is horticultural
#> 2                                             form
#> 3 hybrid; submitted name is pre herbarium practice
#> 4                                                 
#> 5                      accepted name is comb.ined.

Created on 2021-01-30 by the reprex package (v0.3.0)
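One caveat: case_when returns only the first matching branch, so a name that contains two trigger phrases from the same column (say both "var." and "_x") would get only one flag. If that can happen in your data, a base R variation on your original ifelse approach may be closer to what you need. The sketch below is only one possible way to do it: `add_flag` and `rules` are hypothetical names I made up for illustration, and it assumes the trigger phrases should be matched literally (`fixed = TRUE`), so that "." is not treated as a regex wildcard. The key difference from the code in the question is that each step appends to the existing flag instead of replacing it.

```r
data.pre <- data.frame(
  Submitted_Name = c('Aa achalensis Schltr. hort.', 'Aa argyrolepis Rchb.f.',
                     'Aa aurantiaca D.Trujillo pre Herbarium Practice',
                     'Aa brevis Schltr.', 'Aa calceata (Rchb.f.) Schltr.'),
  Status = c('accepted', 'accepted', 'accepted', 'synonym', 'accepted'),
  Accepted_Name = c('Aa achalensis var. alba Schltr.', 'Aa argyrolepis forma beta Rchb.f.',
                    'Aa aurantiaca_x D.Trujillo', 'Myrosmodes breve (Schltr.) Garay',
                    'Aa calceata (Rchb.f.) Schltr. comb.ined.'),
  Flag = c('', '', '', '', ''),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append `label` to `flag` wherever `pattern` occurs in `x`.
add_flag <- function(flag, x, pattern, label) {
  hit <- grepl(pattern, x, fixed = TRUE)  # literal match, "." is just a dot
  # Append with "; " if a flag is already present, otherwise start a new one
  flag[hit] <- ifelse(nzchar(flag[hit]),
                      paste(flag[hit], label, sep = "; "),
                      label)
  flag
}

# Lookup table of trigger phrases and labels, taken from the question
rules <- data.frame(
  column  = c("Accepted_Name", "Accepted_Name", "Accepted_Name", "Accepted_Name",
              "Submitted_Name", "Submitted_Name"),
  pattern = c("var.", "forma", "_x", "comb.ined",
              "hort.", "pre Herbarium Practice"),
  label   = c("variety", "form", "hybrid", "accepted name is comb.ined.",
              "submitted name is horticultural",
              "submitted name is pre herbarium practice"),
  stringsAsFactors = FALSE
)

data.post <- data.pre
data.post$Flag <- rep("", nrow(data.post))  # start from an empty character column

# Apply every rule in turn; each one appends rather than overwrites
for (i in seq_len(nrow(rules))) {
  data.post$Flag <- add_flag(data.post$Flag,
                             data.post[[rules$column[i]]],
                             rules$pattern[i],
                             rules$label[i])
}

data.post$Flag
```

On the example data this should give the same Flag column as in the desired result. Adding a new trigger phrase then only means adding a row to `rules`, and the order of the rows controls the order of the phrases within "Flag".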