Use R to add strings to a destination field based on a match in other columns
I have a data frame with three populated columns (Submitted_Name, Status, Accepted_Name) and one empty column (Flag).
data.pre <- data.frame(
'Submitted_Name' = c('Aa achalensis Schltr. hort.','Aa argyrolepis Rchb.f.','Aa aurantiaca D.Trujillo pre Herbarium Practice','Aa brevis Schltr.','Aa calceata (Rchb.f.) Schltr.'),
'Status' = c('accepted','accepted','accepted','synonym','accepted'),
'Accepted_Name' = c('Aa achalensis var. alba Schltr.','Aa argyrolepis forma beta Rchb.f.','Aa aurantiaca_x D.Trujillo','Myrosmodes breve (Schltr.) Garay','Aa calceata (Rchb.f.) Schltr. comb.ined.'),
'Flag' = c('','','','','')
)
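I want to populate 'Flag' with certain strings based on the presence of particular phrases in the Submitted_Name and Accepted_Name fields. If 'hort.' or 'pre Herbarium Practice' appears in Submitted_Name, then I would like 'submitted name is horticultural' or 'submitted name is pre herbarium practice' to appear in 'Flag'. If 'var.', 'forma', '_x' or 'comb.ined' appears in the Accepted_Name field, then 'variety', 'form', 'hybrid' or 'accepted name is comb.ined' should be added to 'Flag'. If there is no trigger phrase, 'Flag' stays blank.
To summarise:
From Submitted_Name
hort. = submitted name is horticultural
pre Herbarium Practice = submitted name is pre herbarium practice
From Accepted_Name
var. = variety
forma = form
_x = hybrid
comb.ined = accepted name is comb.ined
The desired outcome is: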
data.post <- data.frame(
'Submitted_Name' = c('Aa achalensis Schltr. hort.','Aa argyrolepis Rchb.f.','Aa aurantiaca D.Trujillo pre Herbarium Practice','Aa brevis Schltr.','Aa calceata (Rchb.f.) Schltr.'),
'Status' = c('accepted','accepted','accepted','synonym','accepted'),
'Accepted_Name' = c('Aa achalensis var. alba Schltr.','Aa argyrolepis forma beta Rchb.f.','Aa aurantiaca_x D.Trujillo','Myrosmodes breve (Schltr.) Garay','Aa calceata (Rchb.f.) Schltr. comb.ined.'),
'Flag' = c('variety; submitted name is horticultural','form','hybrid; submitted name is pre herbarium practice','','accepted name is comb.ined.')
)
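For instances where only one value needs to be added to 'Flag', I can manage it with the laborious, repetitive code below (I appreciate this is not good form):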
Master.Taxonomy$Flag <- ifelse(grepl("var.", Master.Taxonomy$Accepted_Name), "variety", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("comb.ined.", Master.Taxonomy$Accepted_Name), "accepted name is comb.ined.", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("_x", Master.Taxonomy$Accepted_Name), "hybrid", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("pre Herbarium Practice", Master.Taxonomy$Submitted_Name), "submitted name is pre herbarium practice", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("hort.", Master.Taxonomy$Submitted_Name), "submitted name is horticultural", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("forma", Master.Taxonomy$Accepted_Name), "form", Master.Taxonomy$Flag)
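However, where two or more values are to be added, the later one overwrites the earlier one, and I am left with only whatever was added to 'Flag' last.
I have tried fudging it, but have tied myself in knots.
Note, the order in which the phrases appear in 'Flag' is not important.
Any help appreciated!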
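The overwriting happens because each ifelse() call replaces the whole value of Flag with the new label. A minimal base R sketch of appending instead is shown here; append_flag is a hypothetical helper (not part of the original code), and the two-row df is a cut-down stand-in for Master.Taxonomy. It also assumes the trigger phrases should be matched literally, hence fixed = TRUE, so that "." is not treated as a regex wildcard.

```r
# Hypothetical helper: where `hit` is TRUE, append `label` to `flag`
# (separated by "; ") instead of overwriting what is already there.
append_flag <- function(flag, hit, label) {
  ifelse(hit,
         ifelse(flag == "", label, paste(flag, label, sep = "; ")),
         flag)
}

# Cut-down stand-in for Master.Taxonomy (assumption, for illustration only)
df <- data.frame(
  Submitted_Name = c("Aa achalensis Schltr. hort.", "Aa brevis Schltr."),
  Accepted_Name  = c("Aa achalensis var. alba Schltr.", "Myrosmodes breve (Schltr.) Garay"),
  Flag           = c("", ""),
  stringsAsFactors = FALSE
)

# fixed = TRUE matches the phrase literally rather than as a regular expression
df$Flag <- append_flag(df$Flag, grepl("var.", df$Accepted_Name, fixed = TRUE), "variety")
df$Flag <- append_flag(df$Flag, grepl("hort.", df$Submitted_Name, fixed = TRUE), "submitted name is horticultural")
df$Flag
```

Each further phrase would need one more append_flag() line, so this keeps the repetitive shape of the original code and only changes the overwrite into an append.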
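You can use case_when and str_detect for this purpose. Instead of doing everything in the same column, you can create two different columns, one for the submitted flags and another for the accepted flags, and finally use unite to combine the two and get the desired result.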
library(tidyverse)
data.pre <- data.frame(
'Submitted_Name' = c('Aa achalensis Schltr. hort.','Aa argyrolepis Rchb.f.','Aa aurantiaca D.Trujillo pre Herbarium Practice','Aa brevis Schltr.','Aa calceata (Rchb.f.) Schltr.'),
'Status' = c('accepted','accepted','accepted','synonym','accepted'),
'Accepted_Name' = c('Aa achalensis var. alba Schltr.','Aa argyrolepis forma beta Rchb.f.','Aa aurantiaca_x D.Trujillo','Myrosmodes breve (Schltr.) Garay','Aa calceata (Rchb.f.) Schltr. comb.ined.'),
'Flag' = c('','','','','')
)
data.pre %>%
  # f1: flag derived from Submitted_Name (NA when no phrase matches)
  mutate(f1 = case_when(Submitted_Name %>% str_detect("hort") ~ "submitted name is horticultural",
                        Submitted_Name %>% str_detect("pre Herbarium Practice") ~ "submitted name is pre herbarium practice"),
         # f2: flag derived from Accepted_Name; fixed() matches "." literally instead of as a regex wildcard
         f2 = case_when(Accepted_Name %>% str_detect(fixed("var.")) ~ "variety",
                        Accepted_Name %>% str_detect(fixed("comb.ined.")) ~ "accepted name is comb.ined.",
                        Accepted_Name %>% str_detect("_x") ~ "hybrid",
                        Accepted_Name %>% str_detect("forma") ~ "form")) %>%
  # combine both flags into Flag, dropping the NAs
  unite("Flag", c(f2, f1), na.rm = TRUE, sep = "; ")
#> Submitted_Name Status
#> 1 Aa achalensis Schltr. hort. accepted
#> 2 Aa argyrolepis Rchb.f. accepted
#> 3 Aa aurantiaca D.Trujillo pre Herbarium Practice accepted
#> 4 Aa brevis Schltr. synonym
#> 5 Aa calceata (Rchb.f.) Schltr. accepted
#> Accepted_Name
#> 1 Aa achalensis var. alba Schltr.
#> 2 Aa argyrolepis forma beta Rchb.f.
#> 3 Aa aurantiaca_x D.Trujillo
#> 4 Myrosmodes breve (Schltr.) Garay
#> 5 Aa calceata (Rchb.f.) Schltr. comb.ined.
#> Flag
#> 1 variety; submitted name is horticultural
#> 2 form
#> 3 hybrid; submitted name is pre herbarium practice
#> 4
#> 5 accepted name is comb.ined.
Created on 2021-01-30 by the reprex package (v0.3.0)
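One thing to be aware of: case_when returns only the first branch that matches, so a name containing two trigger phrases from the same column (say both "var." and "_x") would get a single flag. If that can occur in your data, a table-driven variant may be worth considering. The following is only a sketch in base R; the rules table and the flag_names function are hypothetical names introduced for illustration, and the phrases are assumed to be literal strings (fixed = TRUE).

```r
data.pre <- data.frame(
  Submitted_Name = c("Aa achalensis Schltr. hort.", "Aa argyrolepis Rchb.f.",
                     "Aa aurantiaca D.Trujillo pre Herbarium Practice",
                     "Aa brevis Schltr.", "Aa calceata (Rchb.f.) Schltr."),
  Status = c("accepted", "accepted", "accepted", "synonym", "accepted"),
  Accepted_Name = c("Aa achalensis var. alba Schltr.", "Aa argyrolepis forma beta Rchb.f.",
                    "Aa aurantiaca_x D.Trujillo", "Myrosmodes breve (Schltr.) Garay",
                    "Aa calceata (Rchb.f.) Schltr. comb.ined."),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: which column to search, the literal phrase, the label to add
rules <- data.frame(
  column  = c("Accepted_Name", "Accepted_Name", "Accepted_Name", "Accepted_Name",
              "Submitted_Name", "Submitted_Name"),
  pattern = c("var.", "forma", "_x", "comb.ined", "hort.", "pre Herbarium Practice"),
  label   = c("variety", "form", "hybrid", "accepted name is comb.ined.",
              "submitted name is horticultural", "submitted name is pre herbarium practice"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one "; "-separated flag string per row of df
flag_names <- function(df, rules) {
  # One vector per rule: the label where the phrase occurs, NA elsewhere
  hits <- lapply(seq_len(nrow(rules)), function(i) {
    found <- grepl(rules$pattern[i], df[[rules$column[i]]], fixed = TRUE)
    ifelse(found, rules$label[i], NA_character_)
  })
  hits <- do.call(cbind, hits)  # rows = names, columns = rules
  # Collapse the non-missing labels of each row; rows with no hit give ""
  apply(hits, 1, function(x) paste(x[!is.na(x)], collapse = "; "))
}

data.pre$Flag <- flag_names(data.pre, rules)
data.pre$Flag
```

Labels appear in the order of the rows in rules, which should be fine given that the order of phrases in Flag does not matter. Adding a new trigger phrase then means adding a row to rules rather than another branch.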