How to truncate a specific part of a string if present

Let's consider the following vector:

x <- c("GDP_UK", "GDP_US", "GDP_UK_diff2_L2", 
       "INC","GDP_UK_L2", "GDP_US_level", "INC_UK", "INC_L1", "INC_diff1")

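As you can see, the vector contains some strings.

What I want to do is find the elements that contain "_diff(number)", "_L(number)", or "_level" and cut that part off the string.

What I want to end up with is a vector like this: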

c("GDP_UK", "GDP_US", "GDP_UK", "INC", "GDP_UK", "GDP_US", "INC_UK", "INC", "INC")

As you can see, every _diff, _L, and _level suffix has been cut off to recover the original string.

I'm not sure how to do this. I tried the code

x[grepl(paste(c("diff", "level", "_L"), collapse = "|"), x)]

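to get only the elements that contain diff, level, or _L, but I don't know how to cut that part off. I tried a few things with substring, but I'm not sure how to specify up to which character the string should be removed. Do you know how to do this?

** EDIT **

We can use the following code: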

x <- gsub(pattern = "_L", replacement = "", x)
x <- gsub(pattern = "_diff", replacement = "", x)
x <- gsub(pattern = "_level", replacement = "", x)

However, we end up with leftover digits at the end of the strings:

 "GDP_UK"   "GDP_US"   "GDP_UK22" "INC"      "GDP_UK2"  "GDP_US"   "INC_UK"   "INC2"     "INC1"  

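You are looking for regular expressions such as "_L\\d*" (in an R string literal the backslash has to be doubled). This one matches an underscore, an L, and zero or more digits.

In full: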


x <- c("GDP_UK", "GDP_US", "GDP_UK_diff2_L2", 
       "INC","GDP_UK_L2", "GDP_US_level", "INC_UK", "INC_L1", "INC_diff1")

# each call removes one kind of suffix from the original x
gsub("_L\\d*", "", x)
gsub("_diff\\d*", "", x)
gsub("_level\\d*", "", x)


# or in one go:
library(stringr)
x %>% 
  str_replace_all("_L\\d*", "") %>% 
  str_replace_all("_diff\\d*", "") %>% 
  str_replace_all("_level\\d*", "")
#> [1] "GDP_UK" "GDP_US" "GDP_UK" "INC"    "GDP_UK" "GDP_US" "INC_UK" "INC"   
#> [9] "INC"

## or even in one go:
gsub("_(L|diff|level)\\d*", "", x)
#> [1] "GDP_UK" "GDP_US" "GDP_UK" "INC"    "GDP_UK" "GDP_US" "INC_UK" "INC"   
#> [9] "INC"
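
One caveat with the patterns above: they match anywhere in the string, so a name that merely contains "_L" would be clipped as well. If the suffixes only ever appear at the end, a stricter variant may be safer. The following is only a sketch under that assumption; the name suffix_pattern and the example "INC_LONDON" are made up for illustration and are not part of the original data.

```r
x <- c("GDP_UK", "GDP_US", "GDP_UK_diff2_L2",
       "INC", "GDP_UK_L2", "GDP_US_level", "INC_UK", "INC_L1", "INC_diff1")

# Hypothetical stricter pattern: one or more suffixes, anchored at the end
# of the string. _L and _diff must be followed by at least one digit;
# _level takes no digits.
suffix_pattern <- "(_(L\\d+|diff\\d+|level))+$"

cleaned <- sub(suffix_pattern, "", x)
print(cleaned)

# A made-up name where "_L" is not a lag suffix:
loose  <- gsub("_(L|diff|level)\\d*", "", "INC_LONDON")  # gives "INCONDON"
strict <- sub(suffix_pattern, "", "INC_LONDON")          # left unchanged
```

Because the whole run of suffixes is matched in one piece up to the end of the string, a single sub() call is enough here and gsub() is not needed.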