Set cell value to NA if it ends with a substring

I have the following data frame:

df = data.frame(column1=c("abc", "def", "ghi"), column2=c("jki", "lmn", "opq"), column3=c("A-", "B-C", NA))

I want to set a cell in column3 to NA if its value ends with "-".

So far I have only managed to subset the data frame, which is not what I want:

subset(df, !grepl("*-$", column3))
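The logical vector that subset() uses to drop rows can instead index an assignment, so every row is kept and only the matching cells change. A minimal base-R sketch; stringsAsFactors = FALSE is passed explicitly so column3 is a character column even on R versions older than 4.0.0:

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps column3 as character
df <- data.frame(column1 = c("abc", "def", "ghi"),
                 column2 = c("jki", "lmn", "opq"),
                 column3 = c("A-", "B-C", NA),
                 stringsAsFactors = FALSE)

# TRUE where column3 ends with "-"; grepl() returns FALSE for NA input
ends_with_dash <- grepl("-$", df$column3)

# Assign NA only to the flagged cells instead of dropping whole rows
df$column3[ends_with_dash] <- NA
df
```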

This is my expected output:

  column1 column2 column3
1     abc     jki    <NA>
2     def     lmn     B-C
3     ghi     opq    <NA>

You can try:

# "-$" matches values ending in "-"; grepl() gives FALSE for NA, so NA cells stay NA
df$column3 = ifelse(grepl("-$", df$column3), NA, df$column3)

Output:

> df
  column1 column2 column3
1     abc     jki    <NA>
2     def     lmn     B-C
3     ghi     opq    <NA>
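In the pattern, $ anchors the match at the end of the string, so "-$" on its own means "ends with a hyphen"; a leading * adds nothing. A quick check of what grepl() returns for each value, including the missing one, and why ifelse() leaves that NA untouched:

```r
x <- c("A-", "B-C", NA)

# "$" anchors the match to the end of the string
grepl("-$", x)   # returns c(TRUE, FALSE, FALSE)

# grepl() returns FALSE (not NA) for missing input, so ifelse() takes the
# "no" branch there and simply copies the existing NA
ifelse(grepl("-$", x), NA, x)
```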

Option 1: dplyr

library(dplyr)

df %>%
  mutate(column3 = ifelse(grepl("-$", column3), NA, column3))

  column1 column2 column3
1     abc     jki    <NA>
2     def     lmn     B-C
3     ghi     opq    <NA>

Option 2: data.table

library(data.table)

setDT(df)
# Update column3 by reference, only in rows whose value ends with "-"
df[grepl("-$", column3), column3 := NA]

df

   column1 column2 column3
1:     abc     jki    <NA>
2:     def     lmn     B-C
3:     ghi     opq    <NA>

We can use replace + endsWith:

transform(
  df,
  column3 = replace(
    column3,
    endsWith(column3, "-"),
    NA
  )
)

  column1 column2 column3
1     abc     jki    <NA>
2     def     lmn     B-C
3     ghi     opq    <NA>
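Unlike grepl(), endsWith() is a literal suffix test (no regular expression), and it returns NA for a missing value. That is harmless here because replace() is given a single replacement value, and a length-1 subscripted assignment ignores NA indices. A small check, assuming column3 is character (endsWith() raises an error on factor columns):

```r
x <- c("A-", "B-C", NA)

# endsWith() does a literal suffix match and propagates NA
endsWith(x, "-")   # returns c(TRUE, FALSE, NA)

# replace() assigns NA where the index is TRUE; the NA index is skipped
# because the replacement value has length 1
replace(x, endsWith(x, "-"), NA)
```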