Fast replacement of a string in a data.table

Given the following table:

df <- structure(list(V1 = c("Prodigal_2|LOCUS_00010", "Prodigal_2|LOCUS_00010", 
"Prodigal_2|LOCUS_00010", "Prodigal_2|LOCUS_00010", "Prodigal_2|LOCUS_00010", 
"Prodigal_2|LOCUS_00010"), V2 = c("WP_001212884.1", "WP_042596810.1", 
"WP_131250681.1", "WP_001212880.1", "WP_016079538.1", "WP_086396124.1"
), V3 = c(100, 99.7, 99.7, 99.7, 99.7, 99.7), V4 = c(381L, 381L, 
381L, 381L, 381L, 381L), V5 = c(0L, 1L, 1L, 1L, 1L, 1L), V6 = c(0L, 
0L, 0L, 0L, 0L, 0L), V7 = c(1L, 1L, 1L, 1L, 1L, 1L), V8 = c(381L, 
381L, 381L, 381L, 381L, 381L), V9 = c(1L, 1L, 1L, 1L, 1L, 1L), 
    V10 = c(381L, 381L, 381L, 381L, 381L, 381L), V11 = c(1.3e-206, 
    1.7e-206, 1.7e-206, 3e-206, 3e-206, 3e-206), V12 = c(728, 
    727.6, 727.6, 726.9, 726.9, 726.9)), row.names = c(NA, -6L
), class = c("data.table", "data.frame"))

It looks like this when printed:

                       V1             V2    V3  V4 V5 V6 V7  V8 V9 V10      V11 V12
1: Prodigal_2|LOCUS_00010 WP_001212884.1 100.0 381  0  0  1 381  1 381 1.3e-206 728
2: Prodigal_2|LOCUS_00010 WP_042596810.1  99.7 381  1  0  1 381  1 381 1.7e-206 728
3: Prodigal_2|LOCUS_00010 WP_131250681.1  99.7 381  1  0  1 381  1 381 1.7e-206 728
4: Prodigal_2|LOCUS_00010 WP_001212880.1  99.7 381  1  0  1 381  1 381 3.0e-206 727
5: Prodigal_2|LOCUS_00010 WP_016079538.1  99.7 381  1  0  1 381  1 381 3.0e-206 727
6: Prodigal_2|LOCUS_00010 WP_086396124.1  99.7 381  1  0  1 381  1 381 3.0e-206 727

I want to replace every |LOCUS_XXXXX substring in column V1 with an empty string, as shown below:

          V1             V2    V3  V4 V5 V6 V7  V8 V9 V10      V11 V12
1 Prodigal_2 WP_001212884.1 100.0 381  0  0  1 381  1 381 1.3e-206 728
2 Prodigal_2 WP_042596810.1  99.7 381  1  0  1 381  1 381 1.7e-206 728
3 Prodigal_2 WP_131250681.1  99.7 381  1  0  1 381  1 381 1.7e-206 728
4 Prodigal_2 WP_001212880.1  99.7 381  1  0  1 381  1 381 3.0e-206 727
5 Prodigal_2 WP_016079538.1  99.7 381  1  0  1 381  1 381 3.0e-206 727
6 Prodigal_2 WP_086396124.1  99.7 381  1  0  1 381  1 381 3.0e-206 727

I tried the following:

# The pipe must be escaped twice in an R string ("\\|"); "\|" is a parse error
Lookup <- c("\\|LOCUS_[0-9]+")
Rename <- ""

library(stringi)

setDT(df)[, Result := Rename[stri_detect_regex(V1, Lookup)], by = V1]

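The Result column comes out empty. Ideally I would like to do the replacement in place, that is, directly in column V1. The data.table is large, with 2.2 million rows.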

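You need str_replace rather than str_detect. A detect function such as stri_detect_regex() only returns TRUE/FALSE for whether the pattern occurs, so Rename[...] just selects the empty string stored in Rename, which is why Result looks empty. A replace function returns the modified strings, and assigning them with := updates V1 in place: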
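A small illustration of the difference, on two made-up strings in the same format as V1: str_detect() yields a logical vector, while str_replace() yields the cleaned strings themselves.

```r
library(stringr)

x <- c("Prodigal_2|LOCUS_00010", "Prodigal_2|LOCUS_00011")
pattern <- "\\|LOCUS_[0-9]+"

# str_detect() only reports whether the pattern occurs: a logical vector
detected <- str_detect(x, pattern)
print(detected)
#> [1] TRUE TRUE

# str_replace() returns the strings with the first match replaced by ""
replaced <- str_replace(x, pattern, "")
print(replaced)
#> [1] "Prodigal_2" "Prodigal_2"
```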

library(stringr)
library(data.table)
# Replace "|LOCUS_" plus its digits with "" directly in V1 (":=" updates by reference)
setDT(df)[, V1 := str_replace(V1, "\\|LOCUS_[0-9]+", "")]
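With 2.2 million rows it may be worth timing a few equivalent variants on your own data; the sketch below is not a benchmark and makes no claim about which is fastest. It shows (a) calling stringi directly, (b) base R sub(), (c) data.table::tstrsplit() on the literal "|", which assumes the |LOCUS_... part is always a trailing suffix, as in the sample, and (d) a hypothetical helper, replace_unique(), that runs the regex once per distinct value, which can only help if V1 repeats heavily. The toy table is made up for illustration.

```r
library(data.table)
library(stringi)

pattern <- "\\|LOCUS_[0-9]+"
make_dt <- function() {
  data.table(V1 = rep(c("Prodigal_2|LOCUS_00010", "Prodigal_3|LOCUS_00020"), each = 3))
}

# (a) stringi directly; stringr's str_replace() is a wrapper around stringi
dt_a <- make_dt()
dt_a[, V1 := stri_replace_first_regex(V1, pattern, "")]

# (b) base R sub(); perl = TRUE selects the PCRE regex engine
dt_b <- make_dt()
dt_b[, V1 := sub(pattern, "", V1, perl = TRUE)]

# (c) no regex: split on the literal "|" and keep only the first piece.
#     Assumes "|LOCUS_..." is always a trailing suffix.
dt_c <- make_dt()
dt_c[, V1 := tstrsplit(V1, "|", fixed = TRUE, keep = 1L)[[1L]]]

# (d) hypothetical helper: apply the regex to unique values only,
#     then map the results back to every row with match()
replace_unique <- function(x, pattern, replacement = "") {
  u <- unique(x)
  sub(pattern, replacement, u, perl = TRUE)[match(x, u)]
}
dt_d <- make_dt()
dt_d[, V1 := replace_unique(V1, pattern)]
```

All four leave V1 as "Prodigal_2"/"Prodigal_3"; the choice between them is purely about speed on the real data.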