Remove one digit at position n of each number in a string of numbers separated by slashes

Question

I have a character column structured like this:

data <- data.frame(
  id = 1:3,
  codes = c("08001301001", "08002401002 / 08002601003 / 17134604034", "08004701005 / 08005101001"))

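I want to remove the 6th digit of every number in the string. The numbers are always 10 characters long.

My code works, but I believe a regex would make this simpler, and I can't figure one out.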

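library(stringr)

# Remove the 6th digit of every number in a " / "-separated string
remove_6_digit <- function(x) {
  # Positions of every slash in the string
  idxs <- str_locate_all(x, "/")[[1]][, 1]

  # The 6th digit of each later number sits 7 characters after a slash;
  # delete from right to left so the earlier positions stay valid
  for (idx in c(rev(idxs + 7), 6)) {
    str_sub(x, idx, idx) <- ""
  }
  return(x)
}

result <- sapply(data$codes, remove_6_digit, USE.NAMES = FALSE)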

Answer 1

You can use

gsub("\\b(\\d{5})\\d", "\\1", data$codes)

See the regex demo. This removes the 6th digit counted from the start of each digit sequence.

Details:

\b - a word boundary
(\d{5}) - capturing group 1 (\1): five digits
\d - a digit.
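As a quick check of the pattern described above, here is a minimal sketch that applies it to the sample data from the question. The variable names `codes` and `cleaned` are only illustrative. Note that inside an R string literal every regex backslash has to be doubled.

```r
# Sample data from the question
codes <- c("08001301001",
           "08002401002 / 08002601003 / 17134604034",
           "08004701005 / 08005101001")

# \b anchors at the start of each number, (\d{5}) keeps the first five
# digits, and the \d that follows (the 6th digit) is left out of the
# replacement, which only restores group 1
cleaned <- gsub("\\b(\\d{5})\\d", "\\1", codes)
print(cleaned)
#> [1] "0800101001"
#> [2] "0800201002 / 0800201003 / 1713404034"
#> [3] "0800401005 / 0800501001"
```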

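While a word boundary looks sufficient for the current scenario, a digit boundary is also an option in case the numbers are glued to word characters:

gsub("(?<!\\d)(\\d{5})\\d", "\\1", data$codes, perl=TRUE)

Here perl=TRUE enables the PCRE regex syntax, and (?<!\d) is a negative lookbehind that fails the match if there is a digit immediately to the left of the current position.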
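To illustrate the difference between the two boundaries, the sketch below uses a made-up input in which a number is glued to letters. The `ID` prefix is an assumption for demonstration only and does not occur in the question's data.

```r
# Hypothetical input: the number directly follows word characters
glued <- "ID08001301001"

# No word boundary exists between "D" and "0", so \b never matches
# in front of the digits and the string comes back unchanged
with_word_boundary <- gsub("\\b(\\d{5})\\d", "\\1", glued)

# The negative lookbehind only requires "no digit to the left",
# which holds right after the "D"
with_digit_boundary <- gsub("(?<!\\d)(\\d{5})\\d", "\\1", glued, perl = TRUE)

print(with_word_boundary)   #> [1] "ID08001301001"
print(with_digit_boundary)  #> [1] "ID0800101001"
```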

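If you must only change digit sequences that are exactly 10 digits long (no shorter and no longer), you can use

gsub("\\b(\\d{5})\\d(\\d{4})\\b", "\\1\\2", data$codes)
gsub("(?<!\\d)(\\d{5})\\d(?=\\d{4}(?!\\d))", "\\1", data$codes, perl=TRUE)

Note: your numbers consist of 11 digits, so you need to replace \d{4} with \d{5}; see this regex demo.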
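A minimal sketch of the 11-digit variant follows, using \d{5} for the trailing part as the note suggests. The test vector `x` is invented for illustration: it mixes one 11-digit number with a 6-digit and a 12-digit one to show that only the exact length is touched.

```r
# Hypothetical inputs: 11 digits, 6 digits, 12 digits
x <- c("08001301001", "123456", "080013010019")

# Word-boundary version: 5 digits + 1 dropped digit + 5 digits, nothing more
strict_wb <- gsub("\\b(\\d{5})\\d(\\d{5})\\b", "\\1\\2", x)

# Digit-boundary version: lookarounds assert no digit before the number
# and exactly five digits (then a non-digit or the end) after the 6th digit
strict_db <- gsub("(?<!\\d)(\\d{5})\\d(?=\\d{5}(?!\\d))", "\\1", x, perl = TRUE)

print(strict_wb)  #> [1] "0800101001"   "123456"       "080013010019"
print(strict_db)  #> [1] "0800101001"   "123456"       "080013010019"
```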

Answer 2

Another possible solution, using stringr::str_replace_all and lookarounds:

library(tidyverse)

data %>% 
  mutate(codes = str_replace_all(codes, "(?<=\\d{5})\\d(?=\\d{5})", ""))

#>   id                                codes
#> 1  1                           0800101001
#> 2  2 0800201002 / 0800201003 / 1713404034
#> 3  3              0800401005 / 0800501001
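Since the title asks about an arbitrary position n, the same lookaround idea can be parameterised. The helper below is a sketch in base R, not part of either answer: `remove_nth_digit`, `n`, and `len` are hypothetical names, and it assumes every number to be changed has exactly `len` digits.

```r
# Hypothetical helper: drop the n-th digit of every number that is
# exactly `len` digits long (assumes 1 <= n <= len)
remove_nth_digit <- function(x, n, len) {
  # Keep the n - 1 leading digits, drop one digit, and require
  # exactly len - n digits to follow before the number ends
  pattern <- sprintf("(?<!\\d)(\\d{%d})\\d(?=\\d{%d}(?!\\d))", n - 1, len - n)
  gsub(pattern, "\\1", x, perl = TRUE)
}

codes <- c("08001301001",
           "08002401002 / 08002601003 / 17134604034",
           "08004701005 / 08005101001")

sixth_removed <- remove_nth_digit(codes, n = 6, len = 11)
print(sixth_removed)
#> [1] "0800101001"
#> [2] "0800201002 / 0800201003 / 1713404034"
#> [3] "0800401005 / 0800501001"
```

With n = 6 and len = 11 this builds the same pattern as the strict digit-boundary regex from Answer 1, so the result matches the output shown above.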


r

regex

stringr