Remove comma and/or period in R, except if a certain condition holds for the last occurrence

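I want to remove all commas and periods from a string, except when the string ends with a comma (or period) followed by one or two digits; in that case that final separator should be kept.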

Some examples:

12.345.67 #would become 12345.67
12.345,67 #would become 12345,67
12.345,6  #would become 12345,6
12.345.6  #would become 12345.6
12.345    #would become 12345
1,2.345   #would become 12345

etc.
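The examples above can be turned into a small check that any candidate solution can be run against. The names inputs, expected and check_solution are hypothetical, introduced only for this sketch; the input/output pairs are the ones listed in the question.

```r
# Inputs and expected outputs taken from the examples in the question.
inputs   <- c("12.345.67", "12.345,67", "12.345,6", "12.345.6", "12.345", "1,2.345")
expected <- c("12345.67",  "12345,67",  "12345,6",  "12345.6",  "12345",  "12345")

# Hypothetical helper: TRUE if a candidate function reproduces every example.
check_solution <- function(f) identical(f(inputs), expected)

# Removing every separator is not enough: it also drops the decimal mark.
check_solution(function(s) gsub("[,.]", "", s))
# [1] FALSE
```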

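One solution is to count the characters after the last comma/period (nchar(word(x, -1, sep = ',|\\.'))). If there are more than 2, remove all separators (gsub(',|\\.', '', x)); otherwise remove only the first one (sub(',|\\.', '', x)).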

library(stringr)
ifelse(nchar(word(x, -1, sep = ',|\\.')) > 2, gsub(',|\\.', '', x), sub(',|\\.', '', x))

#[1] "12345.67" "12345,67" "12345,6"  "12234"    "1234"     "12.45"  

Data

x <- c("12.345.67", "12.345,67", "12.345,6", "1,2.234", "1.234", "1,2.45")
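To see the values this answer's test is based on, here is a base-R sketch of the intermediate steps. It assumes sub(".*[,.]", "", x) as a stand-in for word(x, -1, sep = ',|\\.'): the greedy .* runs up to the last comma or period, so only the characters after it remain.

```r
# Same data as in this answer.
x <- c("12.345.67", "12.345,67", "12.345,6", "1,2.234", "1.234", "1,2.45")

# Base-R stand-in for word(x, -1, sep = ",|\\."): the greedy ".*" consumes
# everything up to the LAST comma/period, so only the tail remains.
tail_part <- sub(".*[,.]", "", x)
tail_part
# [1] "67"  "67"  "6"   "234" "234" "45"

# More than 2 trailing characters means there is no decimal part.
nchar(tail_part) > 2
# [1] FALSE FALSE FALSE  TRUE  TRUE FALSE

# Long tail: drop every separator; short tail: drop only the first one.
result <- ifelse(nchar(tail_part) > 2, gsub("[,.]", "", x), sub("[,.]", "", x))
result
# [1] "12345.67" "12345,67" "12345,6"  "12234"    "1234"     "12.45"
```

One consequence of using sub() in the short-tail branch: a string with a single separator and a short tail, such as "12.45", loses that separator (giving "1245"), because its first separator is also its last one.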

A stringi solution using the same data as @Sotos would be:

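  • The first statement removes every , and . if more than 2 characters follow the last one (so there is no decimal part)

  • The second statement removes the first , or . if more than one , or . is still left

library(stringi)
x <- ifelse(stri_locate_last_regex(x, "[,.]")[, 2] < stri_length(x) - 2,
            stri_replace_all_regex(x, "[,.]", ""), x)
x <- ifelse(stri_count_regex(x, "[,.]") > 1,
            stri_replace_first_regex(x, "[,.]", ""), x)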
> x
[1] "12345.67" "12345,67" "12345,6"  "12234"    "1234"     "12.45" 
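For readers without stringi, a two-step clean-up can be sketched in base R. This is an assumption-based variant, not the answer's exact code: step 1 drops every separator when more than two characters follow the last one, and step 2 drops the first separator when more than one is left. The helper name strip_two_step is hypothetical.

```r
# Hypothetical base-R helper; assumes no stringi is available.
strip_two_step <- function(x) {
  # Characters after the last comma/period (the whole string if there is none).
  tail_len <- nchar(sub(".*[,.]", "", x))
  # Step 1: more than 2 trailing characters means no decimal part,
  # so every separator is a grouping mark and is removed.
  x <- ifelse(tail_len > 2, gsub("[,.]", "", x), x)
  # Step 2: count the separators left; if more than one, drop the first.
  n_sep <- nchar(gsub("[^,.]", "", x))
  ifelse(n_sep > 1, sub("[,.]", "", x), x)
}

strip_two_step(c("12.345.67", "12.345,67", "12.345,6", "1,2.234", "1.234", "1,2.45"))
# [1] "12345.67" "12345,67" "12345,6"  "12234"    "1234"     "12.45"
strip_two_step("12.45")
# [1] "12.45"
```

Like the answer's second line, step 2 removes only one separator, so an input with three or more separators before a short tail (for example "1.234.567,89") keeps some of them.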

Another option is to use the negative lookahead syntax (?!...) of Perl-compatible regular expressions:

df <- data.frame(V1 = c("12.345.67", "12.345,67", "12.345,6", "12.345.6", "12.345", "1,2.345"))
df
#          V1
# 1 12.345.67
# 2 12.345,67
# 3  12.345,6
# 4  12.345.6
# 5    12.345
# 6   1,2.345

df$V1 <- gsub("[,.](?!\\d{1,2}$)", "", df$V1, perl = TRUE)
df          # remove , or . unless followed by 1 or 2 digits at the end of the string
#         V1
# 1 12345.67
# 2 12345,67
# 3  12345,6
# 4  12345.6
# 5    12345
# 6    12345
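Because the lookahead is checked separately at each comma or period, the pattern keeps only a separator that sits right before 1 or 2 final digits, however many separators come before it. A small sketch with a hypothetical wrapper remove_separators() and a few extra inputs that are not from the question shows this:

```r
# Hypothetical wrapper around the answer's negative-lookahead pattern.
remove_separators <- function(x) {
  # [,.]           a comma or period ...
  # (?!\\d{1,2}$)  ... that is NOT followed by 1-2 digits and then end of string
  gsub("[,.](?!\\d{1,2}$)", "", x, perl = TRUE)  # lookahead needs perl = TRUE
}

# Additional inputs (not from the question): a lone decimal mark is kept,
# and any number of grouping marks before it is removed.
remove_separators(c("12.45", "1.234.567,89", "1.234.567"))
# [1] "12.45"      "1234567,89" "1234567"
```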