Removing gap between last two words in a string in R

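I am trying to remove the gap between the last two words of each string in a data frame column. I tried gsub("(\\s){1}$", "", df1$V1), but that seems to be completely wrong! df1 is my dataset and df2 is the result I want.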

df1 <- data.frame(V1 = c("Apple Pear Orange, AAA 111", "Grapes Banana Pear . BBB 222", "Orange Kiwi Melon , CCC 333", "Apple DDD 444", "Kiwi Melon Orange CCC 333", "Apple Pear Orange, AAA 111", "Tomato Cucumber EEE 222", "Seagull Pigeon ZZZ 111"), stringsAsFactors = FALSE)

df2 <- data.frame(V1 = c("Apple Pear Orange, AAA111", "Grapes Banana Pear . BBB222", "Orange Kiwi Melon , CCC333", "Apple DDD444", "Kiwi Melon Orange CCC333", "Apple Pear Orange, AAA111", "Tomato Cucumber EEE222", "Seagull Pigeon ZZZ111"), stringsAsFactors = FALSE)

You can use capture groups:

sub("(.*)\\s+(\\S+)$", "\\1\\2", df1$V1)
#[1] "Apple Pear Orange, AAA111"   "Grapes Banana Pear . BBB222" "Orange Kiwi Melon , CCC333"  "Apple DDD444"               
#[5] "Kiwi Melon Orange CCC333"    "Apple Pear Orange, AAA111"   "Tomato Cucumber EEE222"      "Seagull Pigeon ZZZ111" 

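This captures any number of characters as the first group, then matches one or more whitespace characters, then captures as the second group one or more non-whitespace characters running to the end of the string. The replacement keeps only the two capture groups, with no space between them.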
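As a minimal sketch of the pattern described above, the substitution can be wrapped in a small function and tried on a few strings. The name squash_last_gap and the sample vector are made up for illustration. The sketch writes the non-whitespace part as \\S+, and every backslash is doubled because it sits inside an R string literal.

```r
# Hypothetical helper (the name is illustrative, not from the answer):
# removes the whitespace that separates the last two words of each string.
squash_last_gap <- function(x) {
  # (.*)   group 1: everything up to the last whitespace run (greedy)
  # \\s+   the whitespace to drop
  # (\\S+) group 2: the final run of non-whitespace characters
  sub("(.*)\\s+(\\S+)$", "\\1\\2", x)
}

x <- c("Apple Pear Orange, AAA 111", "Apple DDD 444", "Apple")
result <- squash_last_gap(x)
print(result)
```

A string with no whitespace, such as "Apple", does not match the pattern, so sub() returns it unchanged.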

Or even this:

gsub("(.*)\\s", "\\1", df1$V1)

Building on Docendo's answer, you can use \\w+ to match words of any length:

gsub("(\\w+)\\s+(\\w+$)", "\\1\\2", df1$V1)

#[1] "Apple Pear Orange, AAA111"   "Grapes Banana Pear . BBB222" "Orange Kiwi Melon , CCC333" 
#[4] "Apple DDD444"                "Kiwi Melon Orange CCC333"    "Apple Pear Orange, AAA111"  
#[7] "Tomato Cucumber EEE222"      "Seagull Pigeon ZZZ111"

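The capture groups are then used in the same way: the replacement puts the two captured words back together without the space.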
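The two approaches can behave differently when the last word contains punctuation, because \\w matches only letters, digits and underscores while \\S matches any non-whitespace character. The sketch below compares them; the second test string and the variable names are made up for illustration and are not part of the question's data.

```r
# Illustrative input: the second string ends in a "word" containing a hyphen.
x <- c("Seagull Pigeon ZZZ 111", "Apple DDD 4-44")

# Word-character version: needs whitespace directly before a final run of \\w.
word_version <- gsub("(\\w+)\\s+(\\w+$)", "\\1\\2", x)

# Non-whitespace version: the final token may contain punctuation.
nonspace_version <- sub("(.*)\\s+(\\S+)$", "\\1\\2", x)

print(word_version)
print(nonspace_version)
```

For "Apple DDD 4-44" the \\w version finds no match and leaves the string as it is, while the \\S version joins it into "Apple DDD4-44". Which behavior is preferable depends on what should count as a word in the data.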