Delete words with length greater than X in R

In R, after removing punctuation, digits, and non-ASCII characters, I am left with many very long words:

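library(qdapRegex)  # assumed source of rm_white(); not stated in the original post

# Replace digits and punctuation with spaces
ques1 <- gsub("[[:digit:]]", " ", ques1, perl = TRUE)
ques1 <- gsub("[[:punct:]]", " ", ques1, perl = TRUE)

# Replace non-ASCII characters with spaces, then collapse extra whitespace
ques1 <- iconv(ques1, "latin1", "ASCII", sub = " ")
ques1 <- rm_white(ques1)
ques1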

I checked the length of the longest word, which is 35 characters, using:
max(nchar(strsplit(ques1, " ")[[1]]))
[1] 35
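
Before deleting anything, it can help to see which words actually exceed the limit. The following is a minimal sketch using base R only; `sample_text` and `limit` are illustrative names, and the sample string stands in for the real `ques1` data, which is not shown in the post.

```r
# Illustrative input; substitute the cleaned ques1 string here.
sample_text <- "A long sentence with long wwwhotmailcomlearnbyexample"
limit <- 10  # words longer than this are candidates for removal

# Split on single spaces, as in the question, and measure each word
words <- strsplit(sample_text, " ")[[1]]
lens <- nchar(words)

max(lens)            # length of the longest word
words[lens > limit]  # the words that exceed the limit
```

For this sample the longest word has 27 characters, and the only word over the limit is `wwwhotmailcomlearnbyexample`.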

Now I want to remove every word longer than 10 characters, because I don't need them, for example:

wwwhotmailcomlearnbyexample

Please help!

Use the following gsub call:

ques1 = "A long sentence with long wwwhotmailcomlearnbyexample"
gsub("\\b[[:alpha:]]{11,}\\b", "", ques1, perl = TRUE)

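The regex \b[[:alpha:]]{11,}\b matches words of 11 or more letters: \b is a word boundary and [[:alpha:]] matches any letter. Inside an R string literal each backslash must be doubled, so the boundary is written as \\b.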
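
Since the question asks about an arbitrary length X, the same pattern can be built from a parameter. This is a sketch, not part of the original answer: `remove_long_words` and `max_len` are hypothetical names, and the whitespace cleanup is an added assumption about what the final string should look like.

```r
# Hypothetical helper: drop words longer than max_len letters,
# then tidy up the whitespace left behind by the removal.
remove_long_words <- function(x, max_len = 10) {
  # For max_len = 10 this builds the regex \b[[:alpha:]]{11,}\b
  pattern <- sprintf("\\b[[:alpha:]]{%d,}\\b", max_len + 1)
  out <- gsub(pattern, "", x, perl = TRUE)
  # Collapse runs of spaces and trim both ends
  trimws(gsub(" +", " ", out))
}

ques1 <- "A long sentence with long wwwhotmailcomlearnbyexample"
remove_long_words(ques1)
# [1] "A long sentence with long"
```

Without the final cleanup step, a removed word leaves its neighbouring space behind, so the plain gsub result for this input ends with a trailing space.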
