Subset rows that contain only letters in R

Question

I have a vector with about 3000 observations, for example:

clients <- c("Greg Smith", "John Coolman", "Mr. Brown", "John Nightsmith (father)", "2 Nicolas Cage")

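How can I subset only the rows whose names consist of letters alone? For example, only Greg Smith and John Coolman should be kept (no digits 0-9 and no symbols such as .?:[} ).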

Answer 1

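We can use grep with a pattern that matches only uppercase letters, lowercase letters, and spaces from the start (^) to the end ($) of the string. With value = TRUE, grep returns the matching elements rather than their indices.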

grep('^[A-Za-z ]+$', clients, value = TRUE)
#[1] "Greg Smith"   "John Coolman"

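Or simply use the POSIX class [[:alpha:]], which matches any alphabetic character in the current locale (so it may also accept accented letters):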

grep('^[[:alpha:] ]+$', clients, value = TRUE)
#[1] "Greg Smith"   "John Coolman"
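
Since the question speaks of rows, the same pattern can also drive a logical index with grepl. The following is only a minimal sketch: the data frame df and its columns id and name are hypothetical, made up for illustration, and are not part of the original question.

```r
clients <- c("Greg Smith", "John Coolman", "Mr. Brown",
             "John Nightsmith (father)", "2 Nicolas Cage")

# Hypothetical data frame for illustration; the column names are assumptions.
df <- data.frame(id = seq_along(clients), name = clients,
                 stringsAsFactors = FALSE)

# grepl() returns one TRUE/FALSE per element instead of the matching values.
keep <- grepl("^[A-Za-z ]+$", df$name)

# Use the logical vector to keep only the rows whose name matched.
letters_only <- df[keep, ]
letters_only$name
#[1] "Greg Smith"   "John Coolman"
```

A logical index like keep is convenient when the names live in one column of a larger table, because the other columns of the matching rows are kept as well.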
