Explain the behavior of ```str_match_all``` in the R package ```stringr```

st = list("amber johnson", "anhar link ari")
# The backslash must be doubled in an R string so the regex engine sees \b (word boundary)
t = stringr::str_match_all(st, "(\\ba[a-z]+\\b)")
str(t)
# List of 2
#  $ : chr [1, 1:2] "amber" "amber"
#  $ : chr [1:2, 1:2] "anhar" "ari" "anhar" "ari"

Why is each match duplicated in the result?

If you look at the Value section of ?str_match_all, it says:

For str_match, a character matrix. First column is the complete match, followed by one column for each capture group. For str_match_all, a list of character matrices.
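
To make that column layout concrete, here is a minimal sketch. The input vector `x` and the two-group pattern are made up for illustration and are not from the question; with two capture groups, each matrix should have three columns (the complete match plus one per group).

```r
library(stringr)

# Hypothetical input, chosen only to illustrate the column layout
x <- c("amber johnson", "anhar link ari")

# Two capture groups: the leading "a" and the rest of the word
m <- str_match_all(x, "\\b(a)([a-z]+)\\b")

dim(m[[1]])  # rows = number of matches, columns = 1 + number of groups
dim(m[[2]])
m[[2]]       # column 1: complete match; columns 2 and 3: the two groups
```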

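Because your pattern contains one capture group, the result has two columns: the first holds the complete match and the second holds the capture group. Here the group wraps the entire pattern, so both columns are identical. If you don't want the duplicated column, remove the group parentheses from the pattern: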

st = list("amber johnson", "anhar link ari")
# No capture group, so only the complete-match column is returned
t = stringr::str_match_all(st, "\\ba[a-z]+\\b")
str(t)

This gives:

# List of 2
#  $ : chr [1, 1] "amber"
#  $ : chr [1:2, 1] "anhar" "ari"
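
Two related options may also be useful, sketched here on a plain character vector `x` (an assumption; the question used a list). If you only need the matched strings and not a matrix, `str_extract_all` returns a list of character vectors. If you need parentheses for grouping but not an extra column, a non-capturing group `(?:...)` should keep the result to a single column.

```r
library(stringr)

# Hypothetical input mirroring the strings in the question
x <- c("amber johnson", "anhar link ari")

# Option 1: return only the matches, as a list of character vectors
words <- str_extract_all(x, "\\ba[a-z]+\\b")
words

# Option 2: group without capturing, so no extra column is added
m <- str_match_all(x, "\\b(?:a[a-z]+)\\b")
ncol(m[[2]])
```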