Return up to the first three words
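I'm trying to find a way to return the first three words of a sentence in R. I tried the word function from stringr, but it only returns the first three words when the sentence has at least three words. For example,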
sentences <- c("Jane saw a cat", "Jane sat down", "Jane sat", "Jane")
word(sentences, 1, 3)
This returns "Jane saw a", "Jane sat down", NA, NA.
I want to return up to the first three words, even when the sentence has only one or two words. So the output I'm looking for is:
"Jane saw a", "Jane sat down", "Jane sat", "Jane"
We can split the strings to get the words
sapply(strsplit(sentences, " "), \(x) paste(head(x, 3), collapse=" "))
-output
[1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
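The split approach can be wrapped in a small reusable helper. This is only a minimal sketch, assuming words are separated by runs of whitespace; the name first_n_words and its n argument are made up for illustration and are not part of any package.

```r
# Hypothetical helper (not from any package): keep at most the first n words.
first_n_words <- function(x, n = 3) {
  # Trim the ends, then split each string on runs of whitespace.
  parts <- strsplit(trimws(x), "\\s+")
  # head() returns fewer than n elements when fewer exist, so short
  # sentences come back unchanged instead of becoming NA.
  vapply(parts, function(w) paste(head(w, n), collapse = " "), character(1))
}

sentences <- c("Jane saw a cat", "Jane sat down", "Jane sat", "Jane")
first_n_words(sentences)
```

Splitting on "\\s+" rather than a single space means repeated spaces between words do not produce empty "words".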
Or use a regular expression
sub("^((\\w+\\s+){0,2}\\w+).*", "\\1", trimws(sentences))
-output
[1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
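The regex idea can also be parameterised by the number of words to keep. The following is a sketch under the assumption that a "word" is any run of non-whitespace characters; first_n_words_rx is a hypothetical name used only for this example.

```r
# Hypothetical helper: regex version that keeps up to n words.
first_n_words_rx <- function(x, n = 3) {
  # Up to n - 1 "word + whitespace" groups, followed by one final word that
  # needs no trailing whitespace, so the last word of a short sentence is kept.
  pattern <- sprintf("^((\\S+\\s+){0,%d}\\S+).*", n - 1)
  sub(pattern, "\\1", trimws(x), perl = TRUE)
}

sentences <- c("Jane saw a cat", "Jane sat down", "Jane sat", "Jane")
first_n_words_rx(sentences)
```

Making the final word a separate \\S+ is the design point here: a pattern that requires whitespace after every word cannot capture the last word of a sentence.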
If we want to use word, then we may need coalesce
library(stringr)
library(purrr)
library(dplyr)
map(3:1, word, string = sentences, start = 1) %>%
exec(coalesce, !!!.)
[1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
1) stringr: Count the words in each element of the input and use that count or 3, whichever is smaller, as the number of words to return.
library(stringr)
word(sentences, end = pmin(str_count(sentences, "\\w+"), 3))
## [1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
2) stringr, solution 2: Append some dummy words to the end, take the first 3 words, and trim off any dummy words that remain.
sentences %>%
str_c("@ @ @") %>%
word(end = 3) %>%
str_replace(" *@.*", "")
## [1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
3a) Base R: The same idea as (1) can be translated to base R like this:
Word <- function(x, end) do.call("paste", read.table(text = x, fill = TRUE)[1:end])
unname(Vectorize(Word)(sentences, end = pmin(lengths(strsplit(sentences, " ")), 3)))
## [1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
3b) The same idea as (2) can be translated to base R like this. Word is from (3a).
sentences |>
paste("@ @ @") |>
Word(end = 3) |>
sub(pattern = " *@.*", replacement = "")
## [1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
Update
(1) has been simplified, and the old (1) is now (2). (3a) and (3b) are now the base R counterparts.