Counting the letters of every word in a given text and sorting from fewest letters to most
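I need to use the example sentences from the tidyverse and take 5 samples. After getting these 5 samples, I need a function that finds the number of letters in every word of the sample and sorts the text by those counts, from words with few letters to words with many.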
You can use the stringr package:
s <- "The first worm gets snapped early. The sink is the thing in which we pile dishes. A big wet stain was on the round carpet. A fence cuts through the corner lot. Peep under the tent and see the clowns. Next Sunday is the twelfth of the month."
words <- unlist(stringr::str_extract_all(s, stringr::boundary("word")))
words[order(nchar(words))]
[1] "A" "A" "is" "in" "we" "on" "is" "of" "The" "The" "the" "big"
[13] "wet" "was" "the" "the" "lot" "the" "and" "see" "the" "the" "the" "worm"
[25] "gets" "sink" "pile" "cuts" "Peep" "tent" "Next" "first" "early" "thing" "which" "stain"
[37] "round" "fence" "under" "month" "dishes" "carpet" "corner" "clowns" "Sunday" "snapped" "through" "twelfth"
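The snippet above hard-codes the text, while the question also asks for five sampled sentences and the letter count of each word. A minimal sketch of both steps follows, assuming the "example sentences" are the `stringr::sentences` vector that ships with stringr. The helper name `letters_per_word` and the seed are hypothetical and chosen only for illustration.

```r
library(stringr)

# Hypothetical helper (not part of the answers in this thread): returns the
# letter count of every word in `text`, ordered from shortest to longest.
# The words themselves are kept as the names of the result.
letters_per_word <- function(text) {
  words  <- unlist(str_extract_all(text, boundary("word")))
  counts <- str_length(words)   # characters per word, apostrophes included
  idx    <- order(counts)       # ties stay in their original order
  setNames(counts[idx], words[idx])
}

set.seed(1)                     # assumption: fixed only to make the draw repeatable
five <- sample(stringr::sentences, 5)
letters_per_word(five)
```

Because `str_length()` counts characters, a word such as "Let's" counts as 5, so punctuation inside words would need to be stripped first if only letters should count.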
1. Sort by word length only
library(tidyverse)  # provides stringr, purrr::map() and the %>% pipe

s <- "The first worm gets snapped early. The sink is the thing in which we pile dishes. A big wet stain was on the round carpet. A fence cuts through the corner lot. Peep under the tent and see the clowns. Next Sunday is the twelfth of the month."
s_split <- s %>% str_extract_all(stringr::boundary("word")) %>% unlist()
s_split %>%
  str_length() %>%          # letters per word
  order() %>%               # indices from shortest to longest
  s_split[.] %>%            # reorder the words
  str_c(collapse = " ") %>%
  str_to_lower()
[1] "a a is in we on is of the the the big wet was the the lot the and see the the the worm gets sink pile cuts peep tent next first early thing which stain round fence under month dishes carpet corner clowns sunday snapped through twelfth"
If you want to analyze several strings, wrap the steps in a function:
order_by_length <- function(input) {
  s_split <- input %>% str_extract_all(stringr::boundary("word")) %>% unlist()
  s_split %>%
    str_length() %>%
    order() %>%
    s_split[.] %>%
    str_c(collapse = " ") %>%
    str_to_lower()
}
string_1 <- "This is a test string"
string_2 <- "Here we have another sentence as an example"
string_3 <- "Let's demonstrate even a third string"
string_list <- list(string_1, string_2, string_3)
map(string_list, order_by_length)
[[1]]
[1] "a is this test string"
[[2]]
[1] "we as an here have another example sentence"
[[3]]
[1] "a even let's third string demonstrate"
2. Sort by length first, then alphabetically
Use split() to group the words by length and str_sort() to sort each group alphabetically:
order_by_length2 <- function(input) {
  input %>%
    str_extract_all(stringr::boundary("word")) %>%
    unlist() %>%
    split(f = str_length(.)) %>%   # one group per word length
    map(str_sort) %>%              # alphabetical order within each group
    unlist(use.names = FALSE) %>%
    str_c(collapse = " ") %>%
    str_to_lower()
}
# 1. One string
order_by_length2(s)
[1] "a a in is is of on we and big lot see the the the the the the the the the was wet cuts gets next peep pile sink tent worm early fence first month round stain thing under which carpet clowns corner dishes sunday snapped through twelfth"
# 2. Multiple strings
map(string_list, order_by_length2)
[[1]]
[1] "a is test this string"
[[2]]
[1] "an as we have here another example sentence"
[[3]]
[1] "a even let's third string demonstrate"
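The same length-then-alphabet ordering can probably also be had from a single `order()` call with two keys, without splitting into groups. The sketch below is an alternative, not the answer's method. `order_by_length3` is a hypothetical name, and base `order()` compares strings with the session locale while `str_sort()` uses ICU collation, so results might differ for non-ASCII words.

```r
library(stringr)

# Hypothetical variant of order_by_length2(): lower-case the words first,
# then order by two keys in one call.
order_by_length3 <- function(input) {
  words <- str_to_lower(unlist(str_extract_all(input, boundary("word"))))
  # first key: letter count; second key: the word itself, used to break ties
  sorted <- words[order(str_length(words), words)]
  str_c(sorted, collapse = " ")
}

order_by_length3("Here we have another sentence as an example")
```

Lower-casing before sorting keeps "The" and "the" next to each other whatever the collation rules say about case.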