How to use lapply with str_replace_all and hunspell_suggest to replace all misspelled words?
I am trying to figure out how to combine str_replace_all and hunspell_suggest inside lapply. Here is where I am so far:
I have a data frame that looks like this:
library(hunspell)
library(stringr)  # provides str_replace_all(), used further down
df1 <- data.frame("Index" = 1:7, "Text" = c("Brad came to dinner with us tonigh.",
"Wuld you like to trave with me?",
"There is so muh to undestand.",
"Sentences cone in many shaes and sizes.",
"Learnin R is fun",
"yesterday was Friday",
"bing search engine"))
Here is the code I use to identify the misspelled words in the column:
df1$Text <- as.character(df1$Text)
df1$word_check <- hunspell(df1$Text)
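However, I am having trouble replacing each misspelled word with the first suggestion from hunspell_suggest.
I tried the following code, but it only handles one row, and it only works for rows that contain a single misspelled word: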
df1$replace <- str_replace_all(df1$Text, df1$word_check[[1]], hunspell_suggest(df1$word_check[[1]])[[1]][1])
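I am not sure how to work lapply into the code above so that every misspelled word is replaced with the first suggestion from hunspell_suggest, while correctly spelled words are left as they are.
Thanks.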
Here is one solution using the DataCombine package:
library(DataCombine)
# vector of words to replace
wrong <- unlist(hunspell(df1$Text))
# vector of the first suggested words
correct <- sapply(wrong, function(x) hunspell_suggest(x)[[1]][1])
Replaces <- data.frame(from = wrong, to = correct)
FindReplace(data = df1, Var = "Text", replaceData = Replaces,
from = "from", to = "to", exact = FALSE)
#Index Text
#1 1 Brad came to dinner with us tonight.
#2 2 Wald you like to trace with me?
#3 3 There is so hum to understand.
#4 4 Sentences cone in many shes and sizes.
#5 5 Learning R is fun
#6 6 yesterday was Friday
#7 7 bung search engine
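The output above shows that the first suggestion is not always the word the writer probably meant (Wuld becomes Wald, muh becomes hum). One option is to inspect the replacement table and override individual entries before applying it. The sketch below uses base R only; the from/to pairs are copied from the texts and output above, while the overrides vector and its values are assumptions made purely for illustration.

```r
# Replacement table with the first suggestions shown in the output above.
Replaces <- data.frame(
  from = c("tonigh", "Wuld", "trave", "muh", "undestand", "shaes", "Learnin", "bing"),
  to   = c("tonight", "Wald", "trace", "hum", "understand", "shes", "Learning", "bung"),
  stringsAsFactors = FALSE
)

# Hypothetical manual corrections: names are the misspelled words,
# values are the words we actually want (these values are assumptions).
overrides <- c(Wuld = "Would", trave = "travel", muh = "much",
               shaes = "shapes", bing = "bing")

# Apply only the overrides whose word is present in the table.
idx <- match(names(overrides), Replaces$from)
hit <- !is.na(idx)
Replaces$to[idx[hit]] <- overrides[hit]

# Drop rows that would replace a word with itself (here: "bing").
Replaces <- Replaces[Replaces$from != Replaces$to, ]
print(Replaces)
```

The edited table can then be passed to FindReplace as replaceData in the same way as before.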
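Although this question has already been answered, let me leave you another option. You tried using str_replace_all(); I used stri_replace_all_fixed() instead. The first step is to identify the bad words and store them in badwords. The second step is to use hunspell_suggest() together with sapply() to extract the first suggestion for each word and store them in suggestions. Finally, I use these two vectors in stri_replace_all_fixed().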
library(dplyr)
library(stringi)
library(hunspell)
df1 <- data.frame("Index" = 1:7, "Text" = c("Brad came to dinner with us tonigh.",
"Wuld you like to trave with me?",
"There is so muh to undestand.",
"Sentences cone in many shaes and sizes.",
"Learnin R is fun",
"yesterday was Friday",
"bing search engine"),
stringsAsFactors = FALSE)
# Get bad words.
badwords <- hunspell(df1$Text) %>% unlist
# Extract the first suggestion for each bad word.
suggestions <- sapply(hunspell_suggest(badwords), "[[", 1)
mutate(df1, Text = stri_replace_all_fixed(str = Text,
pattern = badwords,
replacement = suggestions,
vectorize_all = FALSE)) -> out
# Index Text
#1 1 Brad came to dinner with us tonight.
#2 2 Wald you like to trace with me?
#3 3 There is so hum to understand.
#4 4 Sentences cone in many shes and sizes.
#5 5 Learning R is fun
#6 6 yesterday was Friday
#7 7 bung search engine
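Since the question asked specifically about lapply, here is a minimal sketch of a row-by-row version. The helper names replace_words and fix_spelling are made up for this illustration and are not part of hunspell or stringr. The sketch swaps in base R gsub with word boundaries in place of str_replace_all, assumes the default en_US dictionary, and leaves a word unchanged when hunspell_suggest returns no suggestion for it.

```r
# Hypothetical helper: replace whole-word occurrences of bad[i] with good[i].
replace_words <- function(text, bad, good) {
  for (i in seq_along(bad)) {
    if (is.na(good[i])) next                       # no suggestion: keep the word
    pattern <- paste0("\\b\\Q", bad[i], "\\E\\b")  # \Q...\E matches the word literally
    text <- gsub(pattern, good[i], text, perl = TRUE)
  }
  text
}

# Hypothetical helper: spell-check one string and apply the first suggestions.
fix_spelling <- function(text) {
  bad <- hunspell::hunspell(text)[[1]]
  if (length(bad) == 0) return(text)               # nothing to fix in this row
  good <- vapply(hunspell::hunspell_suggest(bad),
                 function(s) if (length(s) > 0) s[1] else NA_character_,
                 character(1))
  replace_words(text, bad, good)
}

# The hunspell part only runs if the package is installed.
if (requireNamespace("hunspell", quietly = TRUE)) {
  texts <- c("Brad came to dinner with us tonigh.",
             "There is so muh to undestand.",
             "yesterday was Friday")
  fixed <- unlist(lapply(texts, fix_spelling))
  print(fixed)
}
```

With the data frame from the question, this could be applied as df1$replace <- unlist(lapply(df1$Text, fix_spelling)). Because each row is checked on its own, a misspelling found in one row is never substituted into another row, and the word boundaries keep a short bad word from being replaced inside a longer correct one.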