How to add vocabulary to the dictionary in the hunspell package
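I have a list of words that I want to correct with hunspell.
However, some of these words may be specialized terms that hunspell does not know, and it must not correct them (the set of such terms is not fixed, and it is too long to add them by hand).
What method can I use to solve this?
I have already tried looking for and updating dictionaries.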
Here is the word list:
keywords <- c("Millimeter", "OMT", "Chooz",
              "DCTPC", "JEM", "EUSO",
              "EUSO", "EUSO", "PDM",
              "FPGA", "Chooz", "Cepheids",
              "Circumstellar", "Tokamak", "ASIC",
              "TiSAFT", "CoRoT", "Unes",
              "Radioastronomy", "Coronagraphy", "Fiber",
              "Ultrastable", "Puslsar", "Magnetohydrodynamic",
              "KSZ", "Gaussianity", "Raman",
              "Gravimetry", "Casimir", "transfert",
              "TES", "MEMS", "CMB",
              "CMB", "TES", "Blazar",
              "modeling", "DFB", "linewidth",
              "Asteroseismology", "ExPRES", "NDA",
              "rephasing", "Nulling", "Gyroscop",
              "Atmopsheric", "fibers", "Spectroscopie",
              "d'absorption", "Calculs", "Aluminum",
              "Transneptunian", "Planetology", "Ultrastable")
Words like transfert or d'absorption really are bad spellings, but the others are specialized terms or acronyms.
Here is the code:
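library(hunspell)
dict_lang <- dictionary("en_US")  # assumed; dict_lang is not defined in the post
bad_words <- hunspell(keywords, dict = dict_lang)  # one vector of misspelled tokens per keyword
bad_index <- lengths(bad_words) != 0  # TRUE where hunspell flagged the keyword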
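Flagging is only half of what the question asks for; the other half is correcting. A minimal sketch of that step, assuming the same en_US dictionary as above and simply taking hunspell's first suggestion as the "correction" (a naive policy chosen for illustration, not something the post specifies), could use hunspell_suggest(), which returns one character vector of candidates per word:

```r
library(hunspell)

keywords <- c("Millimeter", "Puslsar", "Atmopsheric", "wiskey")
dict_lang <- dictionary("en_US")  # assumed dictionary, as in the question

# Keep only the keywords that hunspell reports as misspelled
flagged <- keywords[!hunspell_check(keywords, dict = dict_lang)]

# hunspell_suggest() returns a list: one character vector of candidates per word
suggestions <- hunspell_suggest(flagged, dict = dict_lang)
names(suggestions) <- flagged

# Naive policy: use the first suggestion, or NA when there is none
first_choice <- vapply(suggestions,
                       function(s) if (length(s) > 0) s[1] else NA_character_,
                       character(1))
print(first_choice)
```

Without an extended dictionary, this would also "correct" acronyms such as OMT, which is exactly the problem the question describes.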
Use dictionary() with its add_words argument:
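library("hunspell")
keywords <- c("Millimeter", "OMT", "Chooz")  # terms hunspell should accept
words <- c("OMT", "wiskey")
# Check against the default en_US dictionary
correct_pkg <- hunspell_check(words)
# Check against en_US extended with the keywords via add_words
correct_custom <- hunspell_check(words, dict = dictionary("en_US", add_words = keywords))
correct_pkg
correct_custom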
Output:
> correct_pkg
[1] FALSE FALSE
> correct_custom
[1] TRUE FALSE
Note how "OMT" is accepted as a word in the second case.
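Since the question says the list of special terms is too long to type by hand, one option is to derive part of add_words from the keywords themselves. The sketch below assumes a heuristic that is not from the answer: a token counts as an acronym if it has no lowercase letters, or has an uppercase letter after its first character. Only those tokens are added to the dictionary; everything else is still checked normally.

```r
library(hunspell)

keywords <- c("Millimeter", "OMT", "CoRoT", "TiSAFT", "ExPRES",
              "transfert", "d'absorption", "Tokamak")

# Heuristic (assumption): acronym = no lowercase letters ("OMT"),
# or an uppercase letter after the first character ("CoRoT", "TiSAFT", "ExPRES")
is_acronym <- grepl("^[^[:lower:]]+$", keywords) |
  grepl("^.+[[:upper:]]", keywords)
acronyms <- unique(keywords[is_acronym])

# Extend the en_US dictionary with the detected acronyms via add_words
dict_ext <- dictionary("en_US", add_words = acronyms)

# Whatever is still flagged is left for review or correction
flagged <- keywords[!hunspell_check(keywords, dict = dict_ext)]
print(acronyms)
print(flagged)
```

Ordinary-looking domain terms such as "Chooz" or "Tokamak" are not caught by this heuristic and would still have to be passed to add_words explicitly, for example from a maintained list of terms.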