Extracting Keywords from text in R
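I want to extract keywords related to insurance services from text in R. I created a list of keywords and used the common function from the qdap library.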
bag <- bag_o_words(corpus)
b <- common(bag,keywords,overlap="all")
But the result is just the common words with a frequency greater than 1.
I have also used the RKEA library.
keywords <- c("directasia", "directasia.com", "Frank", "frank", "OCBC", "NTUC",
"NTUC Income", "Frank by OCBC", "customer service", "atm",
"insurance", "claim", "agent", "premium", "policy", "customer care",
"customer", "draft", "account", "credit", "savings","debit","ivr",
"offer", "transacation", "banking", "website", "mobile", "i-safe",
"customer", "demat", "network", "phone", "interest", "loan",
"transfer", "deposit", "otp", "rewards", "redemption")
tmpdir <- tempfile()
dir.create(tmpdir)
model <- file.path(tmpdir, "crudeModel")
createModel(corpus,keywords,model)
extractKeywords(corpus, model)
But I get the following errors:
Error in createModel(corpus, keywords, model) : number of documents and keywords does not match
and
Error in .jcall(ke, "V", "extractKeyphrases", .jcall(ke,Ljava/util/Hashtable;", : java.io.FileNotFoundException: C:\Users\Bitanshu\AppData\Local\Temp\RtmpEHu9uA\file14c4160f41c2\crudeModel (The system cannot find the file specified)
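I think the second error occurs because createModel did not succeed, so the model file was never written.
Can anyone suggest how to correct this, or an alternative approach? The text data was extracted from Twitter.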
You could try the quanteda package. I suggest getting the GitHub version rather than the CRAN version, because just two days ago I overhauled the kwic() function. Example:
> require(quanteda)
> kwic(inaugTexts, "asia")
contextPre keyword contextPost
[1841-Harrison, 8599] or Egypt and the lesser Asia would furnish the larger dividend
[1909-Taft, 1872] our shores from Europe and Asia of course reduces the necessity
[1925-Coolidge, 2215] differences in both Europe and Asia . But there is a
[1953-Eisenhower, 325] the earth. Masses of Asia have awakened to strike off
[2013-Obama, 1514] We will support democracy from Asia to Africa, from the
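If the goal is only to find which entries of a fixed keyword list occur in each tweet, a plain dictionary lookup may already be enough. The following is a minimal base R sketch, not part of quanteda; the sample tweets and the helper names `normalize` and `match_keywords` are made up for illustration. It lowercases text and keywords, replaces punctuation with spaces, and matches whole words or phrases so that multi-word keywords such as "customer service" are found too.

```r
# Hypothetical sample tweets; replace with your own character vector.
tweets <- c(
  "Filed an insurance claim with NTUC Income, great customer service",
  "Frank by OCBC debit card has a nice rewards offer",
  "The weather is lovely today"
)
keywords <- c("insurance", "claim", "customer service", "NTUC Income",
              "Frank by OCBC", "debit", "rewards", "loan")

# Lowercase, turn every run of non-alphanumeric characters into one space,
# and pad with spaces so that only whole words or phrases can match.
normalize <- function(x) {
  x <- gsub("[^a-z0-9]+", " ", tolower(x))
  paste0(" ", trimws(x), " ")
}

# Return a list with one character vector of matched keywords per text.
match_keywords <- function(texts, keywords) {
  # Drop keywords that are identical after normalization (e.g. "Frank"/"frank").
  keywords <- keywords[!duplicated(normalize(keywords))]
  norm_keys <- normalize(keywords)
  lapply(normalize(texts), function(txt) {
    found <- vapply(norm_keys, function(k) grepl(k, txt, fixed = TRUE),
                    logical(1))
    keywords[found]
  })
}

hits <- match_keywords(tweets, keywords)

# Number of tweets in which each matched keyword occurs.
counts <- table(unlist(hits))
print(hits)
print(counts)
```

Because matching is done on space-padded strings, "atm" would not match inside "treatment"; the trade-off is that inflected forms such as "claims" are not matched unless they are added to the keyword list.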
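You should create the model using the following format; all of the arguments need to be specified, even if you do not intend to use all of them:
createModel(corpus, keywords, model, voc = "none", vocformat = "")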
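The first error message ("number of documents and keywords does not match") also suggests that createModel wants one set of keywords per document rather than a single flat vector. A minimal sketch under that assumption is shown here; the documents, the per-document keyword assignments, and the model name "tweetModel" are invented for illustration, and the RKEA calls are left commented out because they need RKEA, tm and Java to be installed.

```r
# Hypothetical example documents; replace with your own tweets.
docs <- c(
  "Filed an insurance claim and the agent was helpful",
  "Interest on my savings account went up after the deposit"
)

# One character vector of keywords per document, in the same order as docs.
# These assignments are made up for illustration.
doc_keywords <- list(
  c("insurance", "claim", "agent"),
  c("interest", "savings", "account", "deposit")
)

# The condition the error message points to: both lengths must agree.
stopifnot(length(doc_keywords) == length(docs))

# With RKEA, tm and Java available, the calls would then look like this
# (commented out so the sketch runs without those dependencies):
# library(tm); library(RKEA)
# corpus <- VCorpus(VectorSource(docs))
# tmpdir <- tempfile(); dir.create(tmpdir)
# model <- file.path(tmpdir, "tweetModel")
# createModel(corpus, doc_keywords, model, voc = "none", vocformat = "")
# extractKeywords(corpus, model)
```

If this assumption holds, passing the flat 40-element keyword vector from the question would fail whenever the corpus does not happen to contain exactly 40 documents, which would explain why the model file was never created.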