How to import a lexicon in XML-LMF format for sentiment analysis in R
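I'm trying to import the following lexicon into R, either to use it with text-mining packages such as quanteda or to export it as a list or data frame:
https://github.com/opener-project/VU-sentiment-lexicon/tree/master/VUSentimentLexicon/IT-lexicon
The format is XML-LMF, and I couldn't find any way to parse it in R
(see https://en.wikipedia.org/wiki/Lexical_Markup_Framework).
As a workaround I tried the XML package, but the structure is a bit different from ordinary XML and I didn't manage to parse all the nodes.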
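One thing that sets this layout apart: the values you need (lemma, part of speech, score, polarity) are stored as attributes of mostly empty elements, not as element text. They therefore have to be read with `xml_attr()` rather than `xml_text()`. The sketch below parses a small hand-written fragment. The element names follow those used in this post, but the nesting under `<Sense>` and all values are illustrative assumptions and are not copied from the OpeNER file.

```r
library(xml2)

# Hypothetical LMF fragment: element names follow the post, but the
# <Sense> nesting and the attribute values are illustrative only
lmf <- '<LexicalResource>
  <Lexicon language="it">
    <LexicalEntry id="e1" partOfSpeech="adjective">
      <Lemma writtenForm="bello"/>
      <Sense>
        <Confidence score="0.8" method="manual"/>
        <Sentiment polarity="positive"/>
      </Sense>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>'

# read_xml() treats a string containing "<" as literal XML
doc <- read_xml(lmf)
lemma <- xml_find_first(doc, ".//Lemma")

# The element has no text content; the word lives in an attribute
xml_text(lemma)                 # returns ""
xml_attr(lemma, "writtenForm")  # returns "bello"

# xml_attrs() returns all attributes of a node as a named character vector
entry <- xml_find_first(doc, ".//LexicalEntry")
xml_attrs(entry)
```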
I managed to get it working with the xml2 package. Here is my code:
library(xml2)
library(quanteda)

# Read the file and find one node per lexical entry
opeNER_xml <- read_xml("it-sentiment_lexicon.lmf.xml")
entries <- xml_find_all(opeNER_xml, ".//LexicalEntry")

# Search relative to each entry: xml_find_first() on a nodeset returns one
# result per entry (a missing node if there is no match), so the columns
# stay aligned even if an entry lacks a Lemma, Confidence or Sentiment
lemmas     <- xml_find_first(entries, ".//Lemma")
confidence <- xml_find_first(entries, ".//Confidence")
sentiment  <- xml_find_first(entries, ".//Sentiment")

# Parse the attributes and put them in a data frame (missing ones become NA)
opeNER_df <- data.frame(
  id = xml_attr(entries, "id"),
  lemma = xml_attr(lemmas, "writtenForm"),
  partOfSpeech = xml_attr(entries, "partOfSpeech"),
  confidenceScore = as.numeric(xml_attr(confidence, "score")),
  method = xml_attr(confidence, "method"),
  polarity = xml_attr(sentiment, "polarity"),
  stringsAsFactors = FALSE
)

# Fix a typo in the source data: some polarities are spelled "nneutral"
opeNER_df$polarity <- ifelse(opeNER_df$polarity == "nneutral",
                             "neutral", opeNER_df$polarity)

# Make a quanteda dictionary with one key per polarity
opeNER_dict <- quanteda::dictionary(with(opeNER_df, split(lemma, polarity)))
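Collecting `Lemma`, `Confidence` and `Sentiment` with separate document-wide `xml_find_all()` calls only works if every `LexicalEntry` contains exactly one of each. If an entry lacks one, the vectors end up with different lengths. `data.frame()` then either stops with an error or, when one length divides the other, silently recycles values into the wrong rows. Searching per entry with `xml_find_first()` avoids this. The sketch below uses a hypothetical three-entry fragment whose second entry deliberately has no `<Confidence>`. The nesting and values are illustrative assumptions.

```r
library(xml2)

# Hypothetical fragment: the second entry has no <Confidence> element
lmf <- '<Lexicon>
  <LexicalEntry id="e1"><Lemma writtenForm="bello"/>
    <Sense><Confidence score="0.8"/><Sentiment polarity="positive"/></Sense>
  </LexicalEntry>
  <LexicalEntry id="e2"><Lemma writtenForm="brutto"/>
    <Sense><Sentiment polarity="negative"/></Sense>
  </LexicalEntry>
  <LexicalEntry id="e3"><Lemma writtenForm="tavolo"/>
    <Sense><Confidence score="0.5"/><Sentiment polarity="neutral"/></Sense>
  </LexicalEntry>
</Lexicon>'

doc <- read_xml(lmf)
entries <- xml_find_all(doc, ".//LexicalEntry")

# Document-wide search: 2 scores for 3 entries, and no way to tell
# which entry is missing one
global_scores <- xml_attr(xml_find_all(doc, ".//Confidence"), "score")
length(global_scores)  # returns 2

# Per-entry search: one result per entry, NA where the element is absent
per_entry <- xml_attr(xml_find_first(entries, ".//Confidence"), "score")
per_entry              # returns c("0.8", NA, "0.5")

df <- data.frame(
  id = xml_attr(entries, "id"),
  lemma = xml_attr(xml_find_first(entries, ".//Lemma"), "writtenForm"),
  confidenceScore = as.numeric(per_entry),
  polarity = xml_attr(xml_find_first(entries, ".//Sentiment"), "polarity"),
  stringsAsFactors = FALSE
)

# Named list of lemmas per polarity: the shape the post passes to
# quanteda::dictionary()
by_polarity <- split(df$lemma, df$polarity)
by_polarity
```

The resulting named list can be kept as is if a plain list is all you need, or passed on to `quanteda::dictionary()` as in the post.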