Quanteda - Extracting identified dictionary words
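I am trying to extract the identified dictionary words from a Quanteda dfm, but I have not been able to find a solution.
Does anyone have a solution for this?
Sample input: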
dict <- dictionary(list(season = c("spring", "summer", "fall", "winter")))
dfm <- dfm("summer is great", dictionary = dict)
Output:
> dfm
Document-feature matrix of: 1 document, 1 feature.
1 x 1 sparse Matrix of class "dfmSparse"
features
docs season
text1 1
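I now know that a word from the season dictionary was identified in the sentence, but I would also like to know which word it was.
Preferably extracted in table format: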
docs dict dictWord
text1 season summer
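You can use the select argument (called keptFeatures in earlier quanteda releases) to create a second dfm, and then cbind() it to the first, dictionary-based dfm.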
dict <- dictionary(list(season = c("spring", "summer", "fall", "winter")))
txt <- "summer is great"
season_dfm <- dfm(txt, dictionary = dict, verbose = FALSE)
dict_dfm <- dfm(txt, select = dict, verbose = FALSE)
# Store the combined dfm so it can be converted to a data frame later
combined_dfm <- cbind(season_dfm, dict_dfm)
combined_dfm
## Document-feature matrix of: 1 document, 2 features.
## 1 x 2 sparse Matrix of class "dfmSparse"
## season summer
## text1 1 1
If you want the output as a table, it would be:
dict_df <- as.data.frame(combined_dfm)
names(dict_df)[2] <- "dictWord"
dict_df
## season dictWord
## text1 1 1
But this only works when each text contains a single dictionary value; otherwise dict_dfm will have multiple feature columns.
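For texts that contain several dictionary words, one way to get the docs / dict / dictWord table from the question is to do the matching by hand. The following is only a minimal base R sketch and deliberately avoids quanteda calls: find_dict_words() is a hypothetical helper, the dictionary is assumed to be a plain named list rather than a quanteda dictionary object, the second text is made up for illustration, and the tokenization is a naive split on non-letters with exact matching (no glob patterns or multi-word entries).

```r
# Assumption: the dictionary is a plain named list, not a quanteda dictionary.
dict_list <- list(season = c("spring", "summer", "fall", "winter"))
# text2 is an invented example with two dictionary words.
txts <- c(text1 = "summer is great", text2 = "winter follows fall")

# Hypothetical helper: returns one row per (document, dictionary key, matched word).
find_dict_words <- function(texts, dict_list) {
  # Fall back to text1, text2, ... when the input vector is unnamed.
  doc_names <- names(texts)
  if (is.null(doc_names)) doc_names <- paste0("text", seq_along(texts))

  # Naive tokenization: lowercase, then split on anything that is not a letter.
  tokens <- strsplit(tolower(texts), "[^a-z]+")

  rows <- list()
  for (i in seq_along(tokens)) {
    for (key in names(dict_list)) {
      # Keep the tokens that exactly match a word listed under this key.
      hits <- tokens[[i]][tokens[[i]] %in% dict_list[[key]]]
      if (length(hits) > 0) {
        rows[[length(rows) + 1]] <- data.frame(
          docs = doc_names[i], dict = key, dictWord = hits,
          stringsAsFactors = FALSE
        )
      }
    }
  }
  # Returns NULL when no document contains a dictionary word.
  do.call(rbind, rows)
}

result <- find_dict_words(txts, dict_list)
print(result)
```

Each matched word gets its own row, so a text with two season words simply produces two rows instead of extra feature columns.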