Determine the temporality of a sentence with POS tagging
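I want to determine, from a series of sentences, whether an action has already been performed.
For example:
"I will prescribe this medication" versus "I prescribed this medication"
or "He had already taken the stuff" versus "he may take the stuff later"
I am trying a tidytext approach and decided to simply look for past participle versus future participle verbs. However, when I POS tag the text, the only verb types I get are "Verb intransitive", "Verb (usu participle)" and "Verb (transitive)". How can I find out whether a verb is past or future, or is there another POS tagger I can use?
I would very much like to stay with tidytext, because I cannot install rjava, which some of the other text mining packages use.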
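Have a look at the morphological features in the udpipe annotation. They are stored in the feats column of the annotation, and you can add them to the dataset as extra columns with cbind_morphological.
All the features are defined at https://universaldependencies.org/u/feat/index.html
In the output below you can see that in the sentence 'I prescribed this medication' the word prescribed is in the past tense, and so are the words had and taken from 'he had already taken'.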
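library(udpipe)

# Example sentences, one document per row
x <- data.frame(doc_id = 1:4,
                text = c("I will prescribe this medication",
                         "I prescribed this medication",
                         "He had already taken the stuff",
                         "he may take the stuff later"),
                stringsAsFactors = FALSE)

# Tokenise, tag and parse; the English model is downloaded if it is not available locally
anno <- udpipe(x, "english")

# Split the feats column into separate morph_* columns
anno <- cbind_morphological(anno)
anno[, c("doc_id", "token", "lemma", "feats", "morph_verbform", "morph_tense")]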
doc_id  token       lemma       feats                                                   morph_verbform  morph_tense
1       I           I           Case=Nom|Number=Sing|Person=1|PronType=Prs              <NA>            <NA>
1       will        will        VerbForm=Fin                                            Fin             <NA>
1       prescribe   prescribe   VerbForm=Inf                                            Inf             <NA>
1       this        this        Number=Sing|PronType=Dem                                <NA>            <NA>
1       medication  medication  Number=Sing                                             <NA>            <NA>
2       I           I           Case=Nom|Number=Sing|Person=1|PronType=Prs              <NA>            <NA>
2       prescribed  prescribe   Mood=Ind|Tense=Past|VerbForm=Fin                        Fin             Past
2       this        this        Number=Sing|PronType=Dem                                <NA>            <NA>
2       medication  medication  Number=Sing                                             <NA>            <NA>
3       He          he          Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs  <NA>            <NA>
3       had         have        Mood=Ind|Tense=Past|VerbForm=Fin                        Fin             Past
3       already     already     <NA>                                                    <NA>            <NA>
3       taken       take        Tense=Past|VerbForm=Part                                Part            Past
3       the         the         Definite=Def|PronType=Art                               <NA>            <NA>
3       stuff       stuff       Number=Sing                                             <NA>            <NA>
4       he          he          Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs  <NA>            <NA>
4       may         may         VerbForm=Fin                                            Fin             <NA>
4       take        take        VerbForm=Inf                                            Inf             <NA>
4       the         the         Definite=Def|PronType=Art                               <NA>            <NA>
4       stuff       stuff       Number=Sing                                             <NA>            <NA>
4       later       later       <NA>                                                    <NA>            <NA>
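To go from these token-level features to one label per sentence, something along the following lines might work. It is only a rough sketch and not part of udpipe: the function name classify_temporality, the list of modal lemmas and the three labels are my own assumptions, and anno_demo is copied by hand from the verb rows of the output above so that the example runs without downloading a model. The rule is deliberately crude: a modal such as will or may together with an infinitive is read as "not yet performed", otherwise any Tense=Past token is read as "performed".

```r
# Hypothetical helper (not a udpipe function): label each document using
# the morph_tense / morph_verbform columns created by cbind_morphological().
classify_temporality <- function(anno,
                                 modals = c("will", "shall", "may", "might")) {
  per_doc <- split(anno, anno$doc_id)
  label <- vapply(per_doc, function(d) {
    # Any token tagged Tense=Past (finite verb or participle)
    has_past <- any(d$morph_tense == "Past", na.rm = TRUE)
    # A modal lemma combined with an infinitive suggests a future or possible action
    has_modal <- any(tolower(d$lemma) %in% modals) &&
      any(d$morph_verbform == "Inf", na.rm = TRUE)
    if (has_modal) {
      "not yet performed"
    } else if (has_past) {
      "performed"
    } else {
      "unknown"
    }
  }, character(1))
  data.frame(doc_id = names(label),
             temporality = unname(label),
             stringsAsFactors = FALSE)
}

# Verb rows copied by hand from the annotation output (assumed to be representative)
anno_demo <- data.frame(
  doc_id         = c("1", "1", "2", "3", "3", "4", "4"),
  token          = c("will", "prescribe", "prescribed", "had", "taken", "may", "take"),
  lemma          = c("will", "prescribe", "prescribe", "have", "take", "may", "take"),
  morph_verbform = c("Fin", "Inf", "Fin", "Fin", "Part", "Fin", "Inf"),
  morph_tense    = c(NA, NA, "Past", "Past", "Past", NA, NA),
  stringsAsFactors = FALSE
)

result <- classify_temporality(anno_demo)
print(result)
```

On real data you would pass the full anno data frame instead of anno_demo. Expect to tune the modal list and the rule itself, since constructions like negation or "would have taken" are not handled here.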