Print POS tags with adjectives removed (NLTK)

abc = nltk.pos_tag(info)
print(s for s in abc if s[1] != 'ADV')

Returns: <generator object pos.<locals>.<genexpr> at 0x000000000E000D00>

If I put [] around the print I get "Invalid syntax".

For adjectives, try this:

abc = nltk.pos_tag(info)
print([s for s in abc if s[1] != 'JJ'])

I am guessing that you just want the part-of-speech output for the words that are not "adverbs"?

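Using parentheses alone passes the print function a generator expression. If you just want to output everything at once, try something like this (the same expression written as a list comprehension):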

print([s for s in abc if s[1] != 'ADV'])

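Note: in an interactive interpreter session you can also get the same output without print(), by evaluating the list comprehension directly.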

Also, FYI: last I checked, "ADV" does not correspond to a POS tag. If you want to eliminate adverbs, I believe the correct POS tags for the adverb types are "RB", "RBR", and "RBS".
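As a minimal sketch of filtering on those three tags, the snippet below uses a hand-written `tagged` list in place of real `nltk.pos_tag` output, so it runs without NLTK installed. The sample sentence, its tags, and the name `ADVERB_TAGS` are assumptions made for illustration only.

```python
# Hand-written (word, tag) pairs standing in for nltk.pos_tag output (assumption).
tagged = [("She", "PRP"), ("ran", "VBD"), ("very", "RB"), ("fast", "RB"),
          ("but", "CC"), ("arrived", "VBD"), ("later", "RBR")]

# Penn Treebank adverb tags: base, comparative, superlative.
ADVERB_TAGS = {"RB", "RBR", "RBS"}

# Keep every pair whose tag is not one of the adverb tags.
without_adverbs = [(word, tag) for word, tag in tagged if tag not in ADVERB_TAGS]
print(without_adverbs)
```

With real data, `tagged` would be replaced by the result of `nltk.pos_tag(info)`.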

Updated the answer per alexis's reply below. He is right, the explanation was incomplete. Pasting his comment:

There's generators, and there's list comprehensions. print(s for s ...) passes print a generator; the version with square brackets uses the generator in a list comprehension, to make a list.

(Please upvote alexis's comment as well.)
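The difference between the two forms can be seen without NLTK. In this small sketch, `tagged` is hand-written sample data standing in for `nltk.pos_tag` output (an assumption, not real tagger output):

```python
# Hand-written (word, tag) pairs standing in for nltk.pos_tag output (assumption).
tagged = [("I", "PRP"), ("am", "VBP"), ("running", "VBG"), ("quickly", "RB")]

# Parentheses alone build a generator: print shows the object, not its items.
gen = (s for s in tagged if s[1] != "RB")
print(gen)

# Square brackets build a list, which prints its contents.
kept = [s for s in tagged if s[1] != "RB"]
print(kept)

# A generator can also be unpacked into print with *, which consumes it.
print(*gen)
```

The first print shows something like `<generator object <genexpr> at 0x...>`, while the second and third show the filtered pairs.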

From https://github.com/nltk/nltk/issues/1783#issuecomment-317174189

The pos_tag() function is trained on Sections 00-18 of the Wall Street Journal sections of OntoNotes 5.

From http://www.nltk.org/api/nltk.tag.html#module-nltk.tag

It uses the Penn TreeBank tagset https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

To catch all adverbs, check for the RB* tags.

Using a list comprehension, check whether the tag starts with the two characters RB, e.g.

>>> from nltk import pos_tag, word_tokenize
>>> sent = "I am running quickly"
>>> [word for word, pos in pos_tag(word_tokenize(sent)) if pos.startswith('RB')]
['quickly']

To catch adjectives, check for the JJ* tags:

>>> sent = "The big red cat is redder than apple"
>>> [word for word, pos in pos_tag(word_tokenize(sent)) if pos.startswith('JJ')]
['big', 'red', 'redder']

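If you check only for the exact tag JJ (pos == 'JJ') rather than JJ* (i.e. .startswith('JJ')), you will miss the comparative and superlative adjectives: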

>>> sent = "The big red cat is redder than apple, it's the best in the world"
>>> [word for word, pos in pos_tag(word_tokenize(sent)) if pos.startswith('JJ')]
['big', 'red', 'redder', 'best']
>>> [word for word, pos in pos_tag(word_tokenize(sent)) if pos == 'JJ' ]
['big', 'red']

To remove them, simply use not:

>>> [word for word, pos in pos_tag(word_tokenize(sent)) if not pos.startswith('JJ')]
['The', 'cat', 'is', 'than', 'apple', ',', 'it', "'s", 'the', 'in', 'the', 'world']
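If several tag families need to be removed at once, the same idea can be wrapped in a small helper. This is only a sketch: `drop_tags` is a hypothetical name, not an NLTK function, and `tagged` is hand-written sample data standing in for `pos_tag` output. It relies on `str.startswith` accepting a tuple of prefixes.

```python
def drop_tags(tagged, prefixes):
    """Return the words whose tag does not start with any of the given prefixes."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return [word for word, tag in tagged if not tag.startswith(prefixes)]

# Hand-written (word, tag) pairs standing in for pos_tag output (assumption).
tagged = [("The", "DT"), ("big", "JJ"), ("cat", "NN"), ("ran", "VBD"),
          ("quickly", "RB"), ("and", "CC"), ("looked", "VBD"), ("happier", "JJR")]

# Drop adjectives (JJ*) and adverbs (RB*) in one pass.
remaining = drop_tags(tagged, ["JJ", "RB"])
print(remaining)
```

Passing `["JJ"]` alone reproduces the adjective-only filter shown above.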