Separating nouns and noun groups from a JSON file with NLTK

Question

I want to use NLTK to find, or separate out, the nouns and noun groups in a JSON file. This is the content of the JSON file:

[
  {
    "id": 18009,
    "ingredients": [
      "baking powder",
      "eggs",
      "all-purpose flour",
      "raisins",
      "milk",
      "white sugar"
    ]
  },
  {
    "id": 28583,
    "ingredients": [
      "sugar",
      "egg yolks",
      "corn starch",
      "cream of tartar",
      "bananas",
      "vanilla wafers",
      "milk",
      "vanilla extract",
      "toasted pecans",
      "egg whites",
      "light rum"
    ]
  },

I want to find the words tagged NN, NNS, NNP, and NNPS.
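For reference, these four are the Penn Treebank noun tags (the tagset `nltk.pos_tag` uses by default), and they all start with `NN`. No other Penn Treebank tag does, so `tag.startswith("NN")` selects the same tokens as the explicit list. A minimal sketch on hand-written (word, tag) pairs; the sample pairs and function names are illustrative, not real tagger output:

```python
# Penn Treebank noun tags: singular/mass, plural, proper singular, proper plural
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def nouns_by_set(tagged):
    """Keep (word, tag) pairs whose tag is in the explicit noun set."""
    return [(w, t) for w, t in tagged if t in NOUN_TAGS]

def nouns_by_prefix(tagged):
    """Same filter, using the 'NN' prefix shared by all Penn Treebank noun tags."""
    return [(w, t) for w, t in tagged if t.startswith("NN")]

# Hand-written pairs in the shape nltk.pos_tag returns (tags assumed, not tagger output)
sample = [("cream", "NN"), ("of", "IN"), ("tartar", "NN"),
          ("toasted", "VBN"), ("pecans", "NNS")]

print(nouns_by_set(sample))     # [('cream', 'NN'), ('tartar', 'NN'), ('pecans', 'NNS')]
print(nouns_by_prefix(sample))  # [('cream', 'NN'), ('tartar', 'NN'), ('pecans', 'NNS')]
```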

Answer 1

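import json
import nltk
from nltk import word_tokenize
# Requires NLTK's tokenizer and tagger data, e.g. nltk.download("punkt") and nltk.download("averaged_perceptron_tagger")
with open("recipes.json", encoding="utf-8") as f:  # "recipes.json" is a placeholder file name
    data = json.load(f)
for a in data:
    for b in a["ingredients"]:
        text = word_tokenize(b)  # split the ingredient phrase into words
        res = nltk.pos_tag(text)  # list of (word, tag) pairs
        res = [t for t in res if t[1] in ["NN", "NNS", "NNP", "NNPS"]]  # keep noun tags only
        print(res)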

#output:
#[('powder', 'NN')]
#[('eggs', 'NNS')]
#[('flour', 'NN')]
#[('raisins', 'NNS')]
#[('milk', 'NN')]
#[('sugar', 'NN')]
#[('sugar', 'NN')]
#[('egg', 'NN'), ('yolks', 'NNS')]
#[('corn', 'NN'), ('starch', 'NN')]
# ...
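The filter above returns each noun as a separate tuple, so "egg yolks" comes back as two entries. If a "noun group" means a run of adjacent noun-tagged words (an assumption about what the question is after), those runs can be merged afterwards. A minimal sketch in plain Python; `noun_groups` is a hypothetical helper, and its input is the list of (word, tag) pairs that `nltk.pos_tag` returns:

```python
from itertools import groupby

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def noun_groups(tagged):
    """Join each run of consecutive noun-tagged words into one phrase.

    `tagged` is a list of (word, tag) pairs, e.g. the result of nltk.pos_tag.
    """
    groups = []
    # groupby splits the sequence into runs where "is a noun" stays the same
    for is_noun, run in groupby(tagged, key=lambda pair: pair[1] in NOUN_TAGS):
        if is_noun:
            groups.append(" ".join(word for word, _ in run))
    return groups

# Tag pairs taken from the output above
print(noun_groups([("egg", "NN"), ("yolks", "NNS")]))   # ['egg yolks']
print(noun_groups([("corn", "NN"), ("starch", "NN")]))  # ['corn starch']
# Hand-written pairs (tags assumed) with a non-noun between two nouns
print(noun_groups([("cream", "NN"), ("of", "IN"), ("tartar", "NN")]))  # ['cream', 'tartar']
```

Inside the answer's loop this would be called as `noun_groups(nltk.pos_tag(word_tokenize(b)))`. NLTK's `nltk.RegexpParser` with a chunk grammar such as `NP: {<NN.*>+}` is another way to express the same "adjacent nouns" rule; the plain-Python version just avoids walking the resulting parse tree.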

