Separating nouns and noun groups from a JSON file with NLTK
I want to use NLTK to find or separate the nouns and noun groups in a JSON file. This is the content of the JSON file:
```json
[
  {
    "id": 18009,
    "ingredients": [
      "baking powder",
      "eggs",
      "all-purpose flour",
      "raisins",
      "milk",
      "white sugar"
    ]
  },
  {
    "id": 28583,
    "ingredients": [
      "sugar",
      "egg yolks",
      "corn starch",
      "cream of tartar",
      "bananas",
      "vanilla wafers",
      "milk",
      "vanilla extract",
      "toasted pecans",
      "egg whites",
      "light rum"
    ]
  },
```
I want to find the words tagged NN, NNS, NNP, and NNPS:
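All four tags listed above are Penn Treebank noun tags. They are also the only tags in that tag set that begin with `NN`, so a prefix test gives the same filter as the explicit list. Here is a small sketch. The helper name `is_noun_tag` is hypothetical, and the tagged sample is hand-written for illustration, not real NLTK output.

```python
def is_noun_tag(tag: str) -> bool:
    """True for the Penn Treebank noun tags NN, NNS, NNP and NNPS."""
    # No other Penn Treebank tag starts with "NN", so a prefix test suffices.
    return tag.startswith("NN")


# Hand-written (word, tag) pairs for illustration, not actual NLTK output.
tagged = [("egg", "NN"), ("yolks", "NNS"), ("toasted", "VBN"), ("pecans", "NNS")]
print([word for word, tag in tagged if is_noun_tag(tag)])  # ['egg', 'yolks', 'pecans']
```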
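```python
import json

import nltk
from nltk import word_tokenize

# Assumption: the JSON above is saved as "recipes.json" (hypothetical filename).
with open("recipes.json", encoding="utf-8") as f:
    data = json.load(f)

# Penn Treebank noun tags
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

# word_tokenize and pos_tag need NLTK's tokenizer and tagger data (see nltk.download).
for recipe in data:
    for ingredient in recipe["ingredients"]:
        tokens = word_tokenize(ingredient)
        tagged = nltk.pos_tag(tokens)
        nouns = [t for t in tagged if t[1] in NOUN_TAGS]
        print(nouns)
# output:
# [('powder', 'NN')]
# [('eggs', 'NNS')]
# [('flour', 'NN')]
# [('raisins', 'NNS')]
# [('milk', 'NN')]
# [('sugar', 'NN')]
# [('sugar', 'NN')]
# [('egg', 'NN'), ('yolks', 'NNS')]
# [('corn', 'NN'), ('starch', 'NN')]
# ...
```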
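Filtering one token at a time splits multiword ingredients such as "egg yolks" and "corn starch" into separate nouns. One way to keep noun groups together is to merge each run of consecutive noun-tagged tokens. This is a minimal sketch under that assumption, not NLTK's own chunker. It accepts any list of (word, tag) pairs, such as the output of `nltk.pos_tag`. The function name `noun_groups` is hypothetical, and the sample tags are hand-written.

```python
from itertools import groupby

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_groups(tagged):
    """Join each run of consecutive noun-tagged tokens into one phrase.

    `tagged` is a list of (word, tag) pairs, e.g. from nltk.pos_tag.
    """
    groups = []
    # groupby splits the sequence into maximal runs of noun / non-noun tokens.
    for is_noun, run in groupby(tagged, key=lambda pair: pair[1] in NOUN_TAGS):
        if is_noun:
            groups.append(" ".join(word for word, _ in run))
    return groups


# Hand-written tags for illustration (not actual NLTK output).
print(noun_groups([("egg", "NN"), ("yolks", "NNS")]))                   # ['egg yolks']
print(noun_groups([("cream", "NN"), ("of", "IN"), ("tartar", "NN")]))  # ['cream', 'tartar']
```

Inside the loop above, you would call it as `noun_groups(nltk.pos_tag(word_tokenize(ingredient)))`. A phrase like "cream of tartar" still comes out as two groups because the preposition breaks the run. If such phrases should stay whole, a chunk grammar with `nltk.RegexpParser` is an alternative.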