Extracting specific values from JSON in Python
I have a JSON object in Python, built from the result of an API call (using urllib2), as follows:
    import json
    import urllib2

    results = urllib2.urlopen(req).read()
    json1 = json.loads(results)
This produces a JSON object containing something like the following (truncated because of its size):
    "http://d.opencalais.com/dochash-1/895ba8ff-4c32-3ae1-9615-9a9a9a1bcb39/cat/1":{
        "_typeGroup":"topics",
        "category":"http://d.opencalais.com/cat/Calais/Entertainment_Culture",
        "classifierName":"Calais",
        "categoryName":"Entertainment_Culture",
        "score":1
    },
    "http://d.opencalais.com/genericHasher-1/b6a2d07d-133b-35ad-85e2-54d524e750cf":{
        "_typeGroup":"entities",
        "_type":"TVShow",
        "name":"Hard Knocks",
        "_typeReference":"http://s.opencalais.com/1/type/em/e/TVShow",
        "instances":[
            {
                "detection":"[ New York Jets during the summer of 2010 on HBO's ]Hard Knocks[.\n]",
                "prefix":" New York Jets during the summer of 2010 on HBO's ",
                "exact":"Hard Knocks",
                "suffix":".\n",
                "offset":135,
                "length":11
            }
        ],
        "relevance":0.5
    },
    "http://d.opencalais.com/genericHasher-1/802a1ebb-7fac-354f-b02f-6ef8442950d3":{
        "_typeGroup":"entities",
        "_type":"Organization",
        "name":"New York Jets",
        "organizationtype":"sports",
        "nationality":"American",
        "_typeReference":"http://s.opencalais.com/1/type/em/e/Organization",
        "instances":[
            {
                "detection":"[ Tebow caught a few training camp glimpses of the ]New York Jets[ during the summer of 2010 on HBO's Hard]",
                "prefix":" Tebow caught a few training camp glimpses of the ",
                "exact":"New York Jets",
                "suffix":" during the summer of 2010 on HBO's Hard",
                "offset":86,
                "length":13
            }
        ],
        "relevance":0.5
    }
From this JSON, I want to extract "_type" and "name", but only for entries where "_typeGroup" == "entities".
For example, for the JSON object above, the output should look like this:
TVShow: Hard Knocks
Organization: New York Jets.
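Can someone help me do this in Python?
[Update 1]
Based on Jatin's answer, I tried the following:
    for key,value in json1.items():
        if value["_typeGroup"] == "entities":
            print value['_type'], value['name']
However, this results in the error KeyError: '_typeGroup'
I tried printing the keys and values to see what they look like, as follows:
    for key,value in json1.items():
        print key,value
This produced the following output (only one key-value pair shown):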
http://d.opencalais.com/genericHasher-1/802a1ebb-7fac-354f-b02f-6ef8442950d3 {u'_typeReference': u'http://s.opencalais.com/1/type/em/e/Organization', u'_type': u'Organization', u'name': u'New York Jets', u'_typeGroup': u'entities', u'instances': [{u'suffix': u" during the summer of 2010 on HBO's Hard", u'prefix': u' Tebow caught a few training camp glimpses of the ', u'detection': u"[ Tebow caught a few training camp glimpses of the ]New York Jets[ during the summer of 2010 on HBO's Hard]", u'length': 13, u'offset': 86, u'exact': u'New York Jets'}], u'relevance': 0.5, u'nationality': u'American', u'organizationtype': u'sports'}
It looks like nested JSON, so I tried the following to access the inner key-value pairs:
    for key,value in json1.items():
        val1 = value
        for key,value in val1.items():
            if value["_typeGroup"] == "entities":
                print value['_type'], value['name']
However, it throws the following error:
TypeError: string indices must be integers
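For what it's worth, this TypeError can be reproduced in isolation. The sketch below assumes a hand-made `entry` dict standing in for a single value of `json1`, and `index_inner_values` is a hypothetical helper that mimics the inner loop. Iterating over the inner dict's items yields plain strings as values, so `value["_typeGroup"]` ends up indexing a string with a string key.

```python
# Hypothetical stand-in for one value of json1 (trimmed from the sample above).
entry = {"_typeGroup": "entities", "_type": "TVShow", "name": "Hard Knocks"}

def index_inner_values(d):
    """Mimic the inner loop: index each inner value with a string key."""
    errors = []
    for key, value in d.items():
        try:
            value["_typeGroup"]  # value is a str here, not a dict
        except TypeError as exc:
            errors.append((key, type(exc).__name__))
    return errors

errors = index_inner_values(entry)
print(errors)
```

In other words, the extra level of looping goes one level too deep: the values of `json1` are already the dicts that hold "_typeGroup".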
    for key,value in json1.items():
        if value.get('_typeGroup') == "entities":
            print value.get('_type'), value.get('name')
Try this and let me know. It should work.
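As a side note on why `.get` helps: `dict.get` returns `None` (or a default you pass in) for a missing key, whereas square-bracket indexing raises `KeyError`. Here is a small sketch using a made-up `entry` dict that has no "_typeGroup" key. It is not taken from the real API response.

```python
# Hypothetical entry without a "_typeGroup" key (not from the real response).
entry = {"info": "metadata"}

missing = entry.get("_typeGroup")            # None, no exception raised
with_default = entry.get("_typeGroup", "")   # explicit default instead of None

try:
    entry["_typeGroup"]                      # plain indexing raises KeyError
    raised = False
except KeyError:
    raised = True

print("%r %r %r" % (missing, with_default, raised))
```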
I think you are getting that error because some of the values in the JSON don't have a _typeGroup key. Try this:
    for key,value in json1.items():
        if value.get("_typeGroup", "") == "entities":
            print value['_type'], value['name']
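Putting the pieces together, here is a minimal, self-contained sketch of the approach. It rests on a few assumptions: `raw` is a trimmed copy of the sample from the question, the "doc" entry is hypothetical and only stands in for whatever the truncated part of the response contains, and `extract_entities` is a helper name made up for this example. The `print` call is written so that it runs on both Python 2 and Python 3.

```python
import json

# Trimmed sample from the question, plus a hypothetical "doc" entry
# that has no "_typeGroup" key.
raw = """
{
  "doc": {"info": "hypothetical entry without _typeGroup"},
  "http://d.opencalais.com/dochash-1/895ba8ff-4c32-3ae1-9615-9a9a9a1bcb39/cat/1": {
    "_typeGroup": "topics",
    "categoryName": "Entertainment_Culture"
  },
  "http://d.opencalais.com/genericHasher-1/b6a2d07d-133b-35ad-85e2-54d524e750cf": {
    "_typeGroup": "entities",
    "_type": "TVShow",
    "name": "Hard Knocks"
  },
  "http://d.opencalais.com/genericHasher-1/802a1ebb-7fac-354f-b02f-6ef8442950d3": {
    "_typeGroup": "entities",
    "_type": "Organization",
    "name": "New York Jets"
  }
}
"""

def extract_entities(data):
    """Return (_type, name) pairs for every entry in the "entities" group."""
    pairs = []
    for value in data.values():
        # Skip non-dict values and dicts that lack "_typeGroup".
        if isinstance(value, dict) and value.get("_typeGroup") == "entities":
            pairs.append((value.get("_type"), value.get("name")))
    return pairs

json1 = json.loads(raw)
for type_, name in sorted(extract_entities(json1)):
    print("%s: %s" % (type_, name))
```

Returning a list of pairs rather than printing inside the loop keeps the filtering separate from the output format, so the "Type: name" layout asked for in the question is just one way to display the result.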