De-normalize JSON object into flat objects
I have a JSON object:
{
    "id": 3590403096656,
    "title": "Romania Special Zip Hoodie Blue - Version 02 A5",
    "tags": [
        "1ST THE WORLD FOR YOU <3",
        "apparel"
    ],
    "props": [
        {
            "id": 28310659235920,
            "title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)",
            "position": 1,
            "product_id": 3590403096656,
            "created_at": "2019-05-22T00:46:19+07:00",
            "updated_at": "2019-05-22T01:03:29+07:00"
        },
        {
            "id": 444444444444,
            "title": "number 2",
            "position": 1,
            "product_id": 3590403096656,
            "created_at": "2019-05-22T00:46:19+07:00",
            "updated_at": "2019-05-22T01:03:29+07:00"
        }
    ]
}
I want to flatten it so that the desired output looks like:
{"id": 3590403096656,"title": "Romania Special Zip Hoodie Blue - Version 02 A5","tags": ["1ST THE WORLD FOR YOU <3","apparel"],"props.id": 28310659235920,"props.title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)","props.position": 1,"props.product_id": 3590403096656,"props.created_at": "2019-05-22T00:46:19+07:00", "props.updated_at": "2019-05-22T01:03:29+07:00"}
{"id": 3590403096656,"title": "Romania Special Zip Hoodie Blue - Version 02 A5","tags": ["1ST THE WORLD FOR YOU <3","apparel"],"props.id": 444444444444,"props.title": "number 2","props.position": 1,"props.product_id": 3590403096656,"props.created_at": "2019-05-22T00:46:19+07:00","props.updated_at": "2019-05-22T01:03:29+07:00"}
So far I have tried:
from pandas.io.json import json_normalize
json_normalize(sample_object)
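where sample_object contains the JSON object. I am iterating over a large file of such objects and want to flatten each of them into the desired format.
json_normalize does not give me the output I want: I want to keep tags as it is, but flatten props and repeat the parent object's information in each resulting row.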
Please try this:
import copy
obj = {
    "id": 3590403096656,
    "title": "Romania Special Zip Hoodie Blue - Version 02 A5",
    "tags": [
        "1ST THE WORLD FOR YOU <3",
        "apparel",
    ],
    "props": [
        {
            "id": 28310659235920,
            "title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)",
            "position": 1,
            "product_id": 3590403096656,
            "created_at": "2019-05-22T00:46:19+07:00",
            "updated_at": "2019-05-22T01:03:29+07:00"
        },
        {
            "id": 444444444444,
            "title": "number 2",
            "position": 1,
            "product_id": 3590403096656,
            "created_at": "2019-05-22T00:46:19+07:00",
            "updated_at": "2019-05-22T01:03:29+07:00"
        }
    ]
}
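# Detach the list of props; what remains in obj is shared by every output row.
props = obj.pop("props")
for p in props:
    # Start each row from an independent copy of the shared fields.
    res = copy.deepcopy(obj)
    for k in p:
        # Prefix each prop key so it cannot collide with the parent's keys.
        res["props." + k] = p[k]
    print(res)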
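Basically, it uses pop("props") to get the object without "props" (the common part shared by all result objects). We then iterate over the props, create a new object from a copy of that base object, and fill in a "props.key" entry for every key in each prop.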
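Since the question mentions iterating over a large file of such objects, the same idea can be wrapped in a reusable generator. The following is only a sketch: the name flatten_children and its child_key and sep parameters are made up for illustration and are not part of the answer above. Unlike the snippet above, it leaves the input object untouched.

```python
import copy
import json


def flatten_children(obj, child_key="props", sep="."):
    """Yield one flat dict per entry in obj[child_key], repeating the parent fields."""
    # Collect everything except the child list; the caller's object is not mutated.
    base = {k: v for k, v in obj.items() if k != child_key}
    for child in obj.get(child_key, []):
        # Deep-copy so rows do not share mutable values such as the tags list.
        row = copy.deepcopy(base)
        for k, v in child.items():
            row[child_key + sep + k] = v
        yield row


# Small made-up sample with the same shape as the object in the question.
sample = {
    "id": 1,
    "title": "hoodie",
    "tags": ["apparel"],
    "props": [{"id": 10, "position": 1}, {"id": 11, "position": 2}],
}

rows = list(flatten_children(sample))
for row in rows:
    print(json.dumps(row))
```

Note that an object with a missing or empty props list produces no rows at all in this sketch; if such parents should still appear in the output, that case would need separate handling.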
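You want some json_normalize behavior, but with a custom twist. So use json_normalize or similar on a portion of the data, then combine that with the rest of the data.
The code below prefers the "or similar" route, reaching deep into the pandas codebase to get the nested_to_record helper function, which flattens dictionaries. It is used to create separate rows that combine the base data (the keys/values common across all props) with the flattened data specific to each props entry. There is a commented-out line that does the equivalent without nested_to_record, but it somewhat inelegantly flattens into a DataFrame and then exports to a dict.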
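from collections import OrderedDict
import json
import pandas as pd
# nested_to_record is a private pandas helper. This import path matches the pandas
# release the answer was written against; in newer releases the module may live
# under pandas.io.json._normalize instead.
from pandas.io.json.normalize import nested_to_record

# rawjson is assumed to hold the JSON text of one object.
data = json.loads(rawjson)
props = data.pop('props')
rows = []
for prop in props:
    # Shallow copy of the fields shared by every row.
    rowdict = OrderedDict(data)
    flattened_prop = nested_to_record({'props': prop})
    # Alternative without nested_to_record (requires json_normalize to be imported):
    # flattened_prop = json_normalize({'props': prop}).to_dict(orient='records')[0]
    rowdict.update(flattened_prop)
    rows.append(rowdict)
df = pd.DataFrame(rows)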
This results in a DataFrame with one row per props entry.
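If relying on a private pandas helper is a concern, the flattening step can be approximated in plain Python. The sketch below is not the pandas implementation; flatten_dict and denormalize are hypothetical names, and the nested "size" field is invented only to show how deeper nesting would be handled. Lists such as tags are left as they are.

```python
def flatten_dict(d, prefix="", sep="."):
    """Flatten nested dicts into a single level, joining keys with sep."""
    flat = {}
    for key, value in d.items():
        name = prefix + sep + str(key) if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the key prefix.
            flat.update(flatten_dict(value, name, sep))
        else:
            # Scalars and lists are kept unchanged.
            flat[name] = value
    return flat


def denormalize(record, child_key="props"):
    """Return one flat row per child entry, repeating the parent fields."""
    base = {k: v for k, v in record.items() if k != child_key}
    # The base is copied shallowly, so list values such as tags are shared by all rows.
    return [
        {**base, **flatten_dict({child_key: child})}
        for child in record.get(child_key, [])
    ]


# Made-up record; "size" is an invented nested field for illustration.
record = {
    "id": 1,
    "tags": ["a", "b"],
    "props": [
        {"id": 10, "size": {"label": "S", "cm": 70}},
        {"id": 11, "size": {"label": "M", "cm": 74}},
    ],
}

rows = denormalize(record)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to pd.DataFrame(rows) in the same way as in the answer above.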