Converting a JSON response with varying lists of fields into a pandas DataFrame
I'm using Python's Requests library to download some data from the BestBuy Products API, and I want to store it in a pandas DataFrame.
Something like this:
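import requests

results = requests.get(url1,
                       params={'paramStuff'},  # placeholder for the real query parameters
                       headers={'User-Agent': ua})
# decode the JSON body of the response into Python objects
products = results.json()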
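The response contains many different fields with service information, so I only target the specific field in the JSON that I want:
products['products']
This gives me: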
[{'details': [{'name': 'Name of Feature', 'value': 'Value Of Feature'},
              {'name': 'Name of Other Feature', 'value': 'Value Of Other Feature'}, ...],
  'ProductId': 'Id Of Product 1',
  'Some Other Field': 'Some Other Field Value'},
 {same structure as above for other product}, {etc}]
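As you can see, it is a list of dictionaries, each of which in turn contains a list of dictionaries. To emphasize: the details list can hold varying combinations of name: value pairs, and the names differ from product to product.
Any ideas on how to get this structure into a DataFrame with the following format?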
+-----------+-------------------+-------------------+-------------------+------------------+
| ProductID | Name of Feature 1 | Name of Feature 2 | Name Of Feature 3 | Some Other Field |
+-----------+-------------------+-------------------+-------------------+------------------+
| Product 1 | Value | NULL | Value | Value |
| Product 2 | NULL | Value | Value | Value |
+-----------+-------------------+-------------------+-------------------+------------------+
So far I have only managed to get something like this:
+-----------+-----------------------------------------------------------------------------------------------------------------------------------+------------------+
| ProductID | Details | Some Other Field |
+-----------+-----------------------------------------------------------------------------------------------------------------------------------+------------------+
| Product 1 | [{'name': 'Name of Feature', 'value':'Value Of Feature'},{'name': 'Name of Other Feature', 'value':'Value Of Other Feature'},...] | Value 1 |
| Product 2 | [{'name': 'Name of Feature', 'value':'Value Of Feature'},{'name': 'Name of Other Feature', 'value':'Value Of Other Feature'},...] | Value 2 |
+-----------+-----------------------------------------------------------------------------------------------------------------------------------+------------------+
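For reference, a frame shaped like the one above is what comes out when the list of products is passed straight to the DataFrame constructor. This is a minimal sketch with made-up sample data (the `sample` list and its values are assumptions, not real API output):

```python
import pandas as pd

# Made-up sample data that mimics the structure described above.
sample = [
    {"ProductId": "Product 1",
     "details": [{"name": "Color", "value": "Black"},
                 {"name": "Weight", "value": "2 kg"}],
     "Some Other Field": "Value 1"},
    {"ProductId": "Product 2",
     "details": [{"name": "Weight", "value": "3 kg"}],
     "Some Other Field": "Value 2"},
]

# Each top-level key becomes a column; the nested lists are left untouched,
# so every cell of the 'details' column still holds a list of dicts.
df = pd.DataFrame(sample)
first_cell = df.loc[0, "details"]
print(df.columns.tolist())
print(type(first_cell))
```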
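OK, I ended up developing a way to parse the nested field manually. I did not figure out whether there is a simpler way. FYI, this is for parsing the response from the BestBuy Products API, in case anyone finds it useful.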
import pandas as pd

# first build the pandas DataFrame shown in the question
# (products['products'] is the list of product dicts; pd.json_normalize
# replaces the older pd.io.json.json_normalize)
df = pd.json_normalize(products['products'])
# fields which are not nested and do not require parsing
fields = ['sku', 'name', 'regularPrice', 'manufacturer']
# the nested field is called 'details'; as mentioned, it can have many different subfields
featureFields = []
# first build a list of all potential features from the nested field
for i in range(len(df)):
    row = df.iloc[i]
    for detail in row['details']:
        featureFields.append(detail['name'].split('>', 1)[-1])
# make the list unique
featureFields = set(featureFields)
fields = set(fields)
# now go over each record in the DataFrame and parse the nested field into a dict
records = []
for i in range(len(df)):
    row = df.iloc[i]
    record = dict.fromkeys(fields)
    record['name'] = row['name']
    record['regularPrice'] = row['regularPrice']
    record['manufacturer'] = row['manufacturer']
    record['sku'] = row['sku']
    # every detail becomes its own key, so it ends up as its own column
    for detail in row['details']:
        record[detail['name'].split('>', 1)[-1]] = detail['value'].split('>', 1)[-1]
    records.append(record)
# finally we have a flat (non-nested) list of dictionaries, one per product
dfFinal = pd.DataFrame(records)
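A shorter variant of the same idea, offered only as a sketch: turn each product's details list into one flat dict, build a second frame from those dicts, and join it back by row position. The sample data below is made up, and the sketch assumes that no detail name collides with a top-level column and that detail names are unique within a product. The `split('>', 1)` cleanup from the code above is left out.

```python
import pandas as pd

# Made-up sample data with the same shape as products['products'].
items = [
    {"sku": 1, "name": "Product 1",
     "details": [{"name": "Color", "value": "Black"},
                 {"name": "Weight", "value": "2 kg"}]},
    {"sku": 2, "name": "Product 2",
     "details": [{"name": "Weight", "value": "3 kg"},
                 {"name": "Screen Size", "value": "15 in"}]},
]

base = pd.DataFrame(items)

# One flat {feature name: feature value} dict per product. Building a frame
# from these dicts gives one column per feature, with NaN where a product
# lacks that feature.
details = pd.DataFrame(
    [{d["name"]: d["value"] for d in row} for row in base["details"]],
    index=base.index,
)

# Drop the nested column and attach the feature columns row by row.
flat = base.drop(columns="details").join(details)
print(flat)
```

Products that lack a given feature get NaN in that column, which corresponds to the NULL cells in the target table from the question.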