Generate a new DataFrame from a MongoDB collection's nested array
I am trying to build a new DataFrame from a MongoDB collection. The goal is a new df that represents only the 'events' field.
For example:
{
    "_id" : 1641008579,
    "status" : "init",
    "description" : "Test",
    "attachment" : null,
    "start" : "08:00",
    "user" : "Jenny",
    "timestamp" : ISODate("2022-01-01T04:43:11.380Z"),
    "events" : [
        {
            "id" : 1641008580,
            "status" : "start",
            "description" : "First Event",
            "user" : "Jenny",
            "timestamp" : ISODate("2022-01-01T04:43:11.380Z")
        },
        {
            "id" : 1641008581,
            "status" : "progress",
            "description" : "Middle of the Event",
            "user" : "Joe",
            "timestamp" : ISODate("2022-01-01T05:43:11.380Z")
        },
        {
            "id" : 1641008582,
            "status" : "end",
            "description" : "Last Event",
            "user" : "Alain",
            "timestamp" : ISODate("2022-01-01T06:43:11.380Z")
        }
    ]
}
Any idea how to approach this so that I get the following result?
event_df should look like this:
           id    status          description   user                   timestamp
0  1641008580     start          First Event  Jenny  "2022-01-01T04:43:11.380Z"
1  1641008581  progress  Middle of the Event    Joe  "2022-01-01T05:43:11.380Z"
2  1641008582       end           Last Event  Alain  "2022-01-01T06:43:11.380Z"
The pandas.json_normalize function works very well here. Per its documentation, it will "normalize semi-structured JSON data into a flat table" and return a DataFrame.
API reference -> pandas.json_normalize
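import json
import pandas as pd

# Assumes the document was exported as valid JSON; mongo shell syntax
# such as ISODate(...) cannot be parsed by json.load.
with open('mongo.json') as json_file:  # open the exported JSON file
    data = json.load(json_file)  # deserialize the JSON file into a dict

events_df = pd.json_normalize(data['events'])  # flatten the 'events' array into a DataFrame
print(events_df)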
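When the collection holds several documents, json_normalize can also expand the nested array across all of them in one call. The following is a minimal sketch under stated assumptions: the `docs` list is hypothetical sample data shaped like the document in the question (in practice it might come from something like `list(collection.find())`), and carrying the parent `_id` along via `meta` is an optional choice, not something the question asked for.

```python
import pandas as pd

# Hypothetical in-memory documents, shaped like the one in the question.
docs = [
    {
        "_id": 1641008579,
        "status": "init",
        "events": [
            {"id": 1641008580, "status": "start", "user": "Jenny"},
            {"id": 1641008581, "status": "progress", "user": "Joe"},
        ],
    },
    {
        "_id": 1641008600,
        "status": "init",
        "events": [
            {"id": 1641008601, "status": "start", "user": "Alain"},
        ],
    },
]

# record_path selects the nested list to expand into rows;
# meta copies the parent document's _id onto each event row.
events_df = pd.json_normalize(docs, record_path="events", meta=["_id"])
print(events_df)
```

Each event becomes one row, and the `_id` column records which parent document it came from. Note that this call raises a KeyError if a document has no 'events' key, so such documents would need to be filtered out first.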
Here is the function, used after loading the collection:
import pandas as pd

def set_event_2_df(last_situation):
    # Collect one dict per event and build the DataFrame once at the end;
    # DataFrame.append was removed in pandas 2.0.
    rows = []
    for doc in last_situation:
        for event in doc.get('events', []):
            try:
                rows.append({
                    'id': str(event['id']),
                    'status': event['status'],
                    'description': event['description'],
                    'user': event['user'],
                    'timestamp': event['timestamp'],
                })
            except KeyError as e:  # an event is missing one of the expected fields
                print('EXP - {}'.format(e))
    return pd.DataFrame(rows, columns=['id', 'status', 'description', 'user', 'timestamp'])