Create a dataframe from specific values in OrderedDict
I have an OrderedDict like this:
OrderedDict([('searchedFolder', {'id': '1uTjm6QEx7No09bgTX984lxmwMSfv2sYK', 'name': 'Test',
'mimeType': 'application/vnd.google-apps.folder'}), ('folderTree', OrderedDict([('id',
[['1uTjm6QEx7No09bgTX984lxmwMSfv2sYK'], ['1uTjm6QEx7No09bgTX984lxmwMSfv2sYK',
'1bfMsEMU7zyILW6sLsTkZhjLLrogcWK8P'], ['1uTjm6QEx7No09bgTX984lxmwMSfv2sYK',
'1jyIXgH7hCOcqdb0ouNsR9EYWsRrjgPC3']]), ('names', ['Test', 'Test1', 'Test2']), ('folders',
['1uTjm6QEx7No09bgTX984lxmwMSfv2sYK', '1bfMsEMU7zyILW6sLsTkZhjLLrogcWK8P',
'1jyIXgH7hCOcqdb0ouNsR9EYWsRrjgPC3'])])), ('fileList', [{'files': [{'id':
'1I0vsHBo8GyWb1Jr30hQflTTZ3eIXpm8x', 'name': 'test1.xlsx'}], 'folderTree':
['1uTjm6QEx7No09bgTX984lxmwMSfv2sYK']}, {'files': [{'id': '1TEBzg_EH9iG9A3i6oN18ZSElUE1EhwxY',
'name': 'test2.xlsx'}], 'folderTree': ['1uTjm6QEx7No09bgTX984lxmwMSfv2sYK',
'1bfMsEMU7zyILW6sLsTkZhjLLrogcWK8P']}, {'files': [{'id': '1jJwFxbKRYRYn4vRzNf62LYL27EfAHSvq',
'name': 'test3.xlsx'}, {'id': '10ReTrPWGr_inWjj_eahFtBmIYtjthw2s', 'name': 'test4.xlsx'}],
'folderTree': ['1uTjm6QEx7No09bgTX984lxmwMSfv2sYK', '1jyIXgH7hCOcqdb0ouNsR9EYWsRrjgPC3']}]),
('totalNumberOfFolders', 3), ('totalNumberOfFiles', 4)])
I want to create a dataframe containing the file names and IDs, like this:
id name
0 1I0vsHBo8GyWb1Jr30hQflTTZ3eIXpm8x test1.xlsx
1 1TEBzg_EH9iG9A3i6oN18ZSElUE1EhwxY test2.xlsx
2 1jJwFxbKRYRYn4vRzNf62LYL27EfAHSvq test3.xlsx
3 10ReTrPWGr_inWjj_eahFtBmIYtjthw2s test4.xlsx
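The file names are arbitrary and only for testing; I also have other file types, not just Excel (.png, .jpg, .doc, etc.).
First, I tried to create a dataframe and then extract these values using: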
df=pd.DataFrame(Ordereddict) or df=pd.DataFrame.from_dict(Ordereddict)
But I got this error:
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
One way:
df = pd.json_normalize(ord_dict['fileList'], record_path=['files'])
Or:
df = pd.DataFrame(ord_dict['fileList'])['files'].explode().apply(pd.Series)
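Output (the index shown is from the second approach, which keeps the original row labels; json_normalize numbers the rows 0 to 3):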
id name
0 1I0vsHBo8GyWb1Jr30hQflTTZ3eIXpm8x test1.xlsx
1 1TEBzg_EH9iG9A3i6oN18ZSElUE1EhwxY test2.xlsx
2 1jJwFxbKRYRYn4vRzNf62LYL27EfAHSvq test3.xlsx
2 10ReTrPWGr_inWjj_eahFtBmIYtjthw2s test4.xlsx
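To make the difference between the two approaches concrete, here is a minimal sketch on a trimmed-down stand-in for ord_dict['fileList']. The short ids and the `file_list` variable are placeholders invented for illustration, not the real data. It suggests that `explode()` repeats the row label of the parent entry, and that `reset_index(drop=True)` is one way to get a clean 0..n-1 index if that matters.

```python
import pandas as pd

# Hypothetical, shortened stand-in for ord_dict['fileList'] (ids are made up).
file_list = [
    {"files": [{"id": "a1", "name": "test1.xlsx"}], "folderTree": ["root"]},
    {"files": [{"id": "b2", "name": "test3.xlsx"},
               {"id": "c3", "name": "test4.xlsx"}], "folderTree": ["root", "sub"]},
]

# Approach 1: json_normalize flattens every dict under 'files' into one row
# and assigns a fresh integer index.
df_norm = pd.json_normalize(file_list, record_path=["files"])

# Approach 2: explode() yields one row per file but keeps the label of the
# entry it came from, so the second entry's two files share the same label.
df_expl = pd.DataFrame(file_list)["files"].explode().apply(pd.Series)

# Optional: drop the repeated labels to get a fresh integer index.
df_expl_reset = df_expl.reset_index(drop=True)

print(df_norm)
print(df_expl)
print(df_expl_reset)
```

Both frames hold the same rows; only the index differs.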
Full code:
from collections import OrderedDict
import pandas as pd
ord_dict = OrderedDict([('searchedFolder', {'id': '1uTjm6QEx7No09bgTX984lxmwMSfv2sYK', 'name': 'Test',
'mimeType': 'application/vnd.google-apps.folder'}), ('folderTree', OrderedDict([('id',
[['1uTjm6QEx7No09bgTX984lxmwMSfv2sYK'], ['1uTjm6QEx7No09bgTX984lxmwMSfv2sYK',
'1bfMsEMU7zyILW6sLsTkZhjLLrogcWK8P'], ['1uTjm6QEx7No09bgTX984lxmwMSfv2sYK',
'1jyIXgH7hCOcqdb0ouNsR9EYWsRrjgPC3']]), ('names', ['Test', 'Test1', 'Test2']), ('folders',
['1uTjm6QEx7No09bgTX984lxmwMSfv2sYK', '1bfMsEMU7zyILW6sLsTkZhjLLrogcWK8P',
'1jyIXgH7hCOcqdb0ouNsR9EYWsRrjgPC3'])])), ('fileList', [{'files': [{'id':
'1I0vsHBo8GyWb1Jr30hQflTTZ3eIXpm8x', 'name': 'test1.xlsx'}], 'folderTree':
['1uTjm6QEx7No09bgTX984lxmwMSfv2sYK']}, {'files': [{'id': '1TEBzg_EH9iG9A3i6oN18ZSElUE1EhwxY',
'name': 'test2.xlsx'}], 'folderTree': ['1uTjm6QEx7No09bgTX984lxmwMSfv2sYK',
'1bfMsEMU7zyILW6sLsTkZhjLLrogcWK8P']}, {'files': [{'id': '1jJwFxbKRYRYn4vRzNf62LYL27EfAHSvq',
'name': 'test3.xlsx'}, {'id': '10ReTrPWGr_inWjj_eahFtBmIYtjthw2s', 'name': 'test4.xlsx'}],
'folderTree': ['1uTjm6QEx7No09bgTX984lxmwMSfv2sYK', '1jyIXgH7hCOcqdb0ouNsR9EYWsRrjgPC3']}]),
('totalNumberOfFolders', 3), ('totalNumberOfFiles', 4)])
df = pd.json_normalize(ord_dict['fileList'], record_path=['files'])
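If only the 'id' and 'name' fields are needed, the same table can probably also be built without json_normalize by flattening the nested lists in plain Python first. This is an alternative sketch, not the approach given above; `file_list` and its short ids are hypothetical stand-ins for ord_dict['fileList'].

```python
import pandas as pd

# Hypothetical, shortened stand-in for ord_dict['fileList'] (ids are made up).
file_list = [
    {"files": [{"id": "a1", "name": "test1.xlsx"}], "folderTree": ["root"]},
    {"files": [{"id": "b2", "name": "photo.png"},
               {"id": "c3", "name": "notes.doc"}], "folderTree": ["root", "sub"]},
]

# Collect every file dict from every entry into one flat list of records.
records = [file_info for entry in file_list for file_info in entry["files"]]

# Build the dataframe; columns= fixes the column order and drops any extra keys.
df = pd.DataFrame(records, columns=["id", "name"])
print(df)
```

The file extension plays no role here, so mixed file types (.png, .doc, and so on) are handled the same way as .xlsx.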