Converting an Excel column in which each cell looks like a dictionary into multiple pandas columns

Question

# test.csv
co11,col2
a,"{'Country':'USA', 'Gender':'Male'}"
b,"{'Country':'China', 'Gender':'Female'}"

import pandas as pd

df = pd.read_csv('test.csv')

I have a column in a CSV file where each cell contains a Python dictionary-like data structure.
How can I use Python to convert this column into two columns named "Country" and "Gender"?

Answer 1

`test.csv`

co11,col2
a,"{'Country':'USA', 'Gender':'Male'}"
b,"{'Country':'China', 'Gender':'Female'}"

Code

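- The dicts in the column are read as strings unless they are converted with `ast.literal_eval`.
  - Pass it through the `converters` parameter of `pandas.read_csv`.
- Use `pd.json_normalize` to convert the dicts, with the keys as column headers and the values as rows.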
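A quick way to check the string-versus-dict point, as a sketch: the snippet reads the same data as `test.csv`, but from an in-memory string via `io.StringIO` (an assumption so that no file is needed), once without and once with the `literal_eval` converter, and then normalizes the parsed dicts.

```python
import io
from ast import literal_eval

import pandas as pd

# Same content as test.csv, held in memory so the sketch needs no file on disk
csv_text = """co11,col2
a,"{'Country':'USA', 'Gender':'Male'}"
b,"{'Country':'China', 'Gender':'Female'}"
"""

# Without a converter, each cell of col2 is a plain string
raw = pd.read_csv(io.StringIO(csv_text))
print(type(raw.loc[0, 'col2']))  # <class 'str'>

# With literal_eval as the converter, each cell becomes a real dict
parsed = pd.read_csv(io.StringIO(csv_text), converters={'col2': literal_eval})
print(type(parsed.loc[0, 'col2']))  # <class 'dict'>

# json_normalize turns the dict keys into column headers
print(pd.json_normalize(parsed['col2'].tolist()))
```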

import pandas as pd
from ast import literal_eval

# read the csv and convert string to dict
df = pd.read_csv('test.csv', converters={'col2': literal_eval})

# display(df)
  co11                                      col2
0    a      {'Country': 'USA', 'Gender': 'Male'}
1    b  {'Country': 'China', 'Gender': 'Female'}

# unpack the dictionaries in col2 and join them as separate columns to df
df = df.join(pd.json_normalize(df.col2))

# drop col2
df.drop(columns=['col2'], inplace=True)

# df
  co11 Country  Gender
0    a     USA    Male
1    b   China  Female
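One practical benefit of `pd.json_normalize` is that it also flattens nested dicts, joining the nested keys with the `sep` argument. A minimal sketch, assuming a hypothetical nested `Address` key that does not appear in the question's data:

```python
import pandas as pd

# Hypothetical data: the 'Address' key holds a nested dict (not in the original test.csv)
df = pd.DataFrame({
    'co11': ['a', 'b'],
    'col2': [
        {'Country': 'USA', 'Gender': 'Male', 'Address': {'City': 'Boston'}},
        {'Country': 'China', 'Gender': 'Female', 'Address': {'City': 'Beijing'}},
    ],
})

# Nested keys are joined with `sep`, so Address -> City becomes Address_City
flat = pd.json_normalize(df['col2'].tolist(), sep='_')

# Replace the dict column with the flattened columns (both frames share a RangeIndex)
df = df.drop(columns=['col2']).join(flat)
print(df)
```

With the default `sep='.'` the nested column would be named `Address.City` instead.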

Answer 2

Reading the CSV file:
`ast.literal_eval` is needed; otherwise `pd.read_csv` reads the dictionaries as strings.

import ast
import pandas as pd

df = pd.read_csv('/data_with_dict.csv', converters={'dict_col': ast.literal_eval})
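Passing `ast.literal_eval` directly as the converter raises an error if any cell is blank or not a valid Python literal. A sketch of a more forgiving converter, assuming that such rows should simply get empty values; `parse_dict_cell` is a hypothetical helper name, and the in-memory CSV (with a blank `baz` row) stands in for the real file:

```python
import io
from ast import literal_eval

import pandas as pd


def parse_dict_cell(cell):
    """Return the dict stored in a CSV cell, or {} if the cell is blank or not a valid dict literal."""
    # Handle both '' and NaN so the helper does not depend on how blanks are delivered
    if not isinstance(cell, str) or not cell.strip():
        return {}
    try:
        value = literal_eval(cell)
    except (ValueError, SyntaxError):
        return {}
    # Only accept dicts; any other literal (e.g. a number) is treated as empty
    return value if isinstance(value, dict) else {}


# Hypothetical file content: the last row has an empty dict_col
csv_text = """unk_col,dict_col
foo,"{'Country':'USA', 'Gender':'Male'}"
bar,"{'Country':'China', 'Gender':'Female'}"
baz,
"""

df = pd.read_csv(io.StringIO(csv_text), converters={'dict_col': parse_dict_cell})

# Empty dicts become rows of NaN in the expanded columns
expanded = pd.json_normalize(df['dict_col'].tolist())
df = df.drop(columns=['dict_col']).join(expanded)
print(df)
```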

Processing a DataFrame that contains dictionaries:

# Example dataframe
df = pd.DataFrame({'unk_col' : ['foo','bar'], 
                   'dict_col': [{'Country':'USA',   'Gender':'Male'}, 
                                {'Country':'China', 'Gender':'Female'}]})

# Convert dictionary to columns
df = pd.concat([df.drop(columns=['dict_col']), df['dict_col'].apply(pd.Series)], axis=1)

# Write to file
df.to_csv('/data_no_dict.csv', index=False)

print(df)

Output:

  unk_col Country  Gender
0     foo     USA    Male
1     bar   China  Female
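A common alternative to `apply(pd.Series)` is to build the new columns with a single `pd.DataFrame` constructor call on the list of dicts. `apply(pd.Series)` creates one Series object per row, so the constructor approach is usually faster on large frames (a general observation, not measured here). A sketch on the same example data:

```python
import pandas as pd

df = pd.DataFrame({'unk_col': ['foo', 'bar'],
                   'dict_col': [{'Country': 'USA', 'Gender': 'Male'},
                                {'Country': 'China', 'Gender': 'Female'}]})

# Build all new columns in one constructor call; index=df.index keeps rows aligned
expanded = pd.DataFrame(df['dict_col'].tolist(), index=df.index)
df = pd.concat([df.drop(columns=['dict_col']), expanded], axis=1)
print(df)
```

Passing `index=df.index` matters when the frame has a non-default index (for example after filtering); otherwise `concat` would misalign the rows.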
