Converting a formatted string into a pandas DataFrame?

I have a long string in an unusual but consistent format that I want to convert into a pandas DataFrame. Here is a sample of the repeating format:

' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '

The desired DataFrame looks like this:

Col A    Col B    Col C
"Val"    10       1
"Val"    4        0

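I tried splitting the string using the brackets as delimiters, but because the values have different data types I could not turn each piece into a row.

Is there a simpler way?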

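The simplest way I can think of is to wrap the string in square brackets so it becomes a JSON array, then parse it: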

import json
import pandas as pd

txt = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '

# simulate a list of dicts and parse it like a json file
data = json.loads(f'[{txt}]')

df = pd.DataFrame(data)
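
As a quick check of why this sidesteps the mixed-type problem from the question, here is a minimal sketch using only the standard library (the name `records` is my own, not from the question). It shows that json.loads already returns a list of dicts with the value types intact, which is the row-per-dict shape that pd.DataFrame accepts.

```python
import json

txt = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '

# Wrapping in [ ] turns the comma-separated objects into one JSON array.
records = json.loads(f'[{txt}]')

print(type(records).__name__, len(records))  # list 2
print(records[0])                             # {'Col A': 'Val', 'Col B': 10, 'Col C': 1}
print(type(records[0]["Col B"]).__name__)     # int
```

Strings stay strings and numbers stay integers, so no manual splitting or casting is needed.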

eval() also works in this case (only use it on strings you trust, since it executes arbitrary code):

pd.DataFrame(eval(txt))

  Col A  Col B  Col C
0   Val     10      1
1   Val      4      0

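You can either evaluate the string as Python dictionaries or interpret it as JSON.

To evaluate it, do not use eval (it is dangerous); use ast.literal_eval instead: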

s = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '

from ast import literal_eval
import pandas as pd

df = pd.DataFrame(literal_eval(s.strip()))
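
To illustrate the difference, here is a small sketch using only the standard library. The `malicious` string is a made-up example, not something from the question. It shows that literal_eval parses the sample into a tuple of dicts but refuses input that is not a plain literal, whereas eval would execute it.

```python
from ast import literal_eval

s = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '

# Comma-separated dict literals evaluate to a tuple of dicts.
rows = literal_eval(s.strip())
print(type(rows).__name__, len(rows))  # tuple 2

# Hypothetical hostile input: a function call, not a literal.
malicious = '__import__("os").getcwd()'
try:
    literal_eval(malicious)
    rejected = False
except ValueError:
    # literal_eval only accepts literals (strings, numbers, dicts, ...).
    rejected = True
print(rejected)  # True
```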

For JSON, use pandas.read_json directly:

s = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '

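from io import StringIO
import pandas as pd

# wrap in [ ] to form a JSON array; StringIO is used because passing a
# literal JSON string to read_json is deprecated in recent pandas versions
df = pd.read_json(StringIO(f'[{s}]'))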

Output:

  Col A  Col B  Col C
0   Val     10      1
1   Val      4      0
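
One thing to keep in mind when choosing between the two routes: JSON's null, true and false are not Python literals. The sketch below uses a hypothetical record (the question's data contains none of these values) to show that the JSON parser handles them while literal_eval does not.

```python
import json
from ast import literal_eval

# Hypothetical record using JSON-only literals (not from the question).
sample = '{"Col A": null, "Col B": true}'

print(json.loads(sample))  # {'Col A': None, 'Col B': True}

try:
    literal_eval(sample)
    parsed_as_python = True
except ValueError:
    # null and true are bare names to Python, not literals.
    parsed_as_python = False
print(parsed_as_python)  # False
```

So if the real string may contain such values, the JSON route is probably the safer default.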