Converting a formatted string into a pandas dataframe?
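I have a long string in an unusual but consistent format that I would like to convert into a pandas dataframe. Here is a sample of the repeating format: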
' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '
The desired dataframe looks like this:
Col A Col B Col C
"Val" 10 1
"Val" 4 0
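I have tried splitting the string using the brackets as separators, but because the values have different data types I could not turn each piece into a row.
Is there a simpler way?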
The simplest way I can think of:
import json
import pandas as pd
txt = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '
# simulate a list of dicts and parse it like a json file
data = json.loads(f'[{txt}]')
df = pd.DataFrame(data)
eval() also works in this case, though it executes arbitrary code and should only be used on trusted input:
pd.DataFrame(eval(txt))
Col A Col B Col C
0 Val 10 1
1 Val 4 0
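Since the question was about mixed data types, here is a minimal sketch that wraps the json.loads approach in a function and prints the resulting dtypes. The helper name `records_to_frame` is hypothetical (it is not part of pandas or the answer above), and the sketch assumes the input is always a comma-separated run of valid JSON objects.

```python
import json

import pandas as pd


def records_to_frame(txt: str) -> pd.DataFrame:
    """Wrap comma-separated JSON objects in [] and parse them as one JSON array.

    Hypothetical helper for illustration; assumes txt holds valid JSON objects.
    """
    records = json.loads(f"[{txt.strip()}]")  # list of dicts, one per row
    return pd.DataFrame(records)


txt = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '
df = records_to_frame(txt)

# The numeric columns come back as integers, not strings
print(df.dtypes)
```

Because the JSON parser already distinguishes strings from numbers, no manual type conversion per column should be needed.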
You can either evaluate the string as Python dictionaries or interpret it as JSON.
To evaluate it, don't use eval (dangerous); use ast.literal_eval instead:
s = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '
from ast import literal_eval
import pandas as pd
df = pd.DataFrame(literal_eval(s.strip()))
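One caveat when picking between the two routes, sketched below under the assumption that your real data might contain JSON-only literals: literal_eval only understands Python literals, so a string containing `true`, `false` or `null` is rejected, while a JSON parser accepts it. The sample string `js` is made up for illustration.

```python
import json
from ast import literal_eval

s = '{"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0}'

# Comma-separated dict literals evaluate to a tuple of dicts
rows = literal_eval(s)
print(type(rows).__name__, len(rows))

# Made-up sample containing JSON-only literals (null, true)
js = '{"Col A":null,"Col B":true}'

try:
    literal_eval(js)
    literal_ok = True
except ValueError:
    # null and true are not Python literals, so literal_eval refuses them
    literal_ok = False

# The JSON parser maps null -> None and true -> True
parsed = json.loads(js)
print(literal_ok, parsed)
```

If the string really comes from a JSON source, the JSON route is probably the safer default.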
For JSON, use pandas.read_json directly:
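from io import StringIO
import pandas as pd
s = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '
# wrap in StringIO: passing a literal JSON string to read_json is deprecated since pandas 2.1
df = pd.read_json(StringIO(f'[{s}]'))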
Output:
Col A Col B Col C
0 Val 10 1
1 Val 4 0