Python: convert a column containing strings to a column containing JSON-like dictionaries
I have a dataframe with columns like this:
import pandas as pd

df = pd.DataFrame()
df['symbol'] = ['A', 'B', 'C']
df['json_list'] = ['[{name:S&P500, perc:25, ticker:SPY, weight:1}]',
                   '[{name:S&P500, perc:25, ticker:SPY, weight:0.5}, {name:NASDAQ, perc:26, ticker:NASDAQ, weight:0.5}]',
                   '[{name:S&P500, perc:25, ticker:SPY, weight:1}]']
df['date'] = ['2022-01-01', '2022-01-02', '2022-01-02']
df:
  symbol                                          json_list        date
0      A     [{name:S&P500, perc:25, ticker:SPY, weight:1}]  2022-01-01
1      B  [{name:S&P500, perc:25, ticker:SPY, weight:0.5...  2022-01-02
2      C     [{name:S&P500, perc:25, ticker:SPY, weight:1}]  2022-01-02
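The values in the json_list column are of type <class 'str'>.
How can I convert the items in the json_list column to dictionaries so that I can access them by key:value pairs?
Thanks in advance.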
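UPDATED to reflect that the json strings in the question are not singleton lists but can contain multiple dict-like elements.
This puts a list of dict objects into a new column of the dataframe: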
def foo(x):
    src = x['json_list']
    # Drop the outer brackets, then cut the text into one chunk per '{...}' element
    rawList = src[1:-1].split('{')[1:]
    rawDictList = [x.split('}')[0] for x in rawList]
    # Each chunk is 'key:value, key:value, ...'
    dictList = [dict(x.strip().split(':') for x in y.split(',')) for y in rawDictList]
    # Convert values to int or float where the text allows it
    for dct in dictList:
        for k in dct:
            try:
                dct[k] = int(dct[k])
            except ValueError:
                try:
                    dct[k] = float(dct[k])
                except ValueError:
                    pass
    return dictList

df['list_of_dict_object'] = df.apply(foo, axis = 1)
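The same idea can also be sketched as a function of a single string, so that it can be tried without a dataframe and then passed to df['json_list'].apply(...). This is only a minimal sketch: the names coerce and parse_json_like are made up for illustration, and it assumes that keys and values never contain ',', ':', '{' or '}'.

```python
import re


def coerce(text):
    """Return an int or float when the text parses as one, else the stripped string."""
    text = text.strip()
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_json_like(text):
    """Parse '[{k:v, k:v}, {k:v}]' into a list of dicts (hypothetical helper)."""
    records = []
    # Each match is the text between one pair of braces
    for body in re.findall(r'\{([^{}]*)\}', text):
        record = {}
        for pair in body.split(','):
            if not pair.strip():
                continue  # tolerate an empty '{}' element
            key, _, value = pair.partition(':')
            record[key.strip()] = coerce(value)
        records.append(record)
    return records


sample = ('[{name:S&P500, perc:25, ticker:SPY, weight:0.5}, '
          '{name:NASDAQ, perc:26, ticker:NASDAQ, weight:0.5}]')
parsed = parse_json_like(sample)
print(parsed[1]['ticker'])   # NASDAQ
print(parsed[0]['weight'])   # 0.5
```

With the dataframe from the question, df['json_list'].apply(parse_json_like) should give the same kind of column as the row-wise version above.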
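Original answer:
This puts a dict into a new column of the dataframe. Apart from the numeric values, which stay as strings, it should give you something close to what you want: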
df['dict_object'] = df.apply(lambda x: dict(x.strip().split(':') for x in x['json_list'][2:-2].split(',')), axis = 1)
To get float or int values wherever the string value is convertible, you can do this:
def foo(x):
    # [2:-2] strips the leading '[{' and the trailing '}]' of a single-element list
    d = dict(x.strip().split(':') for x in x['json_list'][2:-2].split(','))
    for k in d:
        try:
            d[k] = int(d[k])
        except ValueError:
            try:
                d[k] = float(d[k])
            except ValueError:
                pass
    return d

df['dict_object'] = df.apply(foo, axis = 1)
The "json" is almost valid YAML. If you add a space after each colon, you can parse it with PyYAML:
import yaml

df.json_list.apply(lambda data: yaml.safe_load(data.replace(':', ': ')))
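If PyYAML is not available, a similar trick works with the standard library, because the strings are also close to valid JSON: only the quotes are missing. The sketch below is an alternative rather than what the answer above does. It quotes every bare token that is not a plain number and then calls json.loads. The names NUMBER, quote_token and loads_json_like are hypothetical, and it assumes that values contain no whitespace, commas or colons.

```python
import json
import re

# A token that JSON already accepts as a number, e.g. 25 or 0.5
NUMBER = re.compile(r'-?(0|[1-9]\d*)(\.\d+)?')


def quote_token(match):
    """Leave plain numbers bare and wrap every other token in JSON quotes."""
    token = match.group(0)
    if NUMBER.fullmatch(token):
        return token
    return json.dumps(token)


def loads_json_like(text):
    """Quote the bare tokens of '[{k:v, ...}]' and parse the result as JSON."""
    # A token is any run of characters that is not structural or whitespace
    quoted = re.sub(r'[^\[\]{}:,\s]+', quote_token, text)
    return json.loads(quoted)


sample = '[{name:S&P500, perc:25, ticker:SPY, weight:1}]'
records = loads_json_like(sample)
print(records[0]['name'], records[0]['perc'])   # S&P500 25
```

Applied to the column, this would be df.json_list.apply(loads_json_like).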