Unable to pull key:value pairs from a dictionary column into multiple columns
I used the solution from this post (Split / Explode a column of dictionaries into separate columns with pandas), but nothing changed in my df.
Here is the df before running the code:
number status_timestamps
0 234234 {"created": "2020-11-30T19:44:42Z", "complete"...
1 2342 {"created": "2020-12-14T13:43:48Z", "complete"...
Here is an example of a dictionary in that column:
{"created": "2020-11-30T19:44:42Z",
"complete": "2021-01-17T14:20:58Z",
"invoiced": "2020-12-16T22:55:02Z",
"confirmed": "2020-11-30T21:16:48Z",
"in_production": "2020-12-11T18:59:26Z",
"invoice_needed": "2020-12-11T22:00:09Z",
"accepted": "2020-12-01T00:00:23Z",
"assets_uploaded": "2020-12-11T17:16:53Z",
"notified": "2020-11-30T21:17:48Z",
"processing": "2020-12-11T18:49:50Z",
"classified": "2020-12-11T18:49:50Z"}
This is what I tried; the df did not change:
df_final = pd.concat([df, df['status_timestamps'].progress_apply(pd.Series)], axis = 1).drop('status_timestamps', axis = 1)
This is what happens in the notebook:
Next time, please provide a minimal reproducible example of what you tried.
If I follow the solution from the post you mention, it works.
Here is the code I used:
import pandas as pd
json_data = {"created": "2020-11-30T19:44:42Z",
"complete": "2021-01-17T14:20:58Z",
"invoiced": "2020-12-16T22:55:02Z",
"confirmed": "2020-11-30T21:16:48Z",
"in_production": "2020-12-11T18:59:26Z",
"invoice_needed": "2020-12-11T22:00:09Z",
"accepted": "2020-12-01T00:00:23Z",
"assets_uploaded": "2020-12-11T17:16:53Z",
"notified": "2020-11-30T21:17:48Z",
"processing": "2020-12-11T18:49:50Z",
"classified": "2020-12-11T18:49:50Z"}
df = pd.DataFrame({"number": 2342, "status_timestamps": [json_data]})
# fastest solution proposed by your reference post
df.join(pd.DataFrame(df.pop('status_timestamps').values.tolist()))
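The snippet above works because the column holds real dict objects. A likely reason the same call leaves the asker's df unchanged is that the column was loaded from a CSV and therefore holds strings that only look like dicts. The sketch below is one way to check this; the frame `df_str` and its shortened values are hypothetical stand-ins, not the asker's data.

```python
import pandas as pd

# Hypothetical frame mimicking a CSV load: each "dict" is really a plain string.
df_str = pd.DataFrame({
    "number": [234234, 2342],
    "status_timestamps": [
        '{"created": "2020-11-30T19:44:42Z", "complete": "2021-01-17T14:20:58Z"}',
        '{"created": "2020-12-14T13:43:48Z"}',
    ],
})

# Inspect the Python type of every cell; str here means the values still need parsing.
cell_types = df_str["status_timestamps"].map(type).value_counts()
print(cell_types)

# Expanding strings with pd.Series does not create one column per key.
expanded = df_str["status_timestamps"].apply(pd.Series)
print(expanded.columns.tolist())
```

If the type check reports `str`, the values have to be parsed into dicts before any of the expansion approaches can split them into columns.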
I was able to use another answer from that post, but I switched to the safer literal_eval, because that answer used eval.
Here is the working code:
import pandas as pd
from ast import literal_eval
df = pd.read_csv('c:/status_timestamps.csv')
df["status_timestamps"] = df["status_timestamps"].apply(lambda x : dict(literal_eval(x)) )
df2 = df["status_timestamps"].apply(pd.Series )
df_final = pd.concat([df, df2], axis=1).drop('status_timestamps', axis=1)
df_final
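Since the sample dictionary shown in the question is valid JSON (double-quoted keys and values), `json.loads` may be an alternative to `literal_eval` for the parsing step. This is only a sketch under that assumption: the two-row frame below is hypothetical sample data standing in for the CSV, and the final conversion to datetimes is an optional extra that the question did not ask for.

```python
import json

import pandas as pd

# Hypothetical stand-in for the CSV contents: JSON text stored as strings.
df = pd.DataFrame({
    "number": [234234, 2342],
    "status_timestamps": [
        '{"created": "2020-11-30T19:44:42Z", "complete": "2021-01-17T14:20:58Z"}',
        '{"created": "2020-12-14T13:43:48Z"}',
    ],
})

# Parse each JSON string into a dict.
parsed = df["status_timestamps"].map(json.loads)

# Build one column per key; keys missing from a row become NaN.
expanded = pd.DataFrame(parsed.tolist(), index=df.index)

# Replace the original column with the expanded columns.
df_final = df.drop(columns="status_timestamps").join(expanded)

# Optional: turn the ISO 8601 strings into timezone-aware timestamps.
for col in expanded.columns:
    df_final[col] = pd.to_datetime(df_final[col], utc=True)

print(df_final)
```

Building the frame from a list of dicts avoids calling `pd.Series` once per row, which is the same idea as the `values.tolist()` answer above.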