Pandas Dataframe splitting a column with dict values into columns
I'm trying to split a column in a pandas DataFrame that holds lists of dict values into new columns. Using Splitting dictionary/list inside a Pandas Column into Separate Columns as a reference seems to fail because some of the values are NaN: when those rows are hit it throws an error about not being able to iterate a float, and if I fillna with None the error changes to a str-related one.
I first tried:
# explode the list-of-dicts column so each row holds a single dict (or NaN)
df_new = df.explode('freshness_grades')
# then try to expand each dict into its own columns
df_new = pd.concat([df_new.drop('freshness_grades', axis=1), pd.DataFrame(df_new['freshness_grades'].tolist())], axis=1)
I did this essentially to turn the lists of dicts into individual dicts.
_id freshness_grades
0 57ea8d0d9c624c035f96f45e [{'creation_date': '2019-04-20T06:02:02.865000+00:00', 'end_date': '2015-07-23T18:43:00+00:00', 'grade': 'A', 'start_date': '2015-03-05T01:54:47+00:00'}, {'creation_date': '2019-04-20T06:02:02.865000+00:00', 'end_date': '2015-08-22T18:43:00+00:00', 'grade': 'B', 'start_date': '2015-07-23T18:43:00+00:00'}, {'creation_date': '2019-04-20T06:02:02.865000+00:00', 'end_date': '2015-10-21T18:43:00+00:00', 'grade': 'C', 'start_date': '2015-08-22T18:43:00+00:00'}, {'creation_date': '2019-04-20T06:02:02.865000+00:00', 'end_date': '2016-02-02T12:12:00+00:00', 'grade': 'D', 'start_date': '2015-10-21T18:43:00+00:00'}, {'creation_date': '2019-04-20T06:02:02.865000+00:00', 'end_date': '2016-07-22T18:43:00+00:00', 'grade': 'E', 'start_date': '2016-02-02T12:12:00+00:00'}, {'creation_date': '2019-04-20T06:02:02.865000+00:00', 'grade': 'F', 'start_date': '2016-07-22T18:43:00+00:00'}]
1 57ea8d0e9c624c035f96f460 [{'creation_date': '2019-06-25T10:54:40.387000+00:00', 'end_date': '2015-07-20T14:04:00+00:00', 'grade': 'A', 'start_date': '2015-07-14T08:48:49+00:00'}, {'creation_date': '2019-06-25T10:54:40.387000+00:00', 'end_date': '2015-08-19T14:04:00+00:00', 'grade': 'B', 'start_date': '2015-07-20T14:04:00+00:00'}, {'creation_date': '2019-06-25T10:54:40.387000+00:00', 'end_date': '2015-10-18T14:04:00+00:00', 'grade': 'C', 'start_date': '2015-08-19T14:04:00+00:00'}, {'creation_date': '2019-06-25T10:54:40.387000+00:00', 'end_date': '2016-02-02T12:12:00+00:00', 'grade': 'D', 'start_date': '2015-10-18T14:04:00+00:00'}, {'creation_date': '2019-06-25T10:54:40.387000+00:00', 'end_date': '2016-07-19T14:04:00+00:00', 'grade': 'E', 'start_date': '2016-02-02T12:12:00+00:00'}, {'creation_date': '2019-06-25T10:54:40.387000+00:00', 'grade': 'F', 'start_date': '2016-07-19T14:04:00+00:00'}]
2 57ea8d0e9c624c035f96f462 [{'creation_date': '2019-04-20T06:02:03.600000+00:00', 'end_date': '2015-09-29T09:46:00+00:00', 'grade': 'A', 'start_date': '2015-07-27T15:21:32+00:00'}, {'creation_date': '2019-04-20T06:02:03.600000+00:00', 'end_date': '2015-10-29T09:46:00+00:00', 'grade': 'B', 'start_date': '2015-09-29T09:46:00+00:00'}, {'creation_date': '2019-04-20T06:02:03.600000+00:00', 'end_date': '2015-12-04T12:12:00+00:00', 'grade': 'C', 'start_date': '2015-10-29T09:46:00+00:00'}, {'creation_date': '2019-04-20T06:02:03.600000+00:00', 'end_date': '2016-02-02T12:12:00+00:00', 'grade': 'D', 'start_date': '2015-12-04T12:12:00+00:00'}, {'creation_date': '2019-04-20T06:02:03.600000+00:00', 'end_date': '2016-09-28T09:46:00+00:00', 'grade': 'E', 'start_date': '2016-02-02T12:12:00+00:00'}, {'creation_date': '2019-04-20T06:02:03.600000+00:00', 'grade': 'F', 'start_date': '2016-09-28T09:46:00+00:00'}]
3 57ea8d0f9c624c035f96f466 [{'creation_date': '2019-04-20T06:02:04.305000+00:00', 'end_date': '2015-09-29T09:46:00+00:00', 'grade': 'A', 'start_date': '2015-09-09T13:20:14+00:00'}, {'creation_date': '2019-04-20T06:02:04.305000+00:00', 'end_date': '2015-10-29T09:46:00+00:00', 'grade': 'B', 'start_date': '2015-09-29T09:46:00+00:00'}, {'creation_date': '2019-04-20T06:02:04.305000+00:00', 'end_date': '2015-12-04T12:12:00+00:00', 'grade': 'C', 'start_date': '2015-10-29T09:46:00+00:00'}, {'creation_date': '2019-04-20T06:02:04.305000+00:00', 'end_date': '2016-02-02T12:12:00+00:00', 'grade': 'D', 'start_date': '2015-12-04T12:12:00+00:00'}, {'creation_date': '2019-04-20T06:02:04.305000+00:00', 'end_date': '2016-09-28T09:46:00+00:00', 'grade': 'E', 'start_date': '2016-02-02T12:12:00+00:00'}, {'creation_date': '2019-04-20T06:02:04.305000+00:00', 'grade': 'F', 'start_date': '2016-09-28T09:46:00+00:00'}]
4 57ea8d109c624c035f96f468 [{'creation_date': '2019-04-20T06:02:04.673000+00:00', 'end_date': '2015-11-04T12:12:00+00:00', 'grade': 'A', 'start_date': '2015-10-30T07:43:46+00:00'}, {'creation_date': '2019-04-20T06:02:04.673000+00:00', 'end_date': '2015-11-11T12:12:00+00:00', 'grade': 'B', 'start_date': '2015-11-04T12:12:00+00:00'}, {'creation_date': '2019-04-20T06:02:04.673000+00:00', 'end_date': '2015-12-04T12:12:00+00:00', 'grade': 'C', 'start_date': '2015-11-11T12:12:00+00:00'}, {'creation_date': '2019-04-20T06:02:04.673000+00:00', 'end_date': '2016-02-02T12:12:00+00:00', 'grade': 'D', 'start_date': '2015-12-04T12:12:00+00:00'}, {'creation_date': '2019-04-20T06:02:04.673000+00:00', 'end_date': '2016-11-03T12:12:00+00:00', 'grade': 'E', 'start_date': '2016-02-02T12:12:00+00:00'}, {'creation_date': '2019-04-20T06:02:04.673000+00:00', 'grade': 'F', 'start_date': '2016-11-03T12:12:00+00:00'}]
5 5f1eb63dbed8bd4f99e2a280 NaN
Using the first row as an example, I'm hoping to end up with:
_id creation_date end_date grade start_date
0 57ea8d0d9c624c035f96f45e 2019-04-20T06:02:02.865000+00:00 2015-07-23T18:43:00+00:00 A 2015-03-05T01:54:47+00:00
0 57ea8d0d9c624c035f96f45e 2019-04-20T06:02:02.865000+00:00 2015-08-22T18:43:00+00:00 B 2015-07-23T18:43:00+00:00
...
I start with explode, and that step works very well. I haven't tried reset_index() yet, though. What fails is the pd.concat(), which I think is related either to the NaN or to a list actually holding multiple dicts, i.e. {}, {}, {} after the explode().
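To make the failure mode concrete, here is a minimal sketch of my own (not part of the original question) that inspects what the exploded column actually holds: normal rows become plain dicts, while the missing rows stay as float NaN, which is exactly what a pd.DataFrame(...tolist()) style expansion trips over.
# after exploding, each cell is either a dict or a float NaN
exploded = df.explode('freshness_grades').reset_index(drop=True)
print(exploded['freshness_grades'].map(type).value_counts())
# expected: mostly <class 'dict'>, plus <class 'float'> for the rows that were NaN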
json_normalize does not work on a column that contains NaN.
- Fill the NaN with an empty dict, {}.
- See also
# explode the list
df = df.explode('freshness_grades').reset_index(drop=True)
# now fill the NaN with an empty dict
df.freshness_grades = df.freshness_grades.fillna({i: {} for i in df.index})
# then normalize the column
df = df.join(pd.json_normalize(df.freshness_grades))
# drop the column
df.drop(columns=['freshness_grades'], inplace=True)
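A note on the fillna step above: Series.fillna treats a dict argument as an index-to-value mapping rather than as a single fill value, so you cannot simply pass {} once; the dict comprehension maps every index label to its own empty dict. An equivalent sketch of my own (assuming the same exploded df) that may read more plainly is to swap every non-dict cell for an empty dict:
# replace anything that is not a dict (i.e. the float NaNs) with an empty dict
df.freshness_grades = df.freshness_grades.apply(lambda v: v if isinstance(v, dict) else {})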
Output
_id creation_date end_date grade start_date
0 57ea8d0d9c624c035f96f45e 2019-04-20T06:02:02.865000+00:00 2015-07-23T18:43:00+00:00 A 2015-03-05T01:54:47+00:00
1 57ea8d0d9c624c035f96f45e 2019-04-20T06:02:02.865000+00:00 2015-08-22T18:43:00+00:00 B 2015-07-23T18:43:00+00:00
2 57ea8d0d9c624c035f96f45e 2019-04-20T06:02:02.865000+00:00 2015-10-21T18:43:00+00:00 C 2015-08-22T18:43:00+00:00
3 57ea8d0d9c624c035f96f45e 2019-04-20T06:02:02.865000+00:00 2016-02-02T12:12:00+00:00 D 2015-10-21T18:43:00+00:00
4 57ea8d0d9c624c035f96f45e 2019-04-20T06:02:02.865000+00:00 2016-07-22T18:43:00+00:00 E 2016-02-02T12:12:00+00:00
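For completeness, here is a self-contained sketch of my own that runs the whole pipeline end to end on a two-row toy frame (only the first grade dict of the first row is kept, and the NaN row stands in for the record with no freshness_grades):

import numpy as np
import pandas as pd

# toy frame: one row with a list of grade dicts, one row with NaN
df = pd.DataFrame({
    '_id': ['57ea8d0d9c624c035f96f45e', '5f1eb63dbed8bd4f99e2a280'],
    'freshness_grades': [
        [{'creation_date': '2019-04-20T06:02:02.865000+00:00',
          'end_date': '2015-07-23T18:43:00+00:00',
          'grade': 'A',
          'start_date': '2015-03-05T01:54:47+00:00'}],
        np.nan,
    ],
})

df = df.explode('freshness_grades').reset_index(drop=True)  # one dict (or NaN) per row
df.freshness_grades = df.freshness_grades.fillna({i: {} for i in df.index})
df = df.join(pd.json_normalize(df.freshness_grades))        # dict keys become columns
df = df.drop(columns=['freshness_grades'])
print(df)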