How to unnest a date column and a related column together in Pandas?
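I have a dataframe with two columns that I want to explode/unnest together. One contains dates, and the other contains information related to those dates.
This is what the initial df looks like: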
import pandas as pd

data = [
["ABC", 2002, ["AB", "AB", "EF"], ["2002-05-06", "2002-05-07", "2002-05-12"]],
["DEF", 2002, [["CD", "EF"]], ["2002-06-12", "2002-06-13"]],
["GHI", 2002, [["JK"]], ["2002-03-02"]],
["JKL", 2002, [[]], ["2002-03-02"]],
]
df = pd.DataFrame(data, columns=["ID", "year", "list", "date_list"])
df
What I want is for the dates and the related list elements to be unpacked together:
data = [
["ABC", 2002, ["AB"], ["2002-05-06"]],
["ABC", 2002, ["AB"], ["2002-05-07"]],
["ABC", 2002, ["EF"], ["2002-05-12"]],
["DEF", 2002, ["CD"], ["2002-06-12"]],
["DEF", 2002, ["EF"], ["2002-06-13"]],
["GHI", 2002, [["JK"]], ["2002-03-02"]],
["JKL", 2002, [[]], ["2002-03-02"]],
]
df = pd.DataFrame(data, columns=["ID", "year", "list", "date_list"])
df
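I tried exploding the list and date_list columns separately, but I don't know of a way to unnest them together so that the elements stay aligned. Does anyone know how to do this?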
If I understand correctly:
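# Explode both columns and join them by position; this relies on the two exploded columns having the same number of rows.
extracted = df['list'].explode().to_frame().reset_index(drop=True).join(df['date_list'].explode().reset_index())
# Reattach ID and year through the original row number kept in the 'index' column.
df = df[['ID', 'year']].merge(extracted[['list', 'date_list', 'index']], left_index=True, right_on='index').drop(columns=['index'])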
Output:
    ID  year list   date_list
0  ABC  2002   AB  2002-05-06
1  ABC  2002   AB  2002-05-07
2  ABC  2002   EF  2002-05-12
3  DEF  2002   CD  2002-06-12
4  DEF  2002   EF  2002-06-13
5  GHI  2002   JK  2002-03-02
6  JKL  2002  NaN  2002-03-02
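As a possible alternative, here is a minimal sketch that assumes pandas 1.3 or later, where DataFrame.explode accepts a list of columns. Some rows in the question's sample wrap the inner list one level too deep (for example [["CD", "EF"]]), so the sketch first removes that extra level with flatten_once, a hypothetical helper that is not part of pandas. It also assumes that, after flattening, each row has as many list elements as dates; an empty list counts as a single missing value.

```python
import pandas as pd

data = [
    ["ABC", 2002, ["AB", "AB", "EF"], ["2002-05-06", "2002-05-07", "2002-05-12"]],
    ["DEF", 2002, [["CD", "EF"]], ["2002-06-12", "2002-06-13"]],
    ["GHI", 2002, [["JK"]], ["2002-03-02"]],
    ["JKL", 2002, [[]], ["2002-03-02"]],
]
df = pd.DataFrame(data, columns=["ID", "year", "list", "date_list"])


def flatten_once(items):
    """Hypothetical helper: remove one level of nesting, e.g. [["CD", "EF"]] -> ["CD", "EF"]."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(item)  # unwrap an inner list
        else:
            flat.append(item)  # keep scalars as they are
    return flat


# Normalize the "list" column so every row holds a flat list.
df["list"] = df["list"].apply(flatten_once)

# Explode both columns in lockstep (assumes pandas >= 1.3).
# pandas raises a ValueError if the per-row element counts do not match.
result = df.explode(["list", "date_list"], ignore_index=True)
print(result)
```

On the sample data this should give the same seven rows as the output above, with NaN in the list column for JKL. Exploding both columns in one call keeps each element paired with its date and avoids the separate positional join and merge.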