How to create a parent-child dataframe from an existing pandas dataframe of hierarchical structure and arbitrary shape?
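How can I convert a given dataframe with a hierarchical structure and arbitrary shape (for example, like the dataframe below) into a new dataframe with a parent column and a child column?
Edit: note that one constraint is that a child cannot be its own parent.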
import pandas as pd

data = {'level1': ['A', 'A', 'B', 'B', 'C'],
        'level2': ['James', 'Robert', 'Patricia', 'Patricia', 'John'],
        'level3': ['Stockholm', 'Denver', 'Moscow', 'Moscow', 'Palermo'],
        'level4': ['red', 'Denver', 'yellow', 'purple', 'blue']
        }
df = pd.DataFrame(data)
   level1    level2     level3  level4
0       A     James  Stockholm     red
1       A    Robert     Denver  Denver
2       B  Patricia     Moscow  yellow
3       B  Patricia     Moscow  purple
4       C      John    Palermo    blue
The desired output looks like this:
       parent      child
0           A      James
1           A     Robert
2           B   Patricia
3           C       John
4       James  Stockholm
5      Robert     Denver
6    Patricia     Moscow
7        John    Palermo
8   Stockholm        red
9      Moscow     yellow
10     Moscow     purple
11    Palermo       blue
What I could come up with is a for loop over the dataframe's columns:
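columns = df.columns
length = len(columns)
parent = []
children = []
# Pair each column with the next one: (level1, level2), (level2, level3), ...
for i in range(length - 1):
    parent += df[columns[i]].to_list()
    children += df[columns[i + 1]].to_list()
# Drop repeated (parent, children) pairs, keeping the first occurrence.
newDf = pd.DataFrame({"parent": parent, "children": children}).drop_duplicates()
# Filter out rows where a child would be its own parent.
newDf[newDf["parent"] != newDf["children"]]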
Output:
       parent   children
0           A      James
1           A     Robert
2           B   Patricia
4           C       John
5       James  Stockholm
6      Robert     Denver
7    Patricia     Moscow
9        John    Palermo
10  Stockholm        red
12     Moscow     yellow
13     Moscow     purple
14    Palermo       blue
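For comparison, here is a minimal sketch of the same idea wrapped in a function, so that it works for any number of level columns and produces the column names and the 0..n-1 index shown in the desired output. The function name to_parent_child is hypothetical (it is not a pandas API), and the sketch assumes the columns are already ordered from the top level down.

```python
import pandas as pd


def to_parent_child(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: stack each pair of adjacent columns as parent/child rows."""
    cols = list(frame.columns)
    # With fewer than two columns there are no parent/child pairs to build.
    if len(cols) < 2:
        return pd.DataFrame(columns=["parent", "child"])
    # One small frame per adjacent pair: (level1, level2), (level2, level3), ...
    pairs = [
        pd.DataFrame({"parent": frame[left].to_numpy(),
                      "child": frame[right].to_numpy()})
        for left, right in zip(cols, cols[1:])
    ]
    # Stack the pairs and drop repeated (parent, child) rows.
    edges = pd.concat(pairs, ignore_index=True).drop_duplicates()
    # Constraint from the question: a child cannot be its own parent.
    edges = edges[edges["parent"] != edges["child"]]
    # Renumber the rows so the index runs 0..n-1.
    return edges.reset_index(drop=True)


data = {'level1': ['A', 'A', 'B', 'B', 'C'],
        'level2': ['James', 'Robert', 'Patricia', 'Patricia', 'John'],
        'level3': ['Stockholm', 'Denver', 'Moscow', 'Moscow', 'Palermo'],
        'level4': ['red', 'Denver', 'yellow', 'purple', 'blue']}
df = pd.DataFrame(data)
result = to_parent_child(df)
print(result)
```

This still loops over the columns, but only over adjacent column pairs rather than rows, so the cost of the Python-level loop grows with the number of levels and not with the number of records.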