How can I automate this using the Pandas grouping function?
I have two DataFrames as follows:
import pandas as pd

df1 = pd.DataFrame({'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'Col1': ['a1', 'a2', '', '', '', ''],
                    'Col2': ['', '', 'b1', 'b2', '', ''],
                    'Col3': ['', '', '', '', 'c1', 'c2'],
                    'Col4': ['a11', 'a12', '', '', '', ''],
                    'Col5': ['', '', 'b11', 'b12', '', ''],
                    'Col6': ['', '', '', '', 'c11', 'c12']})
df2 = pd.DataFrame({'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'Field': ['Col1', 'Col2', 'Col3', 'Col4', 'Col5', 'Col6']})
df2
I am trying to merge these two DataFrames to get the output shown below. Any help?
Output format 2:
Use DataFrame.melt, remove the rows with empty strings with DataFrame.query, and finally sort by the Group column with DataFrame.sort_values:
df = (df1.melt('Group',
               value_name='Values',
               var_name='Field')
         .query('Values != ""')
         .sort_values('Group', ignore_index=True))
print(df)
   Group Field Values
0      A  Col1     a1
1      A  Col1     a2
2      A  Col4    a11
3      A  Col4    a12
4      B  Col2     b1
5      B  Col2     b2
6      B  Col5    b11
7      B  Col5    b12
8      C  Col3     c1
9      C  Col3     c2
10     C  Col6    c11
11     C  Col6    c12
If the values are missing (NaN) rather than empty strings, use DataFrame.dropna:
df = (df1.melt('Group',
               value_name='Values',
               var_name='Field')
         .dropna(subset=['Values'])
         .sort_values('Group', ignore_index=True))
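The two filtering variants can be compared side by side. The following is a minimal sketch, assuming the df1 from the question. The names by_query and by_dropna are illustrative only. It converts the empty strings to NaN first and checks that dropna then keeps the same rows as the query version.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'Col1': ['a1', 'a2', '', '', '', ''],
                    'Col2': ['', '', 'b1', 'b2', '', ''],
                    'Col3': ['', '', '', '', 'c1', 'c2'],
                    'Col4': ['a11', 'a12', '', '', '', ''],
                    'Col5': ['', '', 'b11', 'b12', '', ''],
                    'Col6': ['', '', '', '', 'c11', 'c12']})

# Variant 1: filter out the empty strings directly.
by_query = (df1.melt('Group', value_name='Values', var_name='Field')
               .query('Values != ""')
               .sort_values('Group', ignore_index=True))

# Variant 2: turn empty strings into NaN first, then drop the missing values.
by_dropna = (df1.replace('', np.nan)
                .melt('Group', value_name='Values', var_name='Field')
                .dropna(subset=['Values'])
                .sort_values('Group', ignore_index=True))

# Both variants should keep the same 12 non-empty values.
print(by_query.equals(by_dropna))
```

Which variant fits depends only on how the source data encodes "no value"; the reshaping step itself is identical.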
Finally, merge the two DataFrames:
df = df.merge(df2, on=['Group', 'Field'])
print(df)
  Group Field Values
0     A  Col1     a1
1     A  Col1     a2
2     C  Col6    c11
3     C  Col6    c12
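Only four rows survive because the default merge is an inner join, and most (Group, Field) pairs listed in df2 have no non-empty value in df1. One way to inspect this, sketched here with the question's data, is a left merge from df2 with indicator=True. The names long_df and check are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'Col1': ['a1', 'a2', '', '', '', ''],
                    'Col2': ['', '', 'b1', 'b2', '', ''],
                    'Col3': ['', '', '', '', 'c1', 'c2'],
                    'Col4': ['a11', 'a12', '', '', '', ''],
                    'Col5': ['', '', 'b11', 'b12', '', ''],
                    'Col6': ['', '', '', '', 'c11', 'c12']})
df2 = pd.DataFrame({'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'Field': ['Col1', 'Col2', 'Col3', 'Col4', 'Col5', 'Col6']})

# Long format with the empty strings removed, as in the answer.
long_df = (df1.melt('Group', value_name='Values', var_name='Field')
              .query('Values != ""'))

# A left merge keeps every (Group, Field) pair requested in df2;
# the _merge column reports whether a matching value was found.
check = df2.merge(long_df, on=['Group', 'Field'], how='left', indicator=True)
print(check)
```

Rows marked left_only are pairs from df2 with no value in df1. With this data only A/Col1 and C/Col6 are marked both.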
EDIT: Because the values in Group are duplicated, use GroupBy.cumcount with DataFrame.set_index to build a MultiIndex, then replace the empty strings with missing values, so that the expected output can be created with DataFrame.stack and Series.unstack:
import numpy as np

df = (df1.set_index([df1.groupby('Group').cumcount(), 'Group'])
         .replace('', np.nan)
         .stack()
         .unstack(0)
         .rename(columns=lambda x: f'Value{x+1}')
         .rename_axis(['Group', 'Field'])
         .reset_index())
print(df)
  Group Field Value1 Value2
0     A  Col1     a1     a2
1     A  Col4    a11    a12
2     B  Col2     b1     b2
3     B  Col5    b11    b12
4     C  Col3     c1     c2
5     C  Col6    c11    c12